BSD Help
 
#1 OSX, October 28th, 2003, 12:13 PM
Search and remove duplicate files/filenames

Hi

I've done a Google search and also searched the FreeBSD mailing list as well as this site. Unfortunately, I can't seem to find the answer.

Basically, I would like to know how to search an entire system for duplicate files and/or filenames.

Would a bash or Perl script be best? Could anyone offer any advice on how to write such a script?

Many thanks


#2 icrf, October 28th, 2003, 11:50 PM
In Perl, finding duplicate filenames would be a pretty trivial task with a hash (assuming you have enough memory to store all the filenames, which you likely do) and the File::Find module (check search.cpan.org). Finding files that are actual duplicates is harder. There are modules that compute a checksum-like hash of a file's contents, producing a "unique" string (MD5 sums are commonly published on the net for things like CD images). Memory usage and compute time go way up that way, but you have a much better shot at finding files that are truly the same.
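For the "actual duplicate files" case, here is a minimal shell sketch rather than Perl. It swaps in POSIX `cksum` (a CRC plus the byte count) for the MD5 modules mentioned above and uses it only to nominate candidate pairs; each candidate is then confirmed byte-for-byte with `cmp`, so a checksum collision cannot produce a false match. The function name `find_dup_contents` is hypothetical, and the sketch assumes no filename contains a newline or a tab.

```sh
#!/bin/sh
# find_dup_contents DIR  (hypothetical helper, not a standard tool)
# Print "A = B" for each candidate pair of regular files under DIR whose
# contents are identical. Assumes no file name contains a newline or a tab.
find_dup_contents() {
  tab=$(printf '\t')
  find "$1" -type f -exec cksum {} + |
    LC_ALL=C sort |            # byte-wise sort keeps equal "CRC SIZE" prefixes adjacent
    awk '
      {
        key = $1 " " $2                    # CRC + byte count = candidate key
        path = $0
        sub(/^[0-9]+ [0-9]+ /, "", path)   # the rest of the line is the path
        if (key == prevkey) print prevpath "\t" path
        prevkey = key; prevpath = path
      }' |
    while IFS="$tab" read -r a b; do
      # cksum only nominates candidates; cmp -s confirms a byte-for-byte match.
      if cmp -s "$a" "$b"; then
        printf '%s = %s\n' "$a" "$b"
      fi
    done
}
```

Usage: `find_dup_contents /usr/local/man`. Pairs are chained, so three identical files A, B and C are reported as `A = B` and `B = C`. Checksumming every file is the expensive part, so point it at a specific tree rather than `/`.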

What's the end goal? Curiosity, or something more substantial than that? There's often/always a better way.
__________________
Andrew - Perl (and VB.NET) Monkey

Never underestimate the bandwidth of a hatchback full of tapes.

#3 Kung Foo Master, October 29th, 2003, 09:16 AM
>Basically I would like to know how one could go about searching their entire system for duplicate files and or filenames.
A lot of files that share a name are legitimate. For example, every user on a system who uses bash will have a .profile file; do you want your script to find those? You'll need to define more precisely what you want, or you almost certainly won't get the answers you're after.
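One way to act on this advice is to decide up front which names are expected to repeat and filter them out. The sketch below skips dot-files (the `.profile` case) before looking for repeated base names; that exclusion rule is only an illustrative assumption, and other legitimately repeated names (such as `Makefile` or `README`) would still be reported. The function name `dup_names_skip_dotfiles` is hypothetical.

```sh
#!/bin/sh
# dup_names_skip_dotfiles DIR...  (hypothetical helper)
# List base names that occur more than once under the given directories,
# ignoring dot-files such as .profile that each user is expected to have.
dup_names_skip_dotfiles() {
  find "$@" -type f ! -name '.*' |
    sed 's#.*/##' |        # strip everything up to the last "/", leaving the name
    LC_ALL=C sort |
    uniq -d                # print each repeated name once
}
```

Usage: `dup_names_skip_dotfiles /usr /home`.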

#4 OSX, October 29th, 2003, 04:35 PM
Thanks for the replies, both of you.

Quote:
What's the end goal? Curiosity, or something more substantial than that? There's often/always a better way.


Basically, I don't have a massive HDD (only 4.3GB), so I'm just trying to clean up my system and remove old, stale and duplicate files!
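For the "old, stale" part of the cleanup, a common starting point is to list files that have not been read in a long time. This is a minimal sketch assuming access times are actually recorded: on a filesystem mounted with `noatime` the report is meaningless. The function name `stale_files` is hypothetical, and its output is a list to review by hand, not something to pipe into `rm`, since rarely read files can still be important.

```sh
#!/bin/sh
# stale_files DIR DAYS  (hypothetical helper)
# List regular files under DIR whose last access was more than DAYS days ago.
# Caveat: if the filesystem is mounted with noatime, access times are not
# updated on read and this report cannot be trusted.
stale_files() {
  find "$1" -type f -atime +"$2" | LC_ALL=C sort
}
```

Usage: `stale_files /usr/local 180`.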

I'm new to FreeBSD (love it, by the way) and have experimented and tried various things along the way; some have been totally wrong.

I've recently been playing with upgrading things like openssl and openssh (which are part of the base system). I didn't know about things like:

make -DOPENSSL_OVERWRITE_BASE install clean

until recently. Through my own clumsiness, I moved all of the man files for the openssl port (I didn't overwrite the base install) into /usr/local/man. Anyway, I ended up with a lot of duplicate man pages.

A friend of mine is a lot better than me at bash scripting (I score a 0 at bash scripting tbh) and came up with this script:

find /usr | sed 's#.*/##' | sort | uniq -d > duplicate_files

which lists every filename that occurs more than once under the /usr directory, each printed once (ps - there are a lot!)
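A list of repeated names alone doesn't say where the copies live. Following the hash idea from earlier in the thread, an awk associative array can group full paths by base name; this is a shell/awk stand-in for the Perl hash plus File::Find approach, not that code itself. The function name `dup_names_with_paths` is hypothetical, and it assumes no path contains a newline. As noted above, it keeps every path in memory, which is normally fine.

```sh
#!/bin/sh
# dup_names_with_paths DIR  (hypothetical helper)
# For each file name that occurs more than once under DIR, print the name
# followed by every path where it was found (indented two spaces).
# Assumption: no path contains a newline.
dup_names_with_paths() {
  find "$1" -type f |
    LC_ALL=C sort |
    awk -F/ '
      {
        name = $NF                                 # base name = last "/" field
        if (!(name in count)) order[++n] = name    # remember first-seen order
        count[name]++
        paths[name] = paths[name] "  " $0 "\n"
      }
      END {
        for (i = 1; i <= n; i++)
          if (count[order[i]] > 1) printf "%s\n%s", order[i], paths[order[i]]
      }'
}
```

Usage: `dup_names_with_paths /usr > duplicate_files`.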

A start towards my problem would probably be to list all duplicate man pages.

Could a modified version of the above bash script be the solution?

Many thanks for your help


#5 OSX, October 29th, 2003, 04:37 PM
I suppose running

grep '\.gz$' ./duplicate_files

would show all of the duplicate man page names, since compressed man pages end in .gz?
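Matching on bare names also reports unrelated pages that happen to share a name. For the specific mistake described above (port man pages copied into /usr/local/man on top of the base system's), a narrower check is to look for pages that exist at the same relative path in both trees. This is a sketch under two assumptions: that the base pages live under /usr/share/man (adjust if yours are elsewhere) and that the duplicates use the same section directory and file name in both places. The function name `man_pages_in_both` is hypothetical.

```sh
#!/bin/sh
# man_pages_in_both LOCAL_MAN BASE_MAN  (hypothetical helper)
# Print the relative path of every compressed man page (*.gz) under LOCAL_MAN
# that also exists at the same relative path under BASE_MAN.
man_pages_in_both() {
  (cd "$1" && find . -type f -name '*.gz') |
    LC_ALL=C sort |
    while IFS= read -r rel; do
      rel=${rel#./}                  # "./man1/foo.1.gz" -> "man1/foo.1.gz"
      if [ -f "$2/$rel" ]; then
        printf '%s\n' "$rel"
      fi
    done
}
```

Usage: `man_pages_in_both /usr/local/man /usr/share/man`. It only lists candidates; check each pair (for example with `cmp`) before deleting anything.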

#6 IanBal, December 9th, 2003, 01:26 PM
I can suggest my own project, which I wrote for Linux and which should work on all BSDs with a Java Runtime Environment installed. It's called Duplicate File Finder and is at URL

I'm doing a rewrite in C++ because a lot of people have expressed dissatisfaction with having it in Java, but that rewrite won't be available before March 2004. I will of course test the rewritten code on Linux (SuSE, Red Hat), Solaris 9 on SPARC, and a BSD (Free or Open).

There's a feedback form on DFF's page. Use it to tell me what you think of DFF.

© 2003-2008 by Developer Shed. All rights reserved.