#1
  1. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2003
    Location
    On another planet
    Posts
    30
    Rep Power
    12

    Search and remove duplicate files/filenames


    Hi

    I've done a Google search and also searched the FreeBSD mailing list as well as this site. Unfortunately, I can't seem to find the answer.

    Basically, I would like to know how one could go about searching their entire system for duplicate files and/or filenames.

    Would a bash/Perl script be best? Could anyone offer any advice on how to create such a script?

    Many thanks

  2. #2
  3. Perl Monkey
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    May 2003
    Location
    the far end of town where the Grickle-grass grows
    Posts
    1,860
    Rep Power
    108
    In Perl, finding duplicate filenames would be a pretty trivial task with a hash (assuming you have the memory to store all the filenames, which is likely) and the File::Find module (check search.cpan.org). I'm not sure about finding actual duplicate files. There are modules available that will compute a checksum-style hash of a file's contents, producing a practically unique string (MD5 sums are common on the net for things like CD images). Memory usage and compute time go way up that way, but you have a better shot at finding files that are truly the same.
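
    Something like this would handle the duplicate-filenames part (untested, just a sketch of the hash-plus-File::Find idea; swap '/usr' for whatever you want to scan, e.g. '/'):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    my %seen;    # basename => list of full paths where it appears

    # File::Find calls this sub for every entry under the start directory,
    # with $_ set to the basename and $File::Find::name to the full path.
    find(sub {
        return unless -f;    # plain files only
        push @{ $seen{$_} }, $File::Find::name;
    }, '/usr');

    # Any basename recorded at more than one path is a duplicate filename.
    for my $name (sort keys %seen) {
        my @paths = @{ $seen{$name} };
        if (@paths > 1) {
            print "$name\n";
            print "  $_\n" for @paths;
        }
    }

    For duplicate contents, you'd key the hash on a Digest::MD5 checksum instead of the basename, at the cost of reading every file.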

    What's the end goal? Curiosity, or something more substantial than that? There's often/always a better way.
    Andrew - Perl (and VB.NET) Monkey

    Never underestimate the bandwidth of a hatchback full of tapes.
  4. #3
  5. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    68
    Rep Power
    11
    > Basically, I would like to know how one could go about searching their entire system for duplicate files and/or filenames.
    A lot of files that have the same name are legit. For example, every user on a system that uses bash will have a .profile file; do you want your script to find those? You'll need to redefine what you want, or you almost surely won't get the answers you're after.
  6. #4
  7. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2003
    Location
    On another planet
    Posts
    30
    Rep Power
    12
    Thanks for the replies, both.

    > What's the end goal? Curiosity, or something more substantial than that? There's often/always a better way.
    Basically, I don't have a massive HDD (4.3 GB). I'm just trying to clean up my system and remove old, stale, and duplicate files!

    I'm new to FreeBSD (love it, by the way) and have experimented and tried various things along the way; some have been totally wrong.

    I've recently been playing with upgrading things like openssl and openssh (which are obviously part of the base system). I didn't know about things like:

    make -DOPENSSL_OVERWRITE_BASE install clean

    until recently. Through my own clumsiness, I moved all of the man files for the openssl port (which didn't overwrite the base install) into /usr/local/man. Anyway, I ended up with a lot of duplicate man pages.

    A friend of mine is a lot better at bash scripting than I am (I score a 0 at bash scripting, tbh) and came up with this script:

    find /usr | sed 's#.*/##' | sort | uniq -d > duplicate_files

    which lists all duplicate filenames within the /usr directory (PS: there are a lot!).

    A start towards my problem would probably be to list all duplicate man pages.

    I'm sure a modification of the above bash script could be the solution?
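
    For example, something along the lines of the File::Find suggestion above might do it (a rough, untested sketch; I'm assuming the base man pages live in /usr/share/man and the port copies in /usr/local/man, and using Digest::MD5 so only pages whose contents really match get flagged):

    use strict;
    use warnings;
    use File::Find;
    use Digest::MD5;

    # Group man page files by basename across the base and port man trees.
    my %pages;
    find(sub {
        return unless -f;
        push @{ $pages{$_} }, $File::Find::name;
    }, '/usr/share/man', '/usr/local/man');

    for my $page (sort keys %pages) {
        my @paths = @{ $pages{$page} };
        next unless @paths > 1;    # same name in more than one place

        # Checksum each copy; only report copies whose contents match.
        my %by_sum;
        for my $path (@paths) {
            open my $fh, '<', $path or next;
            binmode $fh;
            push @{ $by_sum{ Digest::MD5->new->addfile($fh)->hexdigest } }, $path;
            close $fh;
        }
        for my $same (values %by_sum) {
            if (@$same > 1) {
                print "identical copies of $page:\n";
                print "  $_\n" for @$same;
            }
        }
    }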

    Many thanks for your help

  8. #5
  9. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2003
    Location
    On another planet
    Posts
    30
    Rep Power
    12
    I suppose running

    grep -i '\.gz$' ./duplicate_files

    would show all of the duplicate man page names?
  10. #6
  11. Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2003
    Location
    Austria
    Posts
    1
    Rep Power
    0
    I can suggest my own project, which I wrote for Linux and which should work on all BSDs with a Java Runtime Environment installed. It's called Duplicate File Finder and is at http://midori.shacknet.nu/dff/

    I'm doing a rewrite in C++ because a lot of people have expressed dissatisfaction with having it in Java, but that rewrite won't be available before March 2004. I will of course test the rewritten code on Linux (SuSE, Red Hat), Solaris 9 on SPARC, and a BSD (Free or Open).

    There's a feedback form on DFF's page. Use it to tell me what you think of DFF.
  12. #7
  13. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Posts
    1
    Rep Power
    0
    Originally Posted by OSX
    Hi

    I've done a Google search and also searched the FreeBSD mailing list as well as this site. Unfortunately, I can't seem to find the answer.

    Basically, I would like to know how one could go about searching their entire system for duplicate files and/or filenames.

    Would a bash/Perl script be best? Could anyone offer any advice on how to create such a script?

    Many thanks

    Try this program. It's designed to delete duplicate files. I'm not sure about the exact script, though.
