#1
Bash script to find and get rid of duplicate files in a folder
I run a mail server which saves copies of both incoming and outgoing emails. Unfortunately, the way it is set up, if someone sends an outgoing email to more than one recipient, it saves more than one copy of the email. This gets a little silly if they are sending large emails to lots of different people.
So I would like to have a bash script which will check for duplicate files in a folder and delete the duplicates, leaving only one copy of each. I know that the cmp command will check one file against another, but I don't know how to get it to delete one of them if the files do not differ. Also, this will only be useful if it can go through a whole folder checking for duplicates in it; it isn't helpful if I have to tell it which files to compare. I'm sorry, I am new to bash scripts (I run them via Cygwin on Windows 2003, as they seem to have much more functionality than the Windows command prompt does), so I am unable to do this myself. Any help would be much appreciated. Thank you. Ben
#2
Not the best way to do it (better to reconfigure the mail server), but why not, just for fun.
Not tested, and very slow on large dirs; perl would be faster. Delete the 'echo' before 'rm' to activate. Code:
#!/usr/bin/sh
ALL=`find . -type f`
for ONE in $ALL
do
    [ -f $ONE ] || continue    # skip files already deleted
    for TWO in $ALL
    do
        [ -f $TWO ] || continue    # not sure this is needed
        case $ONE in $TWO) continue;; esac    # really the same file
        cmp -s $ONE $TWO && echo rm -f $TWO
    done
done
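On the "very slow on large dirs" point: the loop above compares every file with every other file. A rough alternative sketch, not from the thread, is to checksum each file once and only run cmp when two checksums agree. The name dedupe_by_cksum and the --delete switch are made up for illustration. It assumes mktemp is available, looks only at files directly inside the given directory (no recursion, no dotfiles), and assumes no filename ends in a newline.

```sh
#!/bin/sh
# dedupe_by_cksum DIR [--delete]
# Hypothetical helper: lists (or, with --delete, removes) files in DIR whose
# content equals that of an earlier file in glob order.
dedupe_by_cksum() {
    dir=$1
    mode=${2:-dry-run}
    tmp=$(mktemp -d) || return 1              # one marker file per checksum seen
    for f in "$dir"/*; do
        [ -f "$f" ] || continue               # regular files only
        key=$(cksum < "$f" | tr ' ' '-')      # "CRC SIZE" -> "CRC-SIZE"
        if [ -f "$tmp/$key" ]; then
            first=$(cat "$tmp/$key")          # first file seen with this checksum
            if cmp -s -- "$first" "$f"; then  # confirm byte for byte
                if [ "$mode" = "--delete" ]; then
                    rm -f -- "$f"
                else
                    printf 'rm -f %s\n' "$f"
                fi
            fi
        else
            printf '%s' "$f" > "$tmp/$key"    # remember the path of the first copy
        fi
    done
    rm -rf -- "$tmp"
}
```

Calling dedupe_by_cksum . only prints the rm commands; dedupe_by_cksum . --delete removes the later copies. A file whose checksum collides with a different file is simply left alone, so in that rare case a duplicate can be missed.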
#3
Thanks very much for the script.
Unfortunately it seems to delete not only the duplicates but all copies of them, and in fact all files in the dir!
#4
I don't believe it. Maybe a typo?
#5
I tested it again, and this time it didn't delete all the files, but it did delete all copies of the duplicated file.
I have a folder with:
1.txt
2.txt
3.txt
4.txt
4a.txt
4b.txt
1, 2 and 3 are unique; 4, 4a and 4b are identical. With echo on, this is the output:
$ bash "f:/cygwin/bin/duplicates.sh"
rm -f ./4a.txt
rm -f ./4b.txt
rm -f ./4.txt
rm -f ./4b.txt
rm -f ./4.txt
rm -f ./4a.txt
This appears to want to remove all three identical files. Thanks Ben
#6
In fact, just after posting I tried it with some of the saved emails, and it is trying to delete unique files as well.
My emails are saved in the format [date][time][from][to].txt. The two emails here are unique, yet they are marked for deletion by the script. Files 1, 2 and 3 are not marked for deletion.
$ bash "f:/cygwin/bin/duplicates.sh"
rm -f ./4a.txt
rm -f ./4b.txt
rm -f ./4.txt
rm -f ./4b.txt
rm -f ./4.txt
rm -f ./4a.txt
rm -f ./[2004-09-20][165111][user@hermes.cam.ac.uk][mydad@ourdomain.com].txt
rm -f ./[2004-09-20][165426][user@gmail.com][me@ourdomain.com].txt
#7
Don't you know what [] is for in shells?
In fact, if you love trouble, generate filenames containing special characters and metacharacters, like aaa;bbb aaa[bbb] aaa&bbb aaa`bbb and so on, including tabs, vertical tabs, backspaces, spaces, newlines and all the rest (ctrl, esc). 99% of all shells, perls, awks ... will give up (which is still bearable) or cause chaos. Personally I know 2 languages that handle this correctly: 'c++' and its (good, old) predecessor 'c'.
Last edited by guggach : September 20th, 2004 at 11:38 AM. Reason: typo
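To see what the brackets do in this case, here is a small sketch (the function names are made up for illustration). In the script's line case $ONE in $TWO) the unquoted $TWO is used as a pattern, so [abc] means "one of a, b or c", and a name that contains brackets does not match itself. The script then falls through to cmp, compares the file with itself, finds no difference and prints the rm.

```sh
#!/bin/sh
# Unquoted: the second argument is treated as a glob pattern,
# as in the script's "case $ONE in $TWO)".
matches_unquoted() {
    case $1 in $2) return 0 ;; esac
    return 1
}

# Quoted: the second argument is compared literally.
matches_quoted() {
    case $1 in "$2") return 0 ;; esac
    return 1
}

name='./[abc].txt'   # made-up filename containing brackets

if matches_unquoted "$name" "$name"; then
    echo "unquoted: same name"
else
    echo "unquoted: looks like a different file"
fi

if matches_quoted "$name" "$name"; then
    echo "quoted: same name"
else
    echo "quoted: looks like a different file"
fi
```

The unquoted test reports a different file and the quoted one reports the same name, which is why only the bracketed email files were flagged while 1.txt, 2.txt and 3.txt were not.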
#8
I'm afraid I am completely clueless.
Are these file names going to cause a problem? Is there any way to avoid this? I tried it without the echo, and you are correct that it did leave the one copy, 4.txt. The echo output was confusing me, as it seemed to want to delete this as well. Thanks Ben
#9
Ben, may I kindly suggest that you read a book?
Try the following:
mkdir /tmp/aaa
cd /tmp/aaa
touch 1 2 3 4 5 6 7 8 9 0 # these are filenames
then enter:
ls [0-9]
Look at the output and suppose it was 'rm', not 'ls'.
A last point: think unix, not the OS you happen to be running. bash is good, but neither standard nor native. Write code that runs on all *nix; for scripts, write the old Bourne sh, which is easy to write/read/maintain. If it's a performance issue, try 'perl' or, better, 'c' && 'c++'.
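As for whether the bracket problem can be avoided in plain sh: a minimal sketch, assuming the only change needed is to quote every expansion so the shell takes the names literally. The function name dedupe_dir and the --delete switch are made up for illustration, and unlike the find-based script it only looks at files directly inside the given directory (no recursion, no dotfiles).

```sh
#!/bin/sh
# dedupe_dir DIR [--delete]
# Hypothetical helper: compares every regular file in DIR with every other one
# and reports (or, with --delete, removes) the identical copies.
dedupe_dir() {
    dir=$1
    mode=${2:-dry-run}
    for one in "$dir"/*; do
        [ -f "$one" ] || continue             # skip dirs and files already deleted
        for two in "$dir"/*; do
            [ -f "$two" ] || continue
            [ "$one" = "$two" ] && continue   # same path: plain string test, no pattern
            if cmp -s -- "$one" "$two"; then  # quoted, so [ ] and spaces are literal
                if [ "$mode" = "--delete" ]; then
                    rm -f -- "$two"
                else
                    printf 'rm -f %s\n' "$two"
                fi
            fi
        done
    done
}
```

Run as dedupe_dir . it only prints, and each identical pair shows up from both sides because nothing has been removed yet. With dedupe_dir . --delete the first copy in glob order is kept and the others are removed.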
#10
Thank you very much for your help and advice.
I don't really have any programming experience, but I have been trying to write scripts which will make it simpler and less time-consuming to manage our server. I tried writing batch files for the Windows command prompt, but found that it doesn't support the commands which I was trying to use. So I was advised to use bash scripts via Cygwin, which is what I have been trying to do. I have looked up lots of websites which tell you about the various commands you can use when writing shell scripts, but hadn't cottoned on to the problems with [ and ] in file names. Thank you very much for your help, and I shall see if someone in the perl forum can point me in the right direction. Thanks again. Ben
Viewing: Dev Shed Forums > Operating Systems > UNIX Help > Bash script to find and get rid of duplicate files in a folder