July 26th, 2003, 06:39 PM
How to kill duplicates in a file?
In a text file, how do you go about killing duplicates?
July 26th, 2003, 07:06 PM
Bummer that you're doing it in DOS/Windows. In Linux, there's a command, uniq, which removes duplicate lines from a sorted file. It does not change the file, but rather outputs to a second file or to stdout.
You might want to see if you can find a DOS port for it. Or find the source code and create your own port.
July 26th, 2003, 07:55 PM
Even if I were on Linux I'd still want to know. I have no real use for it; I'm just curious how it's done.
July 26th, 2003, 08:06 PM
Well, in Linux you would use the uniq command as a filter, for example: sort file.txt | uniq > deduped.txt
To do it programmatically -- off the top of my head -- you would sort the file first, then read one line at a time while keeping the previous line. If two adjacent lines are the same, they are duplicates and only one copy should be output; if they differ, output the new line.
Let's look at this step-wise:
1. Declare two string buffers, sNew and sOld.
2. Read the first line into sOld and output it.
3. Read the next line into sNew.
4. Compare sNew and sOld.
5. If they are the same, then do nothing (thus discarding the duplicate in sNew).
6. If they are different, then output sNew and copy sNew to sOld.
7. Repeat Steps 3 - 6 until EOF is reached.
8. Close the input and output files.
Of course, the input file must be sorted to make that work. Both DOS and Linux have the command-line filter, sort.