#1
Unique records in a text file
I have a 30MB text file (a printer spool file) that has a lot of duplicate information. It basically has the form:
    B/M NUMBER: *PAU 00001
    00010 xxxxxxxxxxxx
    00020 xxxxxxxxxxxx
    00030 xxxxxxxxxxxx

    B/M NUMBER: *PAU 00002
    00010 xxxxxxxxxxxx
    00020 xxxxxxxxxxxx
    00030 xxxxxxxxxxxx

    B/M NUMBER: *PAU 00001
    00010 xxxxxxxxxxxx
    00020 xxxxxxxxxxxx
    00030 xxxxxxxxxxxx
    . . .

for many different B/M numbers. Someone suggested Perl could help sort this out and gave me the following three lines:

    $/ = "" ;
    while (<>) { $Bills{$_}++ };
    foreach $Bill (sort keys %Bills) { print $Bill };

I haven't yet figured out how everything in the code works, but it does indeed sort the file very quickly and remove duplicates. However, I'm still not getting the unique occurrences of the B/Ms themselves. Where a page break splits a B/M, another header is inserted and I have:

    B/M NUMBER: *PAU 00001
    00010 xxxxxxxxxxxx

    B/M NUMBER: *PAU 00001
    00020 xxxxxxxxxxxx
    00030 xxxxxxxxxxxx
    00040 xxxxxxxxxxxx

These B/Ms need to be concatenated somehow and then the duplicates eliminated. Any suggestions, either in Perl or something else?

Patrick
#2
I believe this can be done with JavaScript. You have to use regular expressions and search the file for matches. The file will be treated as one long string, so you can split that string before and after each match and then concatenate the two parts again.
You will have to get used to the strict regular-expression syntax.
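As a rough illustration of that idea, here is a minimal JavaScript sketch, not a tested solution for the real spool file. It assumes every record begins with a line starting with "B/M NUMBER:", that everything after that prefix identifies the bill, and that each detail line begins with a zero-padded sequence number as in the sample. The function name `uniqueBills` is made up for this example. Instead of splitting and re-joining the string around each match, it groups detail lines by header, which merges page-break fragments and drops duplicates in one pass.

```javascript
// Hypothetical helper: merge B/M records that were split by page breaks,
// then drop duplicate detail lines. Assumes the layout shown in post #1.
function uniqueBills(text) {
  // Matches a header line and captures the bill identifier after the prefix.
  const headerRe = /^B\/M NUMBER:\s*(.+?)\s*$/;
  const bills = new Map(); // bill identifier -> Set of detail lines
  let current = null;      // detail set of the bill currently being read

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // blank lines only separate records

    const m = headerRe.exec(line);
    if (m) {
      // A repeated header (duplicate record or page break) reuses the same set.
      if (!bills.has(m[1])) bills.set(m[1], new Set());
      current = bills.get(m[1]);
    } else if (current) {
      // The Set silently ignores detail lines already seen for this bill.
      current.add(line);
    }
  }

  const out = [];
  for (const key of [...bills.keys()].sort()) {
    out.push("B/M NUMBER: " + key);
    // Zero-padded sequence numbers sort correctly as plain strings.
    for (const detail of [...bills.get(key)].sort()) out.push(detail);
    out.push(""); // blank line between records
  }
  return out.join("\n");
}
```

Under Node.js this could be called as `uniqueBills(require("fs").readFileSync("spool.txt", "utf8"))`, where `spool.txt` is a placeholder file name. Note that two detail lines with identical text inside one bill are collapsed into one, which is only safe if the sequence numbers really are unique per bill.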
| Viewing: Dev Shed Forums > Other > Beginner Programming > Unique records in a text file |