
January 23rd, 2013, 12:25 AM
Lost in code
First, make sure you're running a 64-bit build of PHP so that you can handle the numbers as native integers rather than strings. Otherwise this will be very slow no matter what you do.
Assuming that your blacklist doesn't change, or changes only rarely, sort it in numerical order in advance (before processing your files). You can then perform a binary search on the blacklist. A binary search over n entries needs only about log2(n) comparisons, so even a blacklist of four billion numbers takes about 32 comparisons per line of the input files. Make sure that you have enough RAM to hold the whole blacklist in memory without swapping. If you don't, then again, this will be slow no matter what you do.
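As a rough illustration of the sort-once, search-per-line idea, here is a minimal sketch in Python rather than PHP. It assumes the blacklist is one integer per line; `load_blacklist` and `in_blacklist` are hypothetical names, and the search is written out by hand to show the halving step (Python's standard `bisect` module does the same job).

```python
def load_blacklist(lines):
    """Parse one integer per line and sort numerically; done once, in advance."""
    return sorted(int(line) for line in lines if line.strip())


def in_blacklist(sorted_ids, value):
    """Classic binary search: each probe halves the remaining range,
    so a lookup costs about log2(len(sorted_ids)) comparisons."""
    lo, hi = 0, len(sorted_ids) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        probe = sorted_ids[mid]
        if probe == value:
            return True
        if probe < value:
            lo = mid + 1   # value can only be in the upper half
        else:
            hi = mid - 1   # value can only be in the lower half
    return False


blacklist = load_blacklist(["900000000001", "42", "7", "12345678901"])
print(in_blacklist(blacklist, 42))   # True
print(in_blacklist(blacklist, 43))   # False
```

Note that the values above 2^31 are the reason for the 64-bit requirement in PHP; Python integers are arbitrary precision, so the sketch does not show that limit.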
Also make sure that you have enough RAM to hold the whole input file in memory twice (once for the input itself and once for the output buffer) without swapping.
Loop through the input file line by line and do a binary search on the blacklist to determine whether each integer is in it. If the integer is not in the blacklist, append it to a separate buffer that holds the non-blacklisted items. When the loop finishes, write that buffer to your destination file in one go.
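The filtering loop could be sketched like this, again in Python for illustration. It assumes one integer per line in the input file and a blacklist that has already been sorted; `contains`, `filter_ids` and `filter_file` are hypothetical names, and the lookup uses `bisect_left` from Python's standard library.

```python
from bisect import bisect_left


def contains(sorted_ids, value):
    """Binary search on a sorted list via the standard bisect module."""
    i = bisect_left(sorted_ids, value)
    return i < len(sorted_ids) and sorted_ids[i] == value


def filter_ids(lines, sorted_blacklist):
    """Return the integers that are NOT blacklisted, in input order."""
    kept = []  # separate buffer for non-blacklisted items
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = int(line)
        if not contains(sorted_blacklist, value):
            kept.append(value)
    return kept


def filter_file(in_path, out_path, sorted_blacklist):
    # Read the whole input at once (RAM for the input plus the buffer).
    with open(in_path) as src:
        lines = src.read().splitlines()
    kept = filter_ids(lines, sorted_blacklist)
    # One write at the end instead of one write per line.
    with open(out_path, "w") as dst:
        dst.write("".join(f"{v}\n" for v in kept))


print(filter_ids(["5", "42", "", "9000000000"], [7, 42]))  # [5, 9000000000]
```

Collecting results in a buffer and writing once trades memory for fewer I/O calls, which is why the extra RAM for a second copy of the data matters.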
It will still take a fair amount of time to run, but you can probably get it done in under 10 minutes.