#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date: Jan 2013 | Posts: 3 | Rep Power: 0

    Blacklist and files


    Hi all,

    I have a project, and I'm stuck on one part of it.

    We have a blacklist file with 4,000,000 lines, and each line holds a 10-digit integer, like this:
    PHP Code:
    9195587756
    9153002255
    9121201544
    9185444455
    ... 
    We also receive many files every minute, like the one below. We must compare each of these files with the blacklist file and remove the lines that appear in the blacklist. Each of these files may have up to 300,000 lines.
    PHP Code:
    9195778998
    9105544488
    9153002255
    9121201544
    9185577998
    ... 
    After removing the blacklisted lines, this file should become:

    PHP Code:
    9195778998
    9105544488
    9185577998
    ... 
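    To pin down the expected result, here is a small PHP check that uses only the sample rows shown in this post (the full files are elided). `array_diff` compares the values as strings; it is used here just to confirm the expected output, not as a proposed fast solution for full-size files.

```php
<?php
// Sample rows from the post only; the real files are much larger.
$blacklist = ['9195587756', '9153002255', '9121201544', '9185444455'];
$incoming  = ['9195778998', '9105544488', '9153002255', '9121201544', '9185577998'];

// array_diff keeps the entries of $incoming that are not in $blacklist;
// array_values re-indexes the surviving entries from 0.
$kept = array_values(array_diff($incoming, $blacklist));

echo implode("\n", $kept), "\n";
```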
    I have tried several solutions, such as findstr on Windows and others, but they are all very slow and take a long time (10 minutes for one file).

    Please help me solve this problem. I need the fastest way to do this.

    Sorry for my poor English.

    Thanks
  #2
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date: Dec 2004 | Posts: 8,300 | Rep Power: 7170
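    First, make sure you're running a 64-bit build of PHP so that you can handle the numbers as integers rather than strings: 10-digit values such as 9195587756 are larger than the 32-bit integer maximum of 2,147,483,647. On a 32-bit build this will be very slow no matter what you do.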
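    A minimal way to check this on your own machine: `PHP_INT_SIZE` reports the integer width of the running build (8 bytes on 64-bit, 4 on 32-bit), and PHP reads an integer literal larger than `PHP_INT_MAX` as a float. The variable names below are only for illustration.

```php
<?php
// Largest value a 10-digit line can hold.
$largest = 9999999999;

// 8 on 64-bit builds, 4 on 32-bit builds.
$is64bit = PHP_INT_SIZE === 8;

// On a 32-bit build the literal above overflows and becomes a float.
$fits = is_int($largest);

echo $is64bit ? "64-bit PHP\n" : "32-bit PHP\n";
echo $fits ? "10-digit values fit in an int\n" : "10-digit values overflow to float\n";
```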

    Assuming that your blacklist doesn't change, or changes only rarely, sort it in numerical order in advance (before processing your files). You can then perform a binary search on the blacklist, which needs only about 22 comparisons per line of the input files (log2 of 4,000,000 is roughly 22). Make sure that you have enough RAM to hold the whole blacklist in memory without swapping. If you don't, this will again be slow no matter what you do.
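    The one-off sorting step might look like the sketch below, assuming 64-bit PHP and one number per line in the blacklist file. The function name `sort_blacklist_file` and the file paths are hypothetical; the point is that the sort runs once when the blacklist changes, not once per incoming file.

```php
<?php
// One-off preprocessing, run only when the blacklist changes.
// Assumes 64-bit PHP and one 10-digit number per line; paths are hypothetical.
function sort_blacklist_file(string $inPath, string $outPath): int
{
    // Read all lines, dropping newline characters and blank lines.
    $lines = file($inPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $inPath");
    }
    // Convert to native ints and sort numerically, ascending.
    $numbers = array_map('intval', $lines);
    sort($numbers, SORT_NUMERIC);
    // Write the sorted list back out, one number per line.
    file_put_contents($outPath, implode("\n", $numbers) . "\n");
    return count($numbers);
}
```

    Usage would be something like `sort_blacklist_file('blacklist.txt', 'blacklist_sorted.txt');`, with both file names standing in for your real paths.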

    Also make sure that you have enough RAM to hold the whole input file in memory twice (the file itself plus the output buffer) without swapping.

    Loop through the input file line by line and perform the binary search on the blacklist to check whether the integer is in it. If the integer is not in the blacklist, append it to a separate buffer that holds the non-blacklisted items. When the loop finishes, write that buffer to your destination file.
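    A minimal sketch of that loop, assuming 64-bit PHP and a blacklist file that has already been sorted ascending with one number per line. The function names (`load_sorted_blacklist`, `is_blacklisted`, `filter_file`) and the paths are hypothetical. The original line text is written to the output, and the integer form is used only for the lookup.

```php
<?php
// Assumes 64-bit PHP, so 10-digit values fit in a native int.

// Reads a blacklist that was sorted ascending in advance (one number per line).
function load_sorted_blacklist(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return array_map('intval', $lines);
}

// Binary search: true if $needle occurs in the ascending list $sorted.
function is_blacklisted(array $sorted, int $needle): bool
{
    $lo = 0;
    $hi = count($sorted) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid] === $needle) {
            return true;
        }
        if ($sorted[$mid] < $needle) {
            $lo = $mid + 1;   // needle is in the upper half
        } else {
            $hi = $mid - 1;   // needle is in the lower half
        }
    }
    return false;
}

// Keeps the lines of $inPath that are not blacklisted, collects them in a
// buffer, writes the buffer to $outPath once, and returns how many were kept.
function filter_file(array $sortedBlacklist, string $inPath, string $outPath): int
{
    $lines = file($inPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $inPath");
    }
    $kept = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line !== '' && !is_blacklisted($sortedBlacklist, (int) $line)) {
            $kept[] = $line;
        }
    }
    file_put_contents($outPath, $kept ? implode("\n", $kept) . "\n" : '');
    return count($kept);
}
```

    Load the blacklist once and reuse the returned array for every incoming file, so the cost of reading 4,000,000 lines is paid once rather than per file.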

    It will probably still take a fair amount of time to run, but it should come in under 10 minutes per file.
