PHP Development
 
#1  January 22nd, 2013, 05:39 PM
pooly
Blacklist and files

Hi all,

I have a project, and I am stuck on one part of it.

We have a blacklist file with 4,000,000 lines. Each line holds a 10-digit integer, like this:
PHP Code:
9195587756
9153002255
9121201544
9185444455
... 


New files like the one below arrive every minute. We must compare each of them with the blacklist file and remove every line that appears in the blacklist. Each of these files may have 300,000 lines.
PHP Code:
9195778998
9105544488
9153002255
9121201544
9185577998
... 


After removing the blacklisted lines from this file, we should be left with the file below:

PHP Code:
9195778998
9105544488
9185577998
... 


I have tried several solutions to this problem, such as using findstr on Windows, and so on.

But that solution is very slow and takes a long time (10 minutes for 1 file).

Please help me solve this problem. I am looking for the fastest way to do this work.

Sorry for my poor English.

Thanks

#2  January 23rd, 2013, 12:25 AM
E-Oreo
First, make sure you're running a 64-bit version of PHP so that you can handle the numbers as integers rather than strings (values such as 9195587756 exceed the 32-bit integer maximum of 2,147,483,647). Otherwise this will be very slow no matter what you do.
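
As a minimal sketch of that check: `PHP_INT_SIZE` is 8 on a 64-bit build and 4 on a 32-bit build. The helper name `canHoldBlacklistNumbers` is made up for illustration.

```php
<?php
// Hypothetical helper: true when this PHP build can store a 10-digit
// number such as 9195587756 as a native integer.
function canHoldBlacklistNumbers()
{
    // PHP_INT_SIZE is the size of an integer in bytes:
    // 8 on 64-bit builds, 4 on 32-bit builds.
    return PHP_INT_SIZE >= 8;
}

if (!canHoldBlacklistNumbers()) {
    echo "32-bit PHP detected: 10-digit values will not fit in an integer\n";
}
```

On a 32-bit build, casting such a value with `(int)` does not give the original number back, which is why the check is worth running before anything else.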

Assuming that your blacklist doesn't change or only changes rarely, sort it in numerical order in advance (before processing your files). You can then perform a binary search on the blacklist, which needs only about 22 comparisons per line in the input files (log2 of 4,000,000 is roughly 22). Make sure that you have enough RAM to store the whole blacklist in memory without swapping. If you don't, then again, this will be slow no matter what you do.
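
A rough sketch of the sort-then-binary-search idea, assuming a 64-bit PHP build and a blacklist file with one integer per line. The function names `loadSortedBlacklist` and `inSortedList` are illustrative, not from any library.

```php
<?php
// Hypothetical helper: read one integer per line and return a sorted array.
function loadSortedBlacklist($path)
{
    $numbers = array();
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (($line = fgets($handle)) !== false) {
        $line = trim($line);
        if ($line !== '') {
            $numbers[] = (int) $line;
        }
    }
    fclose($handle);
    // sort() orders the values in place and reindexes the array from 0.
    sort($numbers, SORT_NUMERIC);
    return $numbers;
}

// Binary search over a sorted, zero-indexed array of integers.
function inSortedList(array $sorted, $needle)
{
    $low = 0;
    $high = count($sorted) - 1;
    while ($low <= $high) {
        $mid = $low + (($high - $low) >> 1);
        if ($sorted[$mid] < $needle) {
            $low = $mid + 1;      // needle can only be in the upper half
        } elseif ($sorted[$mid] > $needle) {
            $high = $mid - 1;     // needle can only be in the lower half
        } else {
            return true;
        }
    }
    return false;
}
```

Keep in mind that a PHP array carries per-element overhead, so the memory needed for 4,000,000 entries is well above 8 bytes per number. If the blacklist rarely changes, the sorted result could also be saved to disk once so the sort is not repeated for every input file.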

Also make sure that you have enough RAM to store the whole input file in memory twice without swapping.

Loop through the input file line by line and perform the binary lookup on the blacklist to determine whether the integer exists in it. If the integer is not in the blacklist, append the integer to a separate buffer that holds non-blacklisted items. At the end of the whole loop, write the separate buffer to your destination file.
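
The loop described above might look like the following sketch. It assumes the blacklist is already available as a sorted, zero-indexed array of integers on a 64-bit PHP build; `filterFile` and `isBlacklisted` are illustrative names.

```php
<?php
// Binary search over a sorted, zero-indexed array of integers.
function isBlacklisted(array $sorted, $needle)
{
    $low = 0;
    $high = count($sorted) - 1;
    while ($low <= $high) {
        $mid = $low + (($high - $low) >> 1);
        if ($sorted[$mid] < $needle) {
            $low = $mid + 1;
        } elseif ($sorted[$mid] > $needle) {
            $high = $mid - 1;
        } else {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: copy every non-blacklisted line from $inPath to
// $outPath and return the number of lines kept.
function filterFile($inPath, $outPath, array $sortedBlacklist)
{
    $kept = array(); // separate buffer for non-blacklisted items
    $in = fopen($inPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $inPath");
    }
    while (($line = fgets($in)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        if (!isBlacklisted($sortedBlacklist, (int) $line)) {
            $kept[] = $line;
        }
    }
    fclose($in);
    // Write the whole buffer once, at the end of the loop.
    $output = count($kept) > 0 ? implode("\n", $kept) . "\n" : '';
    file_put_contents($outPath, $output);
    return count($kept);
}
```

Run against the sample input from the first post, with the four sample blacklist numbers, this keeps 9195778998, 9105544488 and 9185577998.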

It will probably still take a fair amount of time to run, but you can probably do it in under 10 minutes.
