PHP Development
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me
Go Back   Dev Shed ForumsProgramming LanguagesPHP Development

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old February 10th, 2006, 05:30 AM
ronobi ronobi is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2005
Posts: 15 ronobi User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 12 h 22 m 38 sec
Reputation Power: 0
How to parse a large tab delimited text file

Hello,

I have a large tab delimited text file provided each day by my importer (about 8 MB). I would like to compare that file with my database by giving a column value.

I have tried fgets(), but seems it cannot handle that big file. file() also not working. Any idea?

Thanks in advance.

Reply With Quote
  #2  
Old February 10th, 2006, 05:43 AM
noelkelly noelkelly is offline
Custom User Title???
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2004
Location: Brisbane, Australia
Posts: 159 noelkelly User rank is Corporal (100 - 500 Reputation Level)noelkelly User rank is Corporal (100 - 500 Reputation Level)noelkelly User rank is Corporal (100 - 500 Reputation Level)noelkelly User rank is Corporal (100 - 500 Reputation Level) 
Time spent in forums: 1 Day 3 h 30 m 17 sec
Reputation Power: 6
Send a message via MSN to noelkelly
Where is the file, is it on the server or a client pc???

Reply With Quote
  #3  
Old February 10th, 2006, 05:50 AM
ronobi ronobi is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2005
Posts: 15 ronobi User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 12 h 22 m 38 sec
Reputation Power: 0
It resides in a local server.

So, any suggestion??

Reply With Quote
  #4  
Old February 10th, 2006, 09:37 AM
Guelphdad's Avatar
Guelphdad Guelphdad is offline
Headed to unemployment
Dev Shed God 5th Plane (7000 - 7499 posts)
 
Join Date: Nov 2001
Location: Guelph Canada
Posts: 7,201 Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level) 
Time spent in forums: 2 Months 2 Weeks 7 h 39 m 50 sec
Reputation Power: 630
if you are comparing it to your database why aren't you importing it to a table and doing the comparison directly in the database itself? or is it more than one table?

what exactly are you trying to compare?

Reply With Quote
  #5  
Old February 10th, 2006, 10:03 AM
ronobi ronobi is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2005
Posts: 15 ronobi User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 12 h 22 m 38 sec
Reputation Power: 0
Yeah, its more than 2 tables.

Reply With Quote
  #6  
Old February 10th, 2006, 10:52 AM
J_Tree's Avatar
J_Tree J_Tree is offline
Rocking my php-ness
Dev Shed Intermediate (1500 - 1999 posts)
 
Join Date: Dec 2004
Location: Boston, MA
Posts: 1,968 J_Tree User rank is First Lieutenant (10000 - 20000 Reputation Level)J_Tree User rank is First Lieutenant (10000 - 20000 Reputation Level)J_Tree User rank is First Lieutenant (10000 - 20000 Reputation Level)J_Tree User rank is First Lieutenant (10000 - 20000 Reputation Level)J_Tree User rank is First Lieutenant (10000 - 20000 Reputation Level)J_Tree User rank is First Lieutenant (10000 - 20000 Reputation Level)J_Tree User rank is First Lieutenant (10000 - 20000 Reputation Level)J_Tree User rank is First Lieutenant (10000 - 20000 Reputation Level) 
Time spent in forums: 2 Weeks 2 Days 15 h 11 m 47 sec
Reputation Power: 145
Send a message via ICQ to J_Tree Send a message via AIM to J_Tree Send a message via MSN to J_Tree Send a message via Yahoo to J_Tree
The memory limit set in your php.ini can and will effect your processing of large files. You can either change it, use ini_set to increase the memory limit for the processing script, or even better, fopen the file and read/process one line at a time.

If everything is on one line with tabs between items, use fgets with the byte limit and work some magic to reassemble any items cut in half.
__________________
My new WebComic http://www.jjsunshines.com/
The Geek Shall Inherit the Earth

It is NOT ok to IM me with questions unless I told you it was ok via PM

Reply With Quote
  #7  
Old February 10th, 2006, 12:37 PM
homerdooh homerdooh is offline
php newbie
Dev Shed Novice (500 - 999 posts)
 
Join Date: Nov 2001
Posts: 523 homerdooh User rank is Corporal (100 - 500 Reputation Level)homerdooh User rank is Corporal (100 - 500 Reputation Level)homerdooh User rank is Corporal (100 - 500 Reputation Level)homerdooh User rank is Corporal (100 - 500 Reputation Level) 
Time spent in forums: 3 Days 9 h 27 m 36 sec
Reputation Power: 12
Quote:
Originally Posted by ronobi
Hello,

I have a large tab delimited text file provided each day by my importer (about 8 MB). I would like to compare that file with my database by giving a column value.

I have tried fgets(), but seems it cannot handle that big file. file() also not working. Any idea?

Thanks in advance.


Use fopen and fread but instead of bringing the whole file in one chunk bring it in pieces. Whats the code you are using now?
__________________
"They have the internet on computers now" - Homer

Reply With Quote
  #8  
Old February 10th, 2006, 01:24 PM
Guelphdad's Avatar
Guelphdad Guelphdad is offline
Headed to unemployment
Dev Shed God 5th Plane (7000 - 7499 posts)
 
Join Date: Nov 2001
Location: Guelph Canada
Posts: 7,201 Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level)Guelphdad User rank is Brigadier General (60000 - 70000 Reputation Level) 
Time spent in forums: 2 Months 2 Weeks 7 h 39 m 50 sec
Reputation Power: 630
a better explanation of what exactly you are trying to do with this file in comparison to your database may allow us to give you a more involved and perhaps more accurate answer.

Reply With Quote
  #9  
Old February 11th, 2006, 06:02 AM
ronobi ronobi is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2005
Posts: 15 ronobi User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 12 h 22 m 38 sec
Reputation Power: 0
Thanx guys! Well, for a start, here is the code I used to compare the enries between database and the text file:
PHP Code:
 $product_info_query tep_db_query("select p.products_id, p.products_model, pd.products_name, p.products_price from " TABLE_PRODUCTS " p, " TABLE_PRODUCTS_DESCRIPTION " pd where p.products_status = 1 and p.products_id = pd.products_id order by pd.products_name DESC");

while (
$products tep_db_fetch_array($product_info_query)) {
$filename "large_tab_delimited.txt";
$fd fopen ($filename"r");
while (!
feof($fd)) { 
$contents fgets ($fd,filesize ($filename));
foreach(
$contents as $line){
  
$columns explode("\t"$line);
if(
$columns[1] == $products[products_model]) {
    
$code $columns[1];
    
$name $columns[4];
    
$price $columns[5];
    echo 
'DB product code: ' $products[products_model] . ' TXT product code: ' $code' DB product name:  ' $products[products_name] . ' TXT product name: ' $name DB price:  ' . $products[products_price] . ' TXT price' . $price . '<br>';
    break;
   }
  } 
 }
fclose($fd);


The above code is to demonstrate what I really want to do (please ignore any code error). Well, somehow I made it work with a small size file which may be little bit modified version of the above code.

Reply With Quote
  #10  
Old February 11th, 2006, 08:39 AM
homerdooh homerdooh is offline
php newbie
Dev Shed Novice (500 - 999 posts)
 
Join Date: Nov 2001
Posts: 523 homerdooh User rank is Corporal (100 - 500 Reputation Level)homerdooh User rank is Corporal (100 - 500 Reputation Level)homerdooh User rank is Corporal (100 - 500 Reputation Level)homerdooh User rank is Corporal (100 - 500 Reputation Level) 
Time spent in forums: 3 Days 9 h 27 m 36 sec
Reputation Power: 12
I dont know if this will work but try bringing in the data in smaller pieces. Something like this...
PHP Code:
 $handle fopen ("$filename""r");
$output "";
do {
    
$data fread($handle8192);
    if (
strlen($data) == 0) {
        break;
    }
    
$output .= $data;
} while(
true);
fclose ($handle); 


Give it a go, see how it works out.

Reply With Quote
  #11  
Old February 12th, 2006, 06:11 AM
ronobi ronobi is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2005
Posts: 15 ronobi User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 12 h 22 m 38 sec
Reputation Power: 0
Thanx a lot. But it did not work although it works with a smaller size file. I changed the memory limit too. But no success! Its kinda hang the system!!

Reply With Quote
  #12  
Old February 12th, 2006, 08:29 AM
Sepodati's Avatar
Sepodati Sepodati is offline
Banned
Dev Shed God 19th Plane (14000 - 14499 posts)
 
Join Date: Dec 1999
Location: Afghanistan
Posts: 14,385 Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)Sepodati User rank is General 14th Grade (Above 100000 Reputation Level)  Folding Points: 78475 Folding Title: Intermediate FolderFolding Points: 78475 Folding Title: Intermediate FolderFolding Points: 78475 Folding Title: Intermediate FolderFolding Points: 78475 Folding Title: Intermediate Folder
Time spent in forums: 2 Months 4 Weeks 1 Day 2 h 48 m 40 sec
Reputation Power: 1943
Send a message via ICQ to Sepodati Send a message via Yahoo to Sepodati
Okay, say you have an 8Mb file and an 8Mb database. The database is _made_ for searching through that much data. You have to come up with some hack to do so with the file and it won't be efficient.

So, I'd do it in this manner.

1. Grab 10 lines out of the file using fgetcsv() (set delimiter to tab, \t)
2. Parse the "$columns[1]" entry out of each line
3. Create a query that'll pull the matching entries out of the database for those 10 columns. Something like SELECT * FROM table WHERE products_id IN (1,2,3,4,5,x) where 1,2,3,etc are the columns pulled from the file.
4. Retrieve the rows from the database one by one, find the matching $columns[x] vlaue and display whatever you need.
5. Retrieve the next 10 rows from the table and GOTO 2.

You'll run considerably more queries this way, but that's what databases are designed for. You can tweak the "10" lines of the file to stay within your memory limit, too. Maybe you can do 100, maybe only 5; who knows?

I imagine you'll want to use set_time_limit(0), also, so your script doesn't time out. Give it some time to do what it needs to do.

---John Holmes...

Reply With Quote
  #13  
Old February 12th, 2006, 11:49 AM
JoeReeves's Avatar
JoeReeves JoeReeves is offline
A Major Nuisance
Dev Shed Novice (500 - 999 posts)
 
Join Date: Jan 2005
Location: Arkansas
Posts: 640 JoeReeves User rank is Second Lieutenant (5000 - 10000 Reputation Level)JoeReeves User rank is Second Lieutenant (5000 - 10000 Reputation Level)JoeReeves User rank is Second Lieutenant (5000 - 10000 Reputation Level)JoeReeves User rank is Second Lieutenant (5000 - 10000 Reputation Level)JoeReeves User rank is Second Lieutenant (5000 - 10000 Reputation Level)JoeReeves User rank is Second Lieutenant (5000 - 10000 Reputation Level)JoeReeves User rank is Second Lieutenant (5000 - 10000 Reputation Level) 
Time spent in forums: 1 Week 4 Days 24 m 26 sec
Reputation Power: 67
Send a message via AIM to JoeReeves Send a message via MSN to JoeReeves Send a message via Yahoo to JoeReeves
Coping with mysql record count limits and script timeouts.

I moved this post to it's own thread. You can find it here.
__________________
A good discussion on screening form input data can be found here. Summary here.
Need a competent host? 100gb of web space? Windows AND Unix functionality? ASP.NET AND PHP on the same site? Try Netfirms. Highly recommended by ... me.
Save/upload the file BEFORE you test it.
Thinking is a luxury afforded by the patience of those you serve.
Die with honor ... or don't die at all
.

Last edited by JoeReeves : February 18th, 2006 at 04:58 PM.

Reply With Quote