Python Programming
 
  #1  
August 14th, 2003, 06:30 AM
jeych
file processing

Hello,

I am a student in linguistics who needs to do some file and string processing. I've been using Perl for some time, but I'm looking for a more proper language.

I've been playing with Python, and it might be the pearl I'm looking for... (ok, easy joke)

I wrote a basic script in Python which reads a 2MB file and loads it into a string. I must confess that I am disappointed, because the script is amazingly slow (about 5 minutes on an Athlon XP 1500+, against less than 1 second with Perl). Performance is not my main concern, but I want my scripts to perform decently.

Could someone please tell me whether the way I do it in Python is the best way to go? Is it me, or is Python definitely slow with large files?
Thank you very much,

Julien.

#!/usr/bin/python

import string

my_string = ''
my_file = open('large_file', 'r')
for line in my_file.readlines():
    line.strip()
    my_string = my_string + line
my_file.close()

### end ###

And the equivalent script in Perl:

#!/usr/bin/perl

open(TXT, "my_file") or die "pb TXT : $!\n";
while(<TXT>) {
    chomp;
    $my_string .= $_;
}
close(TXT);

#2
August 14th, 2003, 10:02 AM
netytan
Mmmm OK dude, you're just a little scary: you managed to write a piece of Python with nearly exactly the same format as Perl.

Python isn't as fast at string processing as Perl - but then that's what Perl was made for - so I don't expect it to be.

Anyway, assuming you're using Python 2.x, this should work better. I noticed you imported the string module and didn't use it, which is bad practice in itself.

Code:
#!/usr/bin/env python

my_string = open('large_file', 'r').read().split('\n')


This will give you a list containing all the lines in the file. Assuming you want it as a single string, you can call join on it as below:

Code:
#!/usr/bin/env python

my_string = ''.join(open('large_file', 'r').read().split('\n'))


readlines() returns every line in the file but keeps the trailing '\n' on each one. Rather than looping through and removing each '\n', you can split the string returned by read() on each newline, which skips that whole step and fits on a single line (although you could easily pull this out into several lines).
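
To make the difference concrete, here is a minimal sketch. It assumes a small in-memory sample (the `sample` string and the use of `io.StringIO` are stand-ins for the real 'large_file', not part of the original script) and simply prints what each call returns:

```python
import io

# Hypothetical stand-in for the contents of 'large_file'.
sample = "first line\nsecond line\nthird line\n"

# readlines() keeps the '\n' at the end of every line.
lines = io.StringIO(sample).readlines()
print(lines)   # ['first line\n', 'second line\n', 'third line\n']

# split('\n') drops the newlines, but a file that ends with '\n'
# leaves one empty string at the end of the list.
parts = sample.split('\n')
print(parts)   # ['first line', 'second line', 'third line', '']

# Joining on the empty string makes that trailing '' harmless.
joined = ''.join(parts)
print(joined)  # first linesecond linethird line
```

If the list itself is what you need, str.splitlines() is worth a look, since it does not produce the trailing empty entry.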

Hope this helps,

Mark.

#3
August 14th, 2003, 11:18 AM
jeych
Quote:
Originally posted by netytan
Mmmm OK dude, you're just a little scary: you managed to write a piece of Python with nearly exactly the same format as Perl.

You're right. I felt I was doing it the perlish way...

Quote:

Python isn't as fast at string processing as Perl - but then that's what Perl was made for - so I don't expect it to be.

OK, but as I mentioned, performance is not my main concern.

snip...

Quote:

Code:
#!/usr/bin/env python

my_string = ''.join(open('large_file', 'r').read().split('\n'))


readlines() returns every line in the file but keeps the trailing '\n' on each one. Rather than looping through and removing each '\n', you can split the string returned by read() on each newline, which skips that whole step and fits on a single line (although you could easily pull this out into several lines).

This is exactly what I needed, and it performs MUCH better.

Thank you very much, I think I'm going to learn Python seriously now.

Julien.

#4
August 14th, 2003, 12:19 PM
netytan
Glad to help, I'm sure you'll have a lot of fun learning Python! And once you've caught the Py'bug it'll be a long time until you touch Perl again.

Take care,
Mark.

#5
August 14th, 2003, 12:57 PM
FiveGrainJa
Re: file processing

Code:
for line in my_file.readlines():
    line.strip()
    my_string = my_string + line

It's that third line above that was probably killing your performance. Strings in Python are immutable, so each time you hit that line you were creating a new string and copying the old one into it along with the new line. When your strings get really long and you do this a lot it can really crush performance as these 2MB strings get copied back and forth in memory.

(The other ramification of strings being immutable is that the 2nd line above wasn't doing what you probably intended. Calling line.strip() doesn't alter line, it returns a new string that is stripped leaving line itself unchanged.)
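
A quick way to see both effects of immutability is the sketch below. The variable names and sample values are made up for illustration; the only point is that strip() and + each hand back a new string and leave their inputs alone:

```python
line = "  hello world  \n"

# The return value is thrown away here, so nothing changes.
line.strip()
assert line == "  hello world  \n"

# Keeping the result is what actually gives you the stripped text.
stripped = line.strip()
print(repr(stripped))   # 'hello world'

# Concatenation behaves the same way: it builds a new string
# and the operands are left untouched.
original = "abc"
combined = original + "def"
print(original)         # abc
print(combined)         # abcdef
```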

Your original script would probably work just fine if you save all the concatenation until the very end:
Code:
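import string  # Python 2 only: string.join() was removed in Python 3

my_file = open('large_file', 'r')
lineList = []
for line in my_file.readlines():
    # strip() returns a new string, so keep the result
    lineList.append(line.strip())
my_file.close()
# One concatenation at the very end; ''.join(lineList) is the method form
my_string = string.join(lineList, '')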

Execution time for this is probably in the same ballpark as the solutions in the other replies; I think it's mostly a stylistic choice. Some folks like to get lots done with one line of code, some people prefer to list out each step more explicitly. Go with whichever you think is the most readable.
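
If you want to measure rather than guess, something along these lines should do. It is only a sketch: the generated `lines` list, the two function names and the repeat count are assumptions chosen for illustration, and the numbers depend heavily on the interpreter. Later CPython releases (2.4 onwards) can often optimise the `result = result + line` pattern, so the gap may be far smaller than the one reported in this thread.

```python
import timeit

# Made-up data standing in for the lines of a large file.
lines = ["word %d\n" % i for i in range(20000)]

def concat_in_loop():
    """Build the result by repeated concatenation inside the loop."""
    result = ''
    for line in lines:
        result = result + line.strip()
    return result

def join_at_end():
    """Collect the stripped lines in a list and join once at the end."""
    stripped = []
    for line in lines:
        stripped.append(line.strip())
    return ''.join(stripped)

# Both approaches must produce the same string before timing them.
assert concat_in_loop() == join_at_end()

for func in (concat_in_loop, join_at_end):
    seconds = timeit.timeit(func, number=5)
    print("%s: %.4f s for 5 runs" % (func.__name__, seconds))
```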

#6
August 14th, 2003, 02:54 PM
netytan
Hi Five,

I like writing pieces of code which use as little space as possible. From my point of view, the less you have to type, the faster your dev time and the smaller your apps will be.

On occasion I do use longer versions, simply because they tend to be slightly easier to read than some of the one-liners, but even these are quite readable in Python. I like to use the built-in string methods, i.e. 'string object'.split(), instead of importing the string module and using string.split('string object').

It's just a matter of opinion, whatever suits your needs and/or tastes I guess.

Oh, I just thought of another way to do it: use replace on the file content, missing out the whole loop, strip, append, join..

Code:
#!/usr/bin/env python

file = open('large_file', 'r').read()
file = file.replace('\n', '')
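
One thing worth checking before picking between the versions in this thread: they are not strictly interchangeable. The sketch below is an illustration only (the helper names, the temporary file and its contents are all made up) and shows that the split/join and replace versions remove just the newlines, while the strip-based version also removes spaces at the start and end of each line:

```python
import os
import tempfile

def via_split_join(text):
    """Split on newlines, then join the pieces back together."""
    return ''.join(text.split('\n'))

def via_strip_join(text):
    """Strip every line, then join once at the end."""
    return ''.join(line.strip() for line in text.splitlines())

def via_replace(text):
    """Remove every newline in a single call."""
    return text.replace('\n', '')

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("alpha\n  beta  \ngamma\n")
    path = tmp.name

try:
    # The with statement closes the file even if read() fails.
    with open(path) as handle:
        text = handle.read()
finally:
    os.remove(path)

print(repr(via_split_join(text)))  # 'alpha  beta  gamma'
print(repr(via_replace(text)))     # 'alpha  beta  gamma'
print(repr(via_strip_join(text)))  # 'alphabetagamma'
```

For text where the surrounding whitespace matters, that difference is a better reason to choose one version than the number of lines it takes.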


Have fun guys,
Mark.

#7
August 15th, 2003, 12:23 AM
jeych
Hello,

FiveGrainJa > Thank you for this very instructive explanation. I now understand why it was so slow.

netytan > As far as I can see, there's more than one way to do it. Your last example (with the replace method) is the fastest, and performs as fast as the Perl equivalent.

Thank you both for your help, you convinced me away from Perl.

Julien.

#8
August 15th, 2003, 06:39 AM
netytan
Very welcome, Julien,

There's usually more than one way to do something in most languages (Perl is famous for this), but one way is usually better than another, and if it isn't, it comes down to personal choice which one you want to use.

Usually you should take into account the performance, size, and readability of your code block.

Any other factors, Five? Those are the main ones I can see.

Mark.

#9
August 15th, 2003, 06:56 AM
telex4
I have to admit I'm confused as to why you say you want to move from Perl to a more proper language. For processing strings, Perl is the proper language, it's designed exactly for that. Python just isn't as good when it comes to doing more complicated string manipulation. It's only worth moving to Python if the script does a lot more than string manipulation, and the string manipulation is fairly simple, because then you might find it easier to extend/embed your code in Python.

#10
August 15th, 2003, 07:57 AM
jeych
Hi,

Quote:
Originally posted by telex4
I have to admit I'm confused as to why you say you want to move from Perl to a more proper language. For processing strings, Perl is the proper language, it's designed exactly for that.

Oops, I didn't want to open a Perl vs Python debate.

Perl IS nice at string manipulation, for sure. But the point is that I'm looking for a *clear* (and not *proper*, I was biased by my mother tongue) general purpose language, which Perl is certainly not, in my opinion.

There are things I'll continue doing with Perl, but I'll try to do as much as possible ("use once" scripts, or personal and non-critical stuff) without.

Mark > I'll keep in mind your remarks concerning performance/readability/compactness.

Cheers,

Julien.

#11
August 15th, 2003, 10:21 AM
netytan
Quote:
Oops, I didn't want to open a Perl vs Python debate


Oh, I don't think you did, Julien. Telex is a big Python user, have you seen QuickRip? All he's saying is that Perl is better for string manipulation than Python, but then you already know that is what Perl was made to do, whereas Python is more general purpose (imo).

Mark.

#12
August 15th, 2003, 10:26 AM
telex4
Heh, yes, I definitely prefer Python as a language. I'm in the middle of coding some extra features into a large Perl CGI solution I did for a client last summer, and blimey, is it a pain having to use Perl again!

#13
August 16th, 2003, 01:43 AM
jeych
OK, thank you for the link, I must confess I didn't know it (but I'm about to try it).

Julien.
