Python Programming
 
#1
June 23rd, 2004, 06:32 PM
Theeggman
Requesting comments on code to find last line in a file

I need to open a log file that will be thousands of lines long (the current log is 26809 lines) and look at the last line to find out how the log file ended, or whether it is still in progress. I will be running this code on 2 files on possibly 10 machines every 15-30 minutes, so I would like it to be fast and not use a lot of memory. From what I've read, file.seek() seemed like the way to go, but I would appreciate any suggestions or comments. As usual, I'm on Python 2.2.2.

The code moves to the end of the file with f.seek(0,2), then backs up one byte at a time, calling readline() at each position, until the text it reads contains a specified string. num is a negative byte count that limits how far back from the end of the file seek() will move.
Code:
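>>> def file_seek(num):
...     # Step back from the end of myfile one byte at a time (at most -num
...     # bytes) until the text from that point to the end of its line
...     # contains "Build End:".
...     f = open(myfile)
...     f.seek(0, 2)              # jump to the end of the file
...     i = 0
...     while i > num:            # num is negative, e.g. -1000
...         line = f.readline()   # read from here to the next newline
...         if line.find("Build End:") >= 0:
...             print line
...             break
...         i = i - 1
...         f.seek(i, 2)          # move back one more byte from the end
...     f.close()
...     print "done"
...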
>>> file_seek(-1000)
Build End: Date: 06/23/04 Time: 08:35

done
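For comparison, here is a rough sketch of the same idea that reads the tail in fixed-size blocks instead of doing one seek()/readline() pair per byte. It is written for a modern Python 3 (it uses with and bytes), not 2.2.2, and find_last_line and block_size are hypothetical names chosen for this example. It assumes the log uses an ASCII-compatible encoding:

```python
import os

def find_last_line(path, marker, block_size=4096):
    """Return the last line of path that contains marker, or None.

    Reads fixed-size blocks backwards from the end of the file, so only as
    much of the tail as needed is read, and each complete line is checked once.
    """
    marker = marker.encode()            # compare bytes, since the file is opened in binary mode
    with open(path, 'rb') as f:
        pos = f.seek(0, os.SEEK_END)    # seek() returns the new offset: the file size
        carry = b''                     # possibly incomplete line left over from the previous block
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            lines = (f.read(step) + carry).split(b'\n')
            # Unless we have reached the start of the file, the first piece may
            # be the tail end of a longer line, so finish it on the next pass.
            carry = lines.pop(0) if pos > 0 else b''
            for line in reversed(lines):
                if marker in line:
                    return line.rstrip(b'\r').decode('utf-8', 'replace')
        return None
```

Called as find_last_line(myfile, "Build End:"), it would return the Build End line without its trailing newline, or None if the marker never appears.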

Last edited by Theeggman : June 23rd, 2004 at 06:47 PM.

#2
June 24th, 2004, 02:54 AM
netytan
A much easier, less memory-intensive way to do this would be to use the built-in file iterator rather than a while loop. It doesn't read the whole file into memory; instead, it pulls one line from the file at a time, as the loop requests it. You can still use seek() to optimize this further.
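A rough sketch of that idea, in Python 3 rather than 2.2.2: iterate over the file object one line at a time and remember only the last matching line, optionally after seeking near the end. last_matching_line and tail_bytes are hypothetical names. The file is opened in binary mode because Python 3 text files do not allow seeking relative to the end:

```python
import os

def last_matching_line(path, marker, tail_bytes=None):
    """Return the last line containing marker, or None, reading one line at a time."""
    marker = marker.encode()
    found = None
    with open(path, 'rb') as f:
        if tail_bytes is not None:
            size = f.seek(0, os.SEEK_END)        # seek() returns the new offset in Python 3
            f.seek(max(0, size - tail_bytes))    # clamp so short files don't cause an error
            # Note: after this seek the first line read may be only part of a line.
        for line in f:                           # lazy iteration; the file is never fully loaded
            if marker in line:
                found = line
    if found is None:
        return None
    return found.rstrip(b'\r\n').decode('utf-8', 'replace')
```

Without tail_bytes this still reads the whole file, just never all of it at once; with tail_bytes it only touches the end.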

Sorry, I don't have time to comment on your code right now, but if no one else does, I'll sort that out for you later.

Mark.


#3
June 24th, 2004, 10:33 AM
Theeggman
I didn't know that existed, thanks. Unfortunately, it isn't available in Python 2.2.2, and even more unfortunately, upgrading to the latest version of Python is not an option.

E.

#4
June 24th, 2004, 11:31 AM
Grim Archon
Here is another approach that you might find also works:
Code:
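import re
string_start = "STICHTING MATHEMATISCH"
string_end = "\n"

# Non-greedy match from string_start up to the first string_end after it.
reline = re.compile(string_start+".*?"+string_end, re.DOTALL)

def file_seek(fname, num): 
    f = open(fname, 'r')
    f.seek(num, 2)              # num is negative: start -num bytes before the end
    text = f.read()             # read the whole tail in a single call
    f.close()
    ans = reline.search(text)   # first match in the tail, not necessarily the last
    if ans: 
        print "found"
        print ans.group(), 
    print "done"

file_seek("LICENSE.txt", -1000)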

At any rate it is worth comparing.
My reason for doing it this way is that it minimizes file access (one seek() and one read()), and hopefully seek() won't get too confused if the file's length changes.

You could also consider tracking the file's length between accesses, then only reading from the old end of the file to the new end. You might want a small overlap with the previous read, just in case the process writing the log file does not write complete lines and you happen to read in the middle of the line you want.
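A minimal sketch of that length-tracking idea, written for Python 3. LogTail, check and overlap are hypothetical names, and treating a file that got shorter as truncated or rotated is an assumption on my part:

```python
import os

class LogTail:
    """Remember how much of a log has been checked and scan only what was appended since."""

    def __init__(self, path, overlap=256):
        self.path = path
        self.overlap = overlap  # bytes re-read before the previous end, for half-written lines
        self.offset = 0         # file size at the previous check

    def check(self, marker):
        """Return True if marker occurs in the newly appended data (plus the overlap)."""
        size = os.path.getsize(self.path)
        if size < self.offset:
            # Assumption: a shorter file means it was truncated or rotated, so start over.
            self.offset = 0
        start = max(0, self.offset - self.overlap)
        with open(self.path, 'rb') as f:
            f.seek(start)
            data = f.read(size - start)  # stop at the size we measured, even if more was appended
        self.offset = size
        return marker.encode() in data
```

This only helps if the same LogTail object survives between checks; a script restarted every 15-30 minutes would need to save the offset somewhere, such as a small state file. Because of the overlap, a marker near the previous end may be reported again on the next check.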

You might also consider returning True/False, so that the caller knows whether the line was detected.
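A sketch of the True/False version, in Python 3. build_ended and tail_bytes are hypothetical names; unlike file_seek above, tail_bytes is a positive count and the seek is clamped, so files shorter than tail_bytes don't raise an error:

```python
import os
import re

def build_ended(fname, pattern, tail_bytes=1000):
    """Return True if the regex pattern matches within the last tail_bytes of fname."""
    with open(fname, 'rb') as f:
        size = f.seek(0, os.SEEK_END)
        f.seek(max(0, size - tail_bytes))       # clamp: short files start from offset 0
        # 'replace' avoids an error if the seek landed inside a multi-byte character
        text = f.read().decode('utf-8', 'replace')
    return re.search(pattern, text) is not None
```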

grim

#5
June 24th, 2004, 12:48 PM
DevCoach
Do you know the maximum possible line length, or a size that you are comfortable will be greater than any line? Let's say that you know the lines are going to be less than 1000 characters. Then you could read the last 1000 characters into a list of lines using readlines(), and take the last line from the list, i.e.:

Code:
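f = file(myfile)
f.seek(-1000, 2)            # start 1000 bytes before the end of the file
last = f.readlines()[-1]    # the final line of that tail
if last.find('text') >= 0:  # substring tests with 'in' need Python 2.3+
    pass                    # do stuff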


This is slightly inefficient, in that you are creating a list of the last few lines, so it could be improved by reading each line and throwing it away:

Code:
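f = file(myfile)
f.seek(-1000, 2)

for line in f: pass         # read each line, keeping only the current one

# line now contains the last line
if line.find('text') >= 0:  # substring tests with 'in' need Python 2.3+
    pass                    # do stuff...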

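A different way to keep only the last line is collections.deque with maxlen=1. That is a swapped-in technique, and the maxlen argument needs Python 2.6 or later, so it won't help on 2.2.2. Unlike for line in f: pass, it also copes with an empty tail, where line would never be assigned. last_line and tail_bytes are hypothetical names; a sketch in Python 3:

```python
import collections
import os

def last_line(path, tail_bytes=1000):
    """Return the last line of path, reading at most its final tail_bytes bytes."""
    with open(path, 'rb') as f:
        size = f.seek(0, os.SEEK_END)
        f.seek(max(0, size - tail_bytes))
        # A deque with maxlen=1 keeps only the most recent item fed to it, so this
        # consumes the file's lines while holding just one of them in memory.
        last = collections.deque(f, maxlen=1)
    if not last:
        return None                     # empty file (or empty tail)
    return last[0].rstrip(b'\r\n').decode('utf-8', 'replace')
```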


Dave - The Developers' Coach

Last edited by DevCoach : June 24th, 2004 at 12:51 PM.

#6
June 24th, 2004, 04:48 PM
Theeggman
Code:
f = file(myfile)
f.seek(-1000, 2)
for line in f:
    pass  # do stuff


I like the idea of reading in the last x bytes of the file and iterating through them. But I have noticed that Python reads the '\n' and the text separately, so I get:

line1

line2

line3

etc

#7
June 25th, 2004, 02:00 AM
netytan
That's true, but you can always call the strip() method on the line to remove it. Another way to stop the double spacing is to end the print statement with a comma, i.e.:

Code:
#!/usr/bin/env python

text = file('source.txt', 'r')
text.seek(-1000, 2)
for line in text:
    print line,


Assuming your code works something like this, that should solve your problem.

Hope this helps,

Mark.

#8
June 25th, 2004, 02:02 AM
netytan
You can't quite inline the call to seek(), because seek() returns None rather than the file object, so the for loop would have nothing to iterate over. Keeping the file in a variable gets it down to two lines:

Code:
f = file('source.txt'); f.seek(-1000, 2)
for line in f: print line,

Later,

Mark.

#9
June 25th, 2004, 03:44 AM
DevCoach
Quote:
Originally Posted by Theeggman
I like the idea of reading in the last x bytes of the file and iterating through. But I have noticed that python reads '\n' and the text separately. So I get:

line1

line2

line3

etc


The problem is not that it reads them separately. When it reads a line, it includes the \n at the end, so when you print it with print line, the print statement outputs another \n as well.

As netytan said, you can strip the \n off with line.strip().
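A small sketch of what that looks like, in Python 3, where print is a function and the print line, trick is spelled print(line, end=''). read_clean_lines is a hypothetical helper, and io.StringIO stands in for the log file. It uses rstrip('\r\n') rather than strip(), since strip() also removes leading and trailing spaces, which may matter in a log:

```python
import io

def read_clean_lines(f):
    # Yield each line from a text file object with only its line ending removed.
    for line in f:
        yield line.rstrip('\r\n')

# Iterating over a file-like object keeps the newline at the end of each line.
raw = list(io.StringIO("line1\nline2\n"))
print(raw)                                                     # ['line1\n', 'line2\n']
print(list(read_clean_lines(io.StringIO("line1\nline2\n"))))   # ['line1', 'line2']

# Python 3 equivalent of the trailing-comma trick (print line,):
for line in io.StringIO("line1\nline2\n"):
    print(line, end='')
```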

Dave - The Developers' Coach
