#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2001
    Posts
    266
    Rep Power
    13

    Requesting comments on code to find last line in a file


    I need to open a log file that will be thousands of lines long (the current log is 26809 lines) and search for the last line to find out how the log ended or whether it is still in progress. I will be running this code on 2 files for possibly 10 machines every 15-30 minutes, so I would like it to be fast, and it shouldn't eat up a lot of memory. From what I read, file.seek() seemed like the way to go, but I would appreciate any suggestions or comments. As usual, I'm on v. 2.2.2.

    The code essentially moves to the end of the file with f.seek(0,2), then backs up one byte at a time until the line read from that position contains a specified string. num is the limit (given as a negative offset from the end) on how far seek() may move backwards in the file.
    Code:
    ...def file_seek(num):
    ...     f = open(myfile)
    ...     f.seek(0,2)
    ...     i = 0
    ...     while i > num:
    ...             line = f.readline()
    ...             if line.find("Build End:") >= 0:
    ...                     print line
    ...                     break
    ...             i = i - 1
    ...             f.seek(i,2)
    ...     print "done"
    ...
    >>> file_seek(-1000)
    Build End: Date: 06/23/04 Time: 08:35
    
    done
    Last edited by Theeggman; June 23rd, 2004 at 05:47 PM.
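    A minimal sketch of the same backward search done in larger steps, so the file is not re-read one byte at a time. It is written for a current Python 3 interpreter rather than the 2.2.2 mentioned above, and the function name, chunk_size and max_bytes are hypothetical choices for illustration, not anything from the original post.

```python
import os

def find_last_marker_line(path, marker=b"Build End:", chunk_size=4096,
                          max_bytes=65536):
    """Return the last line containing `marker` (as bytes), or None.

    Reads backwards from the end of the file in chunks and stops once the
    marker and the start of its line are both in the buffer, or once
    max_bytes have been examined (an assumed safety cap).
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""  # everything from `pos` to the end of the file
        while pos > 0 and len(tail) < max_bytes:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            idx = tail.rfind(marker)
            if idx >= 0:
                start = tail.rfind(b"\n", 0, idx)
                # Only answer once the beginning of the line is known.
                if start >= 0 or pos == 0:
                    end = tail.find(b"\n", idx)
                    if end < 0:
                        end = len(tail)
                    return tail[start + 1:end]
    return None
```

    Opening the file in binary mode keeps the offsets passed to seek() in plain bytes; decode the returned line if text is needed.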
  2. #2
  3. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    A much easier, less memory-intensive way to do this would be to use the built-in file iterator rather than a while loop. It doesn't read the whole file into memory; instead it pulls one line from the file at a time as the loop requests it. You can still use seek() to optimize this further.

    Sorry, I don't have time to comment on your code right now, but if no one else does, I'll sort that out for you later.

    Mark.
    programming language development: www.netytan.com Hula

  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2001
    Posts
    266
    Rep Power
    13
    I didn't know that module existed, thanks. Unfortunately, it's not available in v. 2.2.2 of Python. Even more unfortunately, upgrading to the latest version of Python is not an option.

    E.
  6. #4
  7. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    Here is something you might find also works:
    Code:
    import re
    string_start = "STICHTING MATHEMATISCH"
    string_end = "\n"
    
    reline = re.compile(string_start+".*?"+string_end, re.DOTALL)
    
    def file_seek(fname, num): 
        f = open(fname, 'r')
        f.seek(num, 2)
        text = f.read()
        f.close()
        ans = reline.search(text)
        if ans: 
            print "found"
            print ans.group(), 
        print "done"
    
    file_seek("LICENSE.txt", -1000)
    At any rate it is worth comparing.
    My reason for doing it this way is that it minimizes file access and hopefully seek won't get too confused if the length is changed.

    You could also consider tracking the file's length between accesses, then only reading from the old file end to the new file end. You might want a small overlap with the previous read, just in case the process writing the log file does not write in complete lines and you happen to be reading in the middle of the line you want.

    You might consider a return value of True/False so that you can know when the line is detected.
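    The two suggestions above (remember the previous length, and return True/False) could be combined roughly as follows. This is only a sketch for a current Python 3 interpreter; the function name, the default overlap of 200 bytes and the marker are assumptions for illustration.

```python
import os

def marker_in_new_data(path, last_size, marker=b"Build End:", overlap=200):
    """Scan only the bytes added since `last_size`.

    Returns (found, new_size); pass new_size back in on the next check.
    """
    size = os.path.getsize(path)
    if size < last_size:
        # The log shrank, so it was probably truncated or replaced:
        # start again from the beginning.
        last_size = 0
    # Re-read a little of the previous data in case the writer had only
    # written part of a line when we last looked.
    start = max(0, last_size - overlap)
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(size - start)
    return (marker in data, size)
```

    Because of the overlap, a marker that sat at the very end of the previous read can be reported again on the next call, so the caller may want to stop polling once it has seen True.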

    grim
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Feb 2004
    Location
    London, England
    Posts
    1,585
    Rep Power
    1373
    Do you know the maximum possible line length, or a size that you are comfortable will be greater than the line? Let's say that you know the lines are going to be less than 1000 characters. Then you could read the last 1000 characters into a list of lines using readlines, and get the last line from the list, i.e.

    Code:
    f = file(myfile)
    f.seek(-1000, 2)
    if 'text' in f.readlines()[-1]:
       #do stuff
    This is slightly inefficient in that you are creating a list of the last few lines, so could be improved by reading each line and throwing it away:

    Code:
    f = file(myfile)
    f.seek(-1000, 2)
    
    for line in f: pass
    
    # line now contains the last line
    if 'text' in line:
       #do stuff...
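    One thing to watch with a fixed f.seek(-1000, 2): if the file is shorter than 1000 bytes, the seek can raise an error because it would land before the start of the file. A hedged sketch that clamps the offset first, written for a current Python 3 interpreter (last_line and max_line_len are hypothetical names):

```python
import os

def last_line(path, max_line_len=1000):
    """Return the last line of the file as bytes (b"" for an empty file).

    Assumes no line is longer than max_line_len bytes.
    """
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)         # seek() returns the new position
        f.seek(max(0, size - max_line_len))   # never seek before the start
        line = b""
        for line in f:                        # read each line and discard it
            pass
    return line
```

    If the log ends with a blank line, as in the output shown in the first post, the value returned is that blank line, so the caller may want to skip empty lines.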

    Dave - The Developers' Coach
    Last edited by DevCoach; June 24th, 2004 at 11:51 AM.
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2001
    Posts
    266
    Rep Power
    13
    Code:
    f = file(myfile)
    f.seek(-1000, 2)
    for line in f:
       #do stuff
    I like the idea of reading in the last x bytes of the file and iterating through. But I have noticed that python reads '\n' and the text separately. So I get:

    line1

    line2

    line3

    etc
  12. #7
  13. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    That's true, but you can always call the strip() method on the line to remove this. Another way to stop the double spacing would be to use print with a comma at the end, i.e.

    Code:
    #!/usr/bin/env python
    
    text = file('source.txt', 'r')
    text.seek(-1000, 2)
    for line in text:
        print line,
    Assuming your code works something like this, your problem should be solved.

    Hope this helps,

    Mark.

  14. #8
  15. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    You may also be able to inline the call to seek() which would give you something like this:

    Code:
    for line in file('source.txt').seek(-1000, 2): print line,
    Although I haven't tested this yet, it would be nice if it did work.
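    For what it's worth, seek() does not return the file object (in Python 2 it returns None, in Python 3 the new position), so the loop above has nothing to iterate over. A small wrapper gives much the same one-line feel; this is a sketch for a current Python 3 interpreter, and seeked is a hypothetical helper name:

```python
import os

def seeked(f, offset, whence=os.SEEK_SET):
    """Seek `f` and hand the same file object back so the call can be chained."""
    f.seek(offset, whence)
    return f

def print_tail(path, nbytes=1000):
    """Print the lines found in the last `nbytes` bytes of the file."""
    with open(path, "rb") as fh:
        size = os.path.getsize(path)
        # Clamp so that short files do not cause a seek before the start.
        for line in seeked(fh, -min(nbytes, size), os.SEEK_END):
            print(line.decode(errors="replace"), end="")
```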

    Later,

    Mark.

  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Feb 2004
    Location
    London, England
    Posts
    1,585
    Rep Power
    1373
    Originally Posted by Theeggman
    I like the idea of reading in the last x bytes of the file and iterating through. But I have noticed that python reads '\n' and the text separately. So I get:

    line1

    line2

    line3

    etc
    The problem is not that it reads them separately, but that when it reads a line it includes the \n at the end, so when you print it out with print line, the print statement outputs another \n as well.

    As netytan said, you can strip the \n off with line.strip().
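    The effect is easy to reproduce. The sketch below uses Python 3's print() function rather than the print statement from the thread, and printed is a hypothetical helper that captures the output so the two cases can be compared:

```python
import io

def printed(lines, strip_newline):
    """Return what print() writes for each line, collected into one string."""
    out = io.StringIO()
    for line in lines:
        if strip_newline:
            line = line.rstrip("\n")   # drop only the trailing newline
        print(line, file=out)          # print() appends its own "\n"
    return out.getvalue()

lines = ["line1\n", "line2\n"]
print(repr(printed(lines, strip_newline=False)))  # 'line1\n\nline2\n\n'
print(repr(printed(lines, strip_newline=True)))   # 'line1\nline2\n'
```

    Note that strip() with no arguments also removes leading spaces and tabs, so rstrip("\n") is the gentler choice when indentation in the log matters.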

    Dave - The Developers' Coach
