#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2005
    Posts
    2
    Rep Power
    0

    Extract, parse..log files


    Hi,

    I've just started learning Python in the last day, with the intention of creating something to parse a log file for a game I play. My end goal is to produce an XML file similar to the following:

    Code:
    <?xml version="1.0" ?> 
    <harvesting>
    	<mined>
    		<iron_ore>
    			<collected>451</collected>
    		</iron_ore>
    		<copper_ore>
    			<collected>1058</collected>
    		</copper_ore>
    	</mined>
    	<gathered>
    		<barley>
    			<collected>236</collected>
    		</barley>
    	</gathered>
    </harvesting>


    My game's logfile is laid out as follows:

    Code:
    (1110268106)[Tue Mar 08 07:48:26 2005] You gathered a Sassafras from the ravaged natural herb garden. The ravaged natural herb garden looks slightly lessened.
    (1110268112)[Tue Mar 08 07:48:32 2005] You gathered a Milkweed from the ravaged natural herb garden. The ravaged natural herb garden looks slightly lessened.
    (1110268137)[Tue Mar 08 07:48:57 2005] You gathered a baubbleshire cabbage from the ravaged natural garden. The ravaged natural garden looks slightly lessened.
    (1110268144)[Tue Mar 08 07:49:04 2005] You gathered a baubbleshire cabbage from the ravaged natural garden. The ravaged natural garden looks slightly lessened.
    (1110268149)[Tue Mar 08 07:49:09 2005] You gathered a raw nutmeg from the ravaged natural garden. The ravaged natural garden looks slightly lessened.
    (1110268158)[Tue Mar 08 07:49:18 2005] You mined a rough malachite from the rustic stone. The rustic stone looks slightly lessened.
    (1110268165)[Tue Mar 08 07:49:25 2005] You mined a rough malachite from the rustic stone. The rustic stone looks slightly lessened.
    (1110268171)[Tue Mar 08 07:49:31 2005] You mined a lead cluster from the rustic stone. The rustic stone looks slightly lessened.
    (1110268183)[Tue Mar 08 07:49:43 2005] You mined a tin cluster from the rugged ore. The rugged ore looks slightly lessened.
    (1110268189)[Tue Mar 08 07:49:49 2005] You mined a tin cluster from the rugged ore. The rugged ore looks slightly lessened.
    (1110268195)[Tue Mar 08 07:49:55 2005] You mined a tin cluster from the rugged ore. The rugged ore looks slightly lessened.
    The only things I am interested in at the moment are the item names (tin cluster, raw nutmeg, Milkweed, etc.) and how many times each was gathered or mined.

    As this logfile is updated regularly, I need the XML to be created the first time the script is run; thereafter the script should create a new XML entry if the named item (tin cluster etc.) does not exist. If it does exist, then it should update the <collected></collected> entry.

    So far I've learned how to get Python to open a file and display its contents, but I don't know how to get the script to examine each line and extract the data I want. I've looked at things like .readlines(), but this reads each line as a whole.

    If you can help, please do so by posting something like "look at this (.StringSplit) in the manual to do this" etc. Whenever I've tried this sort of thing before, I've ended up using somebody else's code and not actually learning it myself. I just want a push in the right direction.

    Thanks

    Idaajed
  2. #2
  3. Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Dec 2004
    Location
    Meriden, Connecticut
    Posts
    1,797
    Rep Power
    154
    Here's an idea for looking for specific lines in a file.
    Code:
    # readlines() loads the whole file into a list of lines
    for line in open('filename.txt').readlines():
        print(line)
    To check whether a line contains the text you're looking for, you can do the following:
    Code:
    for line in open('filename.txt').readlines():
        # 'tin cluster' is one of the item names in the sample log
        if 'tin cluster' in line:
            print(line)
    And so on; I think you get the idea. I don't think you should try "adding" entries to your XML file. I'm currently using an XML file (controlled through Python) for a Battle.net channel monitor, where it keeps track of the channel the bot is currently in, the user count, and each username in the channel along with their product (it instantly updates the file and uploads it to an FTP server). However, in your case you won't need nearly as much code as that: you should simply rewrite the entire file each time you need to and fix the entries accordingly. You can either do this manually or by using lists. By the way, is that game Eternal Lands?
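
    A minimal sketch of the "rewrite the entire file each time" idea, assuming the counts are already held in two dictionaries and using the standard library's xml.etree.ElementTree. The function names, the file path argument, and the rule of turning "tin cluster" into the tag tin_cluster are all assumptions made for illustration, not something from the thread:

```python
import xml.etree.ElementTree as ET

def harvest_xml(mined, gathered):
    """Return the whole <harvesting> document as a string.

    mined and gathered are {item name: count} dictionaries.
    """
    root = ET.Element("harvesting")
    for section_name, counts in (("mined", mined), ("gathered", gathered)):
        section = ET.SubElement(root, section_name)
        for item, count in sorted(counts.items()):
            # XML tag names cannot contain spaces, so "tin cluster" becomes
            # "tin_cluster" (this naming rule is an assumption).
            tag = item.lower().replace(" ", "_")
            entry = ET.SubElement(section, tag)
            ET.SubElement(entry, "collected").text = str(count)
    return ET.tostring(root, encoding="unicode")

def write_harvest_xml(path, mined, gathered):
    """Replace the file at path with a freshly generated document."""
    # Mode "w" truncates the file, so the old contents are discarded.
    with open(path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" ?>\n')
        out.write(harvest_xml(mined, gathered))

print(harvest_xml({"tin cluster": 3, "lead cluster": 1}, {"Milkweed": 1}))
```

    Because the file is regenerated from the dictionaries on every run, there is no need to search the existing XML for an entry before changing it.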

    Comments on this post

    • SimonGreenhill agrees
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Posts
    624
    Rep Power
    34
    You can directly iterate over the lines in a file with just:

    Code:
    # the file object itself yields one line per iteration
    for line in open('filename.txt'):
        print(line)
    Which would be a good start for reading your logfile. It would give you one line at a time to work with. I'm sure you see the pattern of identifying what was gathered or mined:

    Code:
    (1110268106)[Tue Mar 08 07:48:26 2005] You gathered a ???? from the ravaged natural herb garden. The ravaged natural herb garden looks slightly lessened.
    You could either do the searching manually, with something like:
    line.find("You gathered a ")
    and look for the text up to "from the"

    or you could use line.split() and look at positions in the resulting list, but this would be more cumbersome as some items have spaces in their names.
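
    The manual search could be sketched roughly like this. The helper name item_after is made up for illustration, and it assumes every item is introduced by "a" as in the sample log (a line using "an" would need its own prefix):

```python
def item_after(line, prefix, suffix=" from the "):
    """Return the text between prefix and suffix, or None if either is missing."""
    start = line.find(prefix)      # str.find returns -1 when nothing is found
    if start == -1:
        return None
    start += len(prefix)           # skip past the prefix itself
    end = line.find(suffix, start) # search only after the prefix
    if end == -1:
        return None
    return line[start:end]

line = ("(1110268149)[Tue Mar 08 07:49:09 2005] You gathered a raw nutmeg "
        "from the ravaged natural garden. The ravaged natural garden looks "
        "slightly lessened.")
print(item_after(line, "You gathered a "))  # raw nutmeg
print(item_after(line, "You mined a "))     # None
```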

    Or you could go for a regular expression, which would be possibly uglier, but perhaps useful here. I would be thinking something like:

    Code:
    ^.* You gathered a (.*) from the .*$
    With whatever the Python syntax for the grouping should be (I forget, but the documentation for the 're' module is good and will tell you).
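
    For what it's worth, a sketch of how that might look with the re module. The pattern below is an assumption that extends the one above to cover "mined" as well as "gathered", and allows "an" even though only "a" appears in the sample log; parse_line is a made-up name:

```python
import re

# Group 1 is the verb, group 2 is the item name. The lazy (.+?) stops at
# the first " from the " instead of the last one.
HARVEST = re.compile(r"You (gathered|mined) an? (.+?) from the ")

def parse_line(line):
    """Return (verb, item) for a harvest line, or None for any other line."""
    match = HARVEST.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)

sample = ("(1110268171)[Tue Mar 08 07:49:31 2005] You mined a lead cluster "
          "from the rustic stone. The rustic stone looks slightly lessened.")
print(parse_line(sample))
```

    Parenthesised groups are numbered from 1 in the order their opening brackets appear, and match.group(n) returns the text each one captured.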

    Once you've identified what's been gathered, or what's been mined, you need to keep track of them.

    With some kind of data structure(s). Like, say, a dictionary. Or two.

    Hint:
    Code:
    >>> d = {}
    >>> d.get('cabbage')
    >>> d.get('cabbage', 0)
    0
    >>> d
    {}
    >>> d['cheese'] = d.get('cheese', 0) + 1
    >>> d
    {'cheese': 1}
    >>> d['cheese'] = d.get('cheese', 0) + 1
    >>> d
    {'cheese': 2}
    >>>
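
    Putting the hint to work, one possible sketch keeps one dictionary for mined items and one for gathered items. It assumes each log line has already been reduced to a (verb, item) pair by whichever extraction method you choose; the tally name and the sample pairs are for illustration only:

```python
def tally(events):
    """Count (verb, item) pairs into two {item: count} dictionaries."""
    mined, gathered = {}, {}
    for verb, item in events:
        counts = mined if verb == "mined" else gathered
        # get() supplies 0 the first time an item is seen
        counts[item] = counts.get(item, 0) + 1
    return mined, gathered

events = [
    ("gathered", "Milkweed"),
    ("mined", "lead cluster"),
    ("mined", "tin cluster"),
    ("mined", "tin cluster"),
]
mined, gathered = tally(events)
print(mined)
print(gathered)
```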


    As for the XML:

    Code:
    As this logfile is updated regularly, I need the XML to be created the first time the script is run; thereafter the script should create a new XML entry if the named item (tin cluster etc.) does not exist.
    Will the script be run against an ever-lengthening logfile, so wiping the XML file and creating a new one would do, or will it be run against new, different logfiles, so it would need to read the XML file as well?

    I don't know much about processing XML in Python - there seem to be as many various approaches as there are Python web frameworks - although Uche Ogbuji seems to think it's a great thing, and a lot of people seem to like Fredrik Lundh's modules.

    Comments on this post

    • SimonGreenhill agrees
    Last edited by sfb; March 24th, 2005 at 04:53 PM.
  6. #4
  7. Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Dec 2004
    Location
    Meriden, Connecticut
    Posts
    1,797
    Rep Power
    154
    I completely forgot about the find() method. Idaajed, as sfb said, line.find() would be a much better way of finding data in a string than what I originally suggested.
  8. #5
  9. (retired)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2003
    Location
    The Laboratory
    Posts
    10,101
    Rep Power
    0
    If you are just looking to update the XML file from the same logfile as the logfile grows, then you could also store the last line you analysed.

    Store either this: 1110268195, or this: Tue Mar 08 07:49:55 2005, and use it to tell the program where to start reading from.
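
    A rough sketch of that idea, assuming the number in parentheses at the start of each line is the timestamp in seconds and that it never decreases through the file. The new_lines name is hypothetical, and how the last value is saved between runs is left out:

```python
import re

# Matches the leading "(1110268195)" and captures the digits.
STAMP = re.compile(r"^\s*\((\d+)\)")

def new_lines(lines, last_seen):
    """Yield only the lines whose timestamp is later than last_seen."""
    for line in lines:
        match = STAMP.match(line)
        if match and int(match.group(1)) > last_seen:
            yield line

log = [
    "(1110268189)[Tue Mar 08 07:49:49 2005] You mined a tin cluster from the rugged ore.",
    "(1110268195)[Tue Mar 08 07:49:55 2005] You mined a tin cluster from the rugged ore.",
    "(1110268201)[Tue Mar 08 07:50:01 2005] You mined a tin cluster from the rugged ore.",
]
fresh = list(new_lines(log, 1110268195))
print(len(fresh))
```

    One caveat with this sketch: if the game can write two lines within the same second, a strict "later than" comparison could skip a line that shares its timestamp with the last one processed.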
  10. #6
  11. Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Dec 2004
    Location
    Meriden, Connecticut
    Posts
    1,797
    Rep Power
    154
    I still don't see why you would want to just add data onto the end of a file or somewhere else inside of it. Simply rewriting it with the updated information seems to be a much quicker and easier task.
  12. #7
  13. (retired)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2003
    Location
    The Laboratory
    Posts
    10,101
    Rep Power
    0
    Exactly - if you store where you finished parsing, then you can start re-reading from that point later and simply update the XML file.
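
    Updating the existing XML with counts from the newly read lines might look something like the sketch below, using xml.etree.ElementTree. The add_counts helper and the space-to-underscore tag rule are assumptions, and item names containing characters that are not valid in XML tags would need extra handling:

```python
import xml.etree.ElementTree as ET

def add_counts(root, section_name, new_counts):
    """Add new_counts into <section_name>, creating any missing entries."""
    section = root.find(section_name)
    if section is None:
        section = ET.SubElement(root, section_name)
    for item, count in new_counts.items():
        tag = item.lower().replace(" ", "_")  # assumed naming rule
        entry = section.find(tag)
        if entry is None:
            entry = ET.SubElement(section, tag)
        collected = entry.find("collected")
        if collected is None:
            collected = ET.SubElement(entry, "collected")
            collected.text = "0"
        collected.text = str(int(collected.text) + count)

# Stand-in for the XML written by an earlier run.
root = ET.fromstring(
    "<harvesting><mined><tin_cluster><collected>3</collected>"
    "</tin_cluster></mined></harvesting>"
)
add_counts(root, "mined", {"tin cluster": 2, "lead cluster": 1})
print(ET.tostring(root, encoding="unicode"))
```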
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2005
    Posts
    2
    Rep Power
    0

    Thanks for the headsup..


    Thanks for all the replies. Lots of reading for me.

    †Yegg† > EverQuestII <
