#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2004
    Posts
    2
    Rep Power
    0

    Python Xml Parsing trouble :(


    Hi all, I'm a newbie in Python. Searching the net, I can't find a clear explanation of how to do this. I've read a lot about DOM and SAX, but I can't make this thing work. Hope you can help.

    I need to parse this XML file:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <alert>
    <DateTime>2004-09-23 20:47:38</DateTime>
    <Name>TCP_Hijacking_Tool</Name>
    <Type>SuspiciousTCP</Type>
    <dstIP>10.125.5.15</dstIP>
    <srcIP>200.25.67.28</srcIP>
    </alert>

    So that I can get this output:

    Name : TCP_Hijacking_Tool
    Date&Time: 2004-09-23 20:47:38
    Destination IP : 10.125.5.15
    Source IP: 200.25.67.28

    I wrote a couple of Python scripts, but they don't work. The best result I've obtained ends up changing the original XML file, and I need to parse it without changing it.
    Somebody save me (like the U2 song) :P

    Cheers!
  2. #2
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2004
    Posts
    27
    Rep Power
    0

    Quick and Dirty


    From what you wrote, I wasn't able to gather whether you wanted just a quick and dirty means of parsing, or whether you wanted to tap the XML offerings of Python. I am not well versed in XML in Python, so here's an example quick and dirty solution.

    A couple of things to note about the following code:
    1) Python dictionaries are not ordered (true of the Python 2 used here; since Python 3.7 they preserve insertion order), so the order in which things are parsed is not guaranteed. There are several simple ways to ensure your own ordering if you so desire/need.

    2) Regular expressions could also be used. With REs, you could find all the tags in the XML without having to know them in advance, and then check whether they match known tags, rather than searching the XML for each known tag. The current solution is limited to known tags only.

    3) This code is by no means optimized. There are many tricks I refrained from using in order to keep it simple to demonstrate the basic implementation of the parsing operations.

    Code:
    teststr="""
    <?xml version="1.0" encoding="iso-8859-1"?>
    <alert>
    <DateTime>2004-09-23 20:47:38</DateTime>
    <Name>TCP_Hijacking_Tool</Name>
    <Type>SuspiciousTCP</Type>
    <dstIP>10.125.5.15</dstIP>
    <srcIP>200.25.67.28</srcIP>
    </alert>
    """
    
    parsedict = {
        "Name" : "Name",
        "DateTime" : "Date&Time",
        "Type" : "Type",
        "dstIP" : "Destination IP",
        "srcIP" : "Source IP"
        }
    
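    def Parse( xml ):
        result = ""
        for tag in parsedict:
            # Build the opening and closing tag strings, e.g. <Name> and </Name>.
            open_tag = '<' + tag + '>'
            close_tag = '</' + tag + '>'
            start = xml.find( open_tag )
            end = xml.find( close_tag )
    
            # Only emit a line when both tags were found.
            if ( start != -1 ) and ( end != -1 ):
                result += ( parsedict[ tag ] + ' : ' +
                            xml[ start+len( open_tag ) : end ] + '\n' )
    
        return result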
    
    if __name__ == "__main__":
        print( Parse( teststr ) )
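
    As a rough sketch of notes 1 and 2 above: a regular expression can collect every simple <tag>text</tag> pair in one pass, and a list of (tag, label) pairs can fix the output order. The names SAMPLE, LABELS, TAG_RE and parse_ordered are made up for this illustration, and the sample assumes the <Type> element is closed properly. It only handles flat elements without attributes or nested markup.

```python
import re

# Hypothetical sample data, assuming a properly closed <Type> element.
SAMPLE = """<?xml version="1.0" encoding="iso-8859-1"?>
<alert>
<DateTime>2004-09-23 20:47:38</DateTime>
<Name>TCP_Hijacking_Tool</Name>
<Type>SuspiciousTCP</Type>
<dstIP>10.125.5.15</dstIP>
<srcIP>200.25.67.28</srcIP>
</alert>
"""

# A list of (tag, label) pairs keeps the output order fixed,
# regardless of how dictionaries behave.
LABELS = [
    ("Name", "Name"),
    ("DateTime", "Date&Time"),
    ("dstIP", "Destination IP"),
    ("srcIP", "Source IP"),
]

# Matches <tag>text</tag> where the text contains no nested markup.
TAG_RE = re.compile(r"<(\w+)>([^<]*)</\1>")


def parse_ordered(xml):
    # Collect every simple element found, known or not.
    found = dict(TAG_RE.findall(xml))
    lines = []
    # Report only the known tags, in the order given by LABELS.
    for tag, label in LABELS:
        if tag in found:
            lines.append(label + " : " + found[tag])
    return "\n".join(lines)


if __name__ == "__main__":
    print(parse_ordered(SAMPLE))
```

    Tags that are found but not listed in LABELS (here, Type) are simply skipped, which matches the output asked for in the first post.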
    Hope it helps,

    Derrick
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2004
    Posts
    2
    Rep Power
    0
    Thanks all, we already fixed it. Here is the code, hope it helps someone in the future.

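    from xml.dom.minidom import parse

    xmldoc = parse("alerta.xml")

    # The first child of the document is the root <alert> element.
    molecule = xmldoc.childNodes[0]

    # Each value is the text node inside the matching element.
    DateTime_node = molecule.getElementsByTagName("DateTime")[0]
    DateTime = DateTime_node.childNodes[0].data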

    Repeat these lines for each tag, and it works!
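
    The "repeat for each tag" step could also be written as a loop. This is only a sketch: FIELDS, SAMPLE and describe_alert are names invented for the example, and it uses parseString on an in-memory copy of the XML so it runs without the alerta.xml file. With the real file, parse("alerta.xml") would take the place of parseString(SAMPLE). Either call only reads the XML, so the original file is not changed.

```python
from xml.dom.minidom import parseString

# Hypothetical in-memory copy of the alert, assuming <Type> is closed properly.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<alert>
<DateTime>2004-09-23 20:47:38</DateTime>
<Name>TCP_Hijacking_Tool</Name>
<Type>SuspiciousTCP</Type>
<dstIP>10.125.5.15</dstIP>
<srcIP>200.25.67.28</srcIP>
</alert>
"""

# (tag, label) pairs in the order the output should appear.
FIELDS = [
    ("Name", "Name"),
    ("DateTime", "Date&Time"),
    ("dstIP", "Destination IP"),
    ("srcIP", "Source IP"),
]


def describe_alert(xmldoc):
    root = xmldoc.documentElement  # the <alert> element
    lines = []
    for tag, label in FIELDS:
        nodes = root.getElementsByTagName(tag)
        # Skip tags that are missing or have no text inside.
        if nodes and nodes[0].firstChild is not None:
            lines.append(label + " : " + nodes[0].firstChild.data)
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_alert(parseString(SAMPLE)))
```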

    Cheers !
