#1
    Registered User

    XML, DOM or XPath parsing?


    I have an XML file that I want to parse:
    #############################################

    <input>
    <inputdoc dir="D:/Python ontwikkel/bestanden/tst.txt">
    <fields id="col01">seq : 0, fieldtype : s</fields>
    <fields id="col02">seq : 1, fieldtype : s</fields>
    <fields id="col03">seq : 2, fieldtype : s</fields>
    <fields id="col04">seq : 3, fieldtype : s</fields>
    <fields id="col05">seq : 4, fieldtype : s</fields>
    <fields id="col06">seq : 5, fieldtype : s</fields>
    </inputdoc>
    <inputdoc dir="D:/Python ontwikkel/bestanden/tst2.txt">
    <fields id="col01">seq : 0, fieldtype : i</fields>
    <fields id="col02">seq : 1, fieldtype : i</fields>
    <fields id="col03">seq : 2, fieldtype : i</fields>
    <fields id="col04">seq : 3, fieldtype : i</fields>
    <fields id="col05">seq : 4, fieldtype : i</fields>
    <fields id="col06">seq : 5, fieldtype : f</fields>
    <fields id="col07">seq : 6, fieldtype : f</fields>
    <fields id="col08">seq : 7, fieldtype : f</fields>
    <fields id="col09">seq : 8, fieldtype : f</fields>
    <fields id="col010">seq : 9, fieldtype : s</fields>
    <fields id="col011">seq : 10, fieldtype : s</fields>
    <fields id="col012">seq : 11, fieldtype : s</fields>
    <fields id="col013">seq : 12, fieldtype : s</fields>
    </inputdoc>
    <inputdoc dir="D:/Python ontwikkel/bestanden/tst3.txt">
    <fields id="col01">seq : 0, fieldtype : f</fields>
    <fields id="col02">seq : 1, fieldtype : f</fields>
    <fields id="col03">seq : 2, fieldtype : f</fields>
    <fields id="col04">seq : 3, fieldtype : f</fields>
    <fields id="col05">seq : 4, fieldtype : f</fields>
    <fields id="col06">seq : 5, fieldtype : f</fields>
    </inputdoc>
    </input>
    #############################################
    I have the following program:
    #########################
    Code:
    from xml.dom.ext.reader.Sax2 import Reader
    from xml.dom import Element
    from xml.dom import TreeWalker
    from xml import xpath
    from Ft.Xml.XPath import Evaluate
    from Ft.Xml.Domlette import NonvalidatingReader
    from Ft.Lib import Uri
    
    ### Create the XML reader
    xml_bestand = 'D:/Python ontwikkel/xml_test/inputfields.xml'  # bestand = file
    PyXMLReader = Reader()
    xml_reader = PyXMLReader.fromStream(xml_bestand)
    
    
    def find_Fields(kolomname,xml_file):
        file_uri = Uri.OsPathToUri(xml_file, attemptAbsolute=1)
        doc = NonvalidatingReader.parseUri(file_uri)
    
        xpath = "/input/inputdoc/fields[@id='" + kolomname + "']/text()"
        fields = Evaluate(xpath,doc)
    
        if fields:
            return fields[0].data
    
    nodeList = xml_reader.documentElement.childNodes
    
    teller = 0          # teller = counter: index into nodeList
    bijhoud_tel = 0     # running count of elements that have child nodes
    while teller < len(nodeList):
        a = nodeList[teller]
    
        if a.hasChildNodes():
            bijhoud_tel += 1
            input_files = a.getAttribute('dir').encode('ascii')
            child_nodes = a.childNodes
    
            print "Attributes --> " + str(input_files)
            print ''
    
            child_tel = 0
            bijhoud_child_tel = 0
            while child_tel < len(child_nodes):
                b = child_nodes[child_tel]
    
                if b.hasChildNodes():
                    bijhoud_child_tel += 1
                    input_fields = b.getAttribute('id').encode('ascii')
                    print input_fields
                
                child_tel += 1
        
        teller += 1
    ###########################
    Now the question:
    What do I have to do to get the text between the fields tags?
    I want a dictionary like the following:

    {1: {'col01': {'seq': 0, 'fieldtype': 's'}, 'col02': {'seq': 1, 'fieldtype': 's'}, ...}, 2: {...}, ...}
    Last edited by netytan; June 16th, 2004 at 11:06 AM.
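In the DOM, the text between `<fields>` and `</fields>` is not stored on the element itself but in a child Text node, so in the loop above it would be reached through something like `b.firstChild.data` rather than another `getAttribute` call. The sketch below shows that idea with Python 3's standard `xml.dom.minidom` instead of PyXML/4Suite (an assumption: the same DOM calls `getAttribute`, `childNodes` and `.data` are used). `SAMPLE`, `text_of` and `fields_by_doc` are hypothetical names, and the field text is kept as a raw string; splitting `seq : 0, fieldtype : s` into its own dict is a separate step.

```python
from xml.dom import minidom

# Hypothetical sample, trimmed from the XML posted in this thread.
SAMPLE = """<?xml version="1.0"?>
<input>
  <inputdoc dir="D:/a/tst.txt">
    <fields id="col01">seq : 0, fieldtype : s</fields>
    <fields id="col02">seq : 1, fieldtype : s</fields>
  </inputdoc>
  <inputdoc dir="D:/a/tst3.txt">
    <fields id="col01">seq : 0, fieldtype : f</fields>
  </inputdoc>
</input>"""


def text_of(element):
    """Join the data of the Text nodes directly inside a DOM element."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


def fields_by_doc(dom):
    """Map 1, 2, ... (one key per <inputdoc>) to {fields id: fields text}."""
    result = {}
    inputdocs = dom.getElementsByTagName("inputdoc")
    for number, inputdoc in enumerate(inputdocs, start=1):
        result[number] = {
            field.getAttribute("id"): text_of(field)
            for field in inputdoc.getElementsByTagName("fields")
        }
    return result


dom = minidom.parseString(SAMPLE)
print(fields_by_doc(dom))
```

Numbering the `inputdoc` elements with `enumerate(..., start=1)` mirrors the `{1: ...}` key in the requested dictionary; keying on the `dir` attribute instead would be an equally reasonable choice.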
  #2
    Mini me.
    Please read the Stickies on posting code.

    grim
  #3
    Hello World :)
    Maybe I can make a suggestion here, though admittedly I have a lot to learn about XML (I really must find an excuse to use it sometime).

    Anyway, with such a simple file you can probably parse this more easily using regular expressions (regex); that way you don't have the overhead of a full XML parser, since what you want to do seems pretty simple.
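A minimal sketch of the regex route, assuming the file keeps exactly the shape posted here: `dir` is the only attribute on `inputdoc`, each `fields` element has only an `id`, and there are no comments, CDATA sections or entities such as `&amp;` (which a regex would not decode). `INPUTDOC_RE`, `FIELDS_RE` and `parse_with_regex` are hypothetical names; this is not a general XML parser.

```python
import re

# Assumption: attributes appear exactly as in the posted file.
INPUTDOC_RE = re.compile(r'<inputdoc dir="([^"]*)">(.*?)</inputdoc>', re.DOTALL)
FIELDS_RE = re.compile(r'<fields id="([^"]*)">([^<]*)</fields>')


def parse_with_regex(xml_text):
    """Return {1: {'dir': ..., 'fields': {id: text}}, 2: ...}."""
    result = {}
    for number, (directory, body) in enumerate(INPUTDOC_RE.findall(xml_text), start=1):
        result[number] = {
            "dir": directory,
            # findall with two groups yields (id, text) tuples
            "fields": dict(FIELDS_RE.findall(body)),
        }
    return result


sample = """<input>
<inputdoc dir="D:/a/tst.txt">
    <fields id="col01">seq : 0, fieldtype : s</fields>
    <fields id="col02">seq : 1, fieldtype : s</fields>
</inputdoc>
<inputdoc dir="D:/a/tst3.txt">
    <fields id="col01">seq : 0, fieldtype : f</fields>
</inputdoc>
</input>"""

print(parse_with_regex(sample))
```

The non-greedy `(.*?)` together with `re.DOTALL` lets one `inputdoc` body span several lines without swallowing the next `inputdoc`.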

    What ya think?

    Mark.
  #4
    Registered User
    <quote>
    parse this more easily using regex,
    </quote>

    What is regex? Maybe it's a solution.
  #5
    Mini me.
    Regex is short for regular expressions; Python's re module provides them, in a style similar to Perl's.
    http://www.amk.ca/python/howto/regex/
    http://www.regular-expressions.info/python.html
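Because the text inside each `fields` tag is itself a small list of `name : value` pairs, the `re` module also fits the inner part of the dictionary asked for in the first post. A minimal sketch, assuming every value is a single word such as `0`, `12`, `s` or `f`; `PAIR_RE` and `parse_field_text` are hypothetical names.

```python
import re

# One "name : value" pair; \w+ covers the values seen in this thread.
PAIR_RE = re.compile(r"(\w+)\s*:\s*(\w+)")


def parse_field_text(text):
    """Turn 'seq : 0, fieldtype : s' into {'seq': 0, 'fieldtype': 's'}."""
    pairs = dict(PAIR_RE.findall(text))
    if "seq" in pairs:
        pairs["seq"] = int(pairs["seq"])  # sequence numbers are integers
    return pairs


print(parse_field_text("seq : 0, fieldtype : s"))
print(parse_field_text("seq : 12, fieldtype : f"))
```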

    You might find this XML link useful:
    http://pyxml.sourceforge.net/topics/docs.html

    grim
    Last edited by Grim Archon; June 17th, 2004 at 04:14 AM.
  #6
    Registered User
    I have read a lot of docs, tutorials, etc., but I still can't find a solution to my problem. Can someone tell me how to make this program work?

    As you can see, the only thing I can't get is the text inside the fields tags.
  #7
    Registered User
    Code:
    from xml.dom.ext.reader.Sax2 import Reader
    from xml.dom import Element
    from xml.dom import TreeWalker
    from xml import xpath
    from Ft.Xml.XPath import Evaluate
    from Ft.Xml.Domlette import NonvalidatingReader
    from Ft.Lib import Uri
    
    ### Create the XML reader
    xml_bestand = 'D:/Python ontwikkel/xml_test/inputfields.xml'
    PyXMLReader = Reader()
    xml_reader = PyXMLReader.fromStream(xml_bestand)
    
    
    def find_Fields(xml_file):
        file_uri = Uri.OsPathToUri(xml_file, attemptAbsolute=1)
        doc = NonvalidatingReader.parseUri(file_uri)
        xpath = "input/inputdoc"
        docs = Evaluate(xpath,doc)
    
        docs_tel = 0
        aantal_docs = len(docs)
        while docs_tel < aantal_docs:
            doc_nr = docs[docs_tel]
            elems =  doc_nr
            xpath = "attribute::*"
            eval_tmp = Evaluate(xpath,docs[docs_tel])     
            print '*******************************************************'
            print eval_tmp
            print '*******************************************************'
            
            docs_tel += 1
    
    ret_fields = find_Fields(xml_bestand)
    This almost works, but I still can't get the value. What I get is: [<cAttr at 0136E490: name u'dir', value u'D:/Python ontwikkel/xml_test/bestanden/tst.txt'>]

    All I want is the value itself: D:/Python ontwikkel/xml_test/bestanden/tst.txt

    Why must this be so difficult? I'm looking for something like:

    Code:
    eval_tmp.getAttribute('dir')
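The XPath step `attribute::*` selects attribute *nodes*, so `Evaluate` returns a list of nodes (the `cAttr` above), and `getAttribute` is a method of an element, not of that list. In the DOM an attribute node carries its string in `value`, which the repr above already shows, so `eval_tmp[0].value` ought to give the path, and an XPath expression such as `string(@dir)` asks for a string instead of a node-set. Both are untested assumptions about 4Suite here. The sketch below shows the node-versus-value distinction with Python 3's standard `xml.dom.minidom`.

```python
from xml.dom import minidom

dom = minidom.parseString('<input><inputdoc dir="D:/a/tst.txt"/></input>')
inputdoc = dom.getElementsByTagName("inputdoc")[0]

# attribute::* selects attribute *nodes*; a node is not its value.
attr_node = inputdoc.getAttributeNode("dir")
print(repr(attr_node))   # an Attr object, comparable to the cAttr in the post
print(attr_node.value)   # the string stored in the attribute node

# Or skip the node and ask the element for the value directly.
print(inputdoc.getAttribute("dir"))
```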
  #8
    Hello World :)
    Well, we did tell you it would be much easier with regular expressions. I would do this for you, but I don't have Python on this machine, which makes testing hard. Attach the XML file to this thread and I will see what I can do, OK?

    Mark.
  #9
    Registered User
    netytan

    I appreciate your offer, many thanks for that. The problem is that my boss wants to use XPath, so that's why I haven't tried it with regex yet.

    But I'm almost there with this program:
    Code:
    from xml.dom.ext.reader.Sax2 import Reader
    from xml import xpath
    from Ft.Xml.XPath import Evaluate
    from Ft.Xml.Domlette import NonvalidatingReader
    from Ft.Lib import Uri
    
    ### Create the XML reader
    xml_bestand = 'D:/Python ontwikkel/xml_test/inputfields.xml'
    PyXMLReader = Reader()
    xml_reader = PyXMLReader.fromStream(xml_bestand)
    
    
    def find_Fields(xml_file):
        file_uri = Uri.OsPathToUri(xml_file, attemptAbsolute=1)
        doc = NonvalidatingReader.parseUri(file_uri)
        xpath = "input/inputdoc"
        docs = Evaluate(xpath,doc)
    
    
        docs_tel = 0
        aantal_docs = len(docs)
        while docs_tel < aantal_docs:
            doc_nr = docs[docs_tel]
            elems =  doc_nr
            xpath = "concat(attribute::*)"
            dir_waarde = Evaluate(xpath,docs[docs_tel])
            print dir_waarde
    
            xpath = "count(child::*)"
            count_fields = Evaluate(xpath,docs[docs_tel])
            print count_fields
    
            n = 1
            while n <= count_fields:
                xpath = "concat(child::*/text())"
                eval_tmp = Evaluate(xpath,docs[docs_tel])
                print eval_tmp
                n += 1
                
            docs_tel += 1
    
    ret_fields = find_Fields(xml_bestand)
    The only thing that still doesn't work is my loop:
    Code:
            n = 1
            while n <= count_fields:
                xpath = "concat(child::*/text())"
                eval_tmp = Evaluate(xpath,docs[docs_tel])
                print eval_tmp
                n += 1
    I only get the text of the first fields tag, on every pass through the loop.
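In XPath 1.0, converting a node-set to a string (which `concat()` does with its arguments) uses only the first node in document order, and `n` never appears in the expression, so every pass evaluates the same thing. Two common fixes: put the position into the expression (`child::*[n]`, positions start at 1), or select the whole node-set once and loop over it in Python. The sketch below shows both with Python 3's `xml.etree.ElementTree`, which supports only a subset of XPath (position predicates yes, `text()` and `concat()` no), so `.text` stands in for `text()`. With 4Suite the first fix would presumably be an expression built like `"string(child::*[%d])" % n`; that is an untested assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, trimmed from the XML posted in this thread.
SAMPLE = """<input>
  <inputdoc dir="D:/a/tst3.txt">
    <fields id="col01">seq : 0, fieldtype : f</fields>
    <fields id="col02">seq : 1, fieldtype : f</fields>
    <fields id="col03">seq : 2, fieldtype : f</fields>
  </inputdoc>
</input>"""

root = ET.fromstring(SAMPLE)
inputdoc = root.find("inputdoc")

# Fix 1: a positional predicate, fields[1], fields[2], ... (1-based).
count_fields = len(inputdoc.findall("fields"))
by_position = [inputdoc.find(f"fields[{n}]").text for n in range(1, count_fields + 1)]

# Fix 2: select the whole node-set once and loop over it in Python.
all_texts = [field.text for field in inputdoc.findall("fields")]

print(by_position == all_texts)
```

The second form is usually simpler because it needs no counter at all.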
  #10
    Registered User
    BTW, here is my XML:
    Code:
    <?xml version = "1.0"?>
    <input>
    	<inputdoc dir="D:/a/tst.txt">
        		<fields id="col01">seq : 0, fieldtype : s</fields>
    		<fields id="col02">seq : 1, fieldtype : s</fields>
    		<fields id="col03">seq : 2, fieldtype : s</fields>
    		<fields id="col04">seq : 3, fieldtype : s</fields>
    		<fields id="col05">seq : 4, fieldtype : s</fields>
    		<fields id="col06">seq : 5, fieldtype : s</fields>
      	</inputdoc>
    	<inputdoc dir="D:/a/bestanden/tst2.txt">
        		<fields id="col01">seq : 0, fieldtype : i</fields>
    		<fields id="col02">seq : 1, fieldtype : i</fields>
    		<fields id="col03">seq : 2, fieldtype : i</fields>
    		<fields id="col04">seq : 3, fieldtype : i</fields>
    		<fields id="col05">seq : 4, fieldtype : i</fields>
    		<fields id="col06">seq : 5, fieldtype : f</fields>
    		<fields id="col07">seq : 6, fieldtype : f</fields>
    		<fields id="col08">seq : 7, fieldtype : f</fields>
    		<fields id="col09">seq : 8, fieldtype : f</fields>
    		<fields id="col010">seq : 9, fieldtype : s</fields>
    		<fields id="col011">seq : 10, fieldtype : s</fields>
    		<fields id="col012">seq : 11, fieldtype : s</fields>
    		<fields id="col013">seq : 12, fieldtype : s</fields>
      	</inputdoc>
    	<inputdoc dir="D:/a/tst3.txt">
        		<fields id="col01">seq : 0, fieldtype : f</fields>
    		<fields id="col02">seq : 1, fieldtype : f</fields>
    		<fields id="col03">seq : 2, fieldtype : f</fields>
    		<fields id="col04">seq : 3, fieldtype : f</fields>
    		<fields id="col05">seq : 4, fieldtype : f</fields>
    		<fields id="col06">seq : 5, fieldtype : f</fields>
      	</inputdoc>
    </input>
  #11
    Registered User
    Nobody? Is this really that hard to solve? If it is, do I have an excuse to give up on this?
  #12
    Hello World :)
    It's probably not that the problem itself is hard; it's more that the modules you are using are not part of the standard library, which limits the number of people who can help. It also means there is not as much documentation on the modules in question.

    Personally, I would have tried it myself, but I only have remote access to Python right now, which makes working with file-sized data annoying to say the least, and I don't have this module installed!

    If you attach the source file, I could write a parser for you, though it certainly wouldn't use XPath.

    Mark.