Dev Shed Forums > Programming Languages > Python Programming
#1
June 16th, 2004, 09:37 AM
TheLastHero
XML, DOM or XPATH parsing??

I have an XML file that I want to parse --->
#############################################

Quote:
<input>
<inputdoc dir="D:/Python ontwikkel/bestanden/tst.txt">
<fields id="col01">seq : 0, fieldtype : s</fields>
<fields id="col02">seq : 1, fieldtype : s</fields>
<fields id="col03">seq : 2, fieldtype : s</fields>
<fields id="col04">seq : 3, fieldtype : s</fields>
<fields id="col05">seq : 4, fieldtype : s</fields>
<fields id="col06">seq : 5, fieldtype : s</fields>
</inputdoc>
<inputdoc dir="D:/Python ontwikkel/bestanden/tst2.txt">
<fields id="col01">seq : 0, fieldtype : i</fields>
<fields id="col02">seq : 1, fieldtype : i</fields>
<fields id="col03">seq : 2, fieldtype : i</fields>
<fields id="col04">seq : 3, fieldtype : i</fields>
<fields id="col05">seq : 4, fieldtype : i</fields>
<fields id="col06">seq : 5, fieldtype : f</fields>
<fields id="col07">seq : 6, fieldtype : f</fields>
<fields id="col08">seq : 7, fieldtype : f</fields>
<fields id="col09">seq : 8, fieldtype : f</fields>
<fields id="col010">seq : 9, fieldtype : s</fields>
<fields id="col011">seq : 10, fieldtype : s</fields>
<fields id="col012">seq : 11, fieldtype : s</fields>
<fields id="col013">seq : 12, fieldtype : s</fields>
</inputdoc>
<inputdoc dir="D:/Python ontwikkel/bestanden/tst3.txt">
<fields id="col01">seq : 0, fieldtype : f</fields>
<fields id="col02">seq : 1, fieldtype : f</fields>
<fields id="col03">seq : 2, fieldtype : f</fields>
<fields id="col04">seq : 3, fieldtype : f</fields>
<fields id="col05">seq : 4, fieldtype : f</fields>
<fields id="col06">seq : 5, fieldtype : f</fields>
</inputdoc>
</input>

#############################################
I have the following program -->
#########################
Code:
from xml.dom.ext.reader.Sax2 import Reader
from xml.dom import Element
from xml.dom import TreeWalker
from xml import xpath
from Ft.Xml.XPath import Evaluate
from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Lib import Uri

### Create the XML reader
xml_bestand = 'D:/Python ontwikkel/xml_test/inputfields.xml'
PyXMLReader = Reader()
xml_reader = PyXMLReader.fromStream(xml_bestand)


def find_Fields(kolomname,xml_file):
    file_uri = Uri.OsPathToUri(xml_file, attemptAbsolute=1)
    doc = NonvalidatingReader.parseUri(file_uri)

    xpath = "/input/inputdoc/fields[@id='" + kolomname + "']/text()"
    fields = Evaluate(xpath,doc)

    if fields:
        return fields[0].data

nodeList = xml_reader.documentElement.childNodes

teller = 0
bijhoud_tel = 0
while teller < len(nodeList):
    a = nodeList[teller]

    if a.hasChildNodes():
        bijhoud_tel += 1
        input_files = a.getAttribute('dir').encode('ascii')
        child_nodes = a.childNodes

        print "Attributes --> " + str(input_files)
        print ''

        child_tel = 0
        bijhoud_child_tel = 0
        while child_tel < len(child_nodes):
            b = child_nodes[child_tel]

            if b.hasChildNodes():
                bijhoud_child_tel += 1
                input_fields = b.getAttribute('id').encode('ascii')
                print input_fields
            
            child_tel += 1
    
    teller += 1


###########################
Now the question:
What do I have to do to get the text between the fields tags?
I want a dictionary that looks like the following:

{1: {'col01': {'seq': 0, 'fieldtype': 's'}, 'col02': {'seq': 1, 'fieldtype': 's'}, ...}, 2: {...}, ...}
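
A minimal sketch of one way to build such a dictionary, assuming Python 3 and the standard library's xml.etree.ElementTree instead of the PyXML/4Suite modules used above. The names SAMPLE_XML, parse_field_text and build_field_dict are made up for illustration, and the sample is a shortened copy of the XML in this post. With ElementTree, the text between the fields tags is available as the .text attribute of each element.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the post (illustrative sample only).
SAMPLE_XML = """<input>
<inputdoc dir="D:/Python ontwikkel/bestanden/tst.txt">
<fields id="col01">seq : 0, fieldtype : s</fields>
<fields id="col02">seq : 1, fieldtype : s</fields>
</inputdoc>
<inputdoc dir="D:/Python ontwikkel/bestanden/tst2.txt">
<fields id="col01">seq : 0, fieldtype : i</fields>
</inputdoc>
</input>"""


def parse_field_text(text):
    """Turn 'seq : 0, fieldtype : s' into {'seq': 0, 'fieldtype': 's'}."""
    result = {}
    for pair in text.split(","):
        key, value = (part.strip() for part in pair.split(":", 1))
        # Store purely numeric values (such as seq) as integers.
        result[key] = int(value) if value.isdigit() else value
    return result


def build_field_dict(xml_text):
    """Map each inputdoc (numbered from 1) to {field id: parsed field text}."""
    root = ET.fromstring(xml_text)
    documents = {}
    for number, inputdoc in enumerate(root.findall("inputdoc"), start=1):
        documents[number] = {
            field.get("id"): parse_field_text(field.text)
            for field in inputdoc.findall("fields")
        }
    return documents


field_dict = build_field_dict(SAMPLE_XML)
print(field_dict[1]["col02"])  # prints {'seq': 1, 'fieldtype': 's'}
```

The outer keys are simply the position of each inputdoc in the file; the dir attribute could serve as the key instead if that is more useful.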

Last edited by netytan : June 16th, 2004 at 12:06 PM.

#2
June 16th, 2004, 09:42 AM
Grim Archon
Please read the Stickies on posting code.

grim

#3
June 16th, 2004, 12:09 PM
netytan
Maybe I can make a suggestion here, though admittedly I have a lot to learn about XML; I really must find an excuse to use it sometime.

Anyway, with such a simple file you can probably parse this more easily using regex, and that way you don't have the overhead of a full XML parser, since what you want to do seems pretty simple.
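
As a rough sketch of the regex idea, assuming Python 3 and the standard re module: the pattern below (FIELD_PATTERN, a name invented here) pulls the id and the text out of each fields line of a shortened sample. It only works because every fields element in this file sits on one line with the same attribute layout. It does not track which inputdoc a field belongs to, and it is not a general XML parser.

```python
import re

# Shortened copy of the XML from the first post (illustrative sample only).
SAMPLE_XML = """<input>
<inputdoc dir="D:/Python ontwikkel/bestanden/tst.txt">
<fields id="col01">seq : 0, fieldtype : s</fields>
<fields id="col02">seq : 1, fieldtype : s</fields>
</inputdoc>
</input>"""

# Group 1 captures the id attribute, group 2 the text between the tags.
FIELD_PATTERN = re.compile(r'<fields id="([^"]+)">([^<]*)</fields>')

# findall returns one (id, text) tuple per matching fields element.
matches = FIELD_PATTERN.findall(SAMPLE_XML)
for field_id, field_text in matches:
    print(field_id, "->", field_text)
```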

What ya think?

Mark.


#4
June 17th, 2004, 01:59 AM
TheLastHero
Quote:
parse this more easily using regex

What is regex? Maybe it's a solution.

#5
June 17th, 2004, 05:08 AM
Grim Archon
Regex is short for regular expressions. Python's regular expression module (re) is similar to Perl's.
http://www.amk.ca/python/howto/regex/
http://www.regular-expressions.info/python.html

You might find this XML link useful:
http://pyxml.sourceforge.net/topics/docs.html

grim

Last edited by Grim Archon : June 17th, 2004 at 05:14 AM.

#6
June 17th, 2004, 06:21 AM
TheLastHero
I have read a lot of docs, tutorials, etc., but I still can't find a solution to my problem. Is there someone who can tell me how to make this program work?

As you can see, the only thing I can't get is the text from the fields tags.

#7
June 17th, 2004, 08:10 AM
TheLastHero
Code:
from xml.dom.ext.reader.Sax2 import Reader
from xml.dom import Element
from xml.dom import TreeWalker
from xml import xpath
from Ft.Xml.XPath import Evaluate
from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Lib import Uri

### Create the XML reader
xml_bestand = 'D:/Python ontwikkel/xml_test/inputfields.xml'
PyXMLReader = Reader()
xml_reader = PyXMLReader.fromStream(xml_bestand)


def find_Fields(xml_file):
    file_uri = Uri.OsPathToUri(xml_file, attemptAbsolute=1)
    doc = NonvalidatingReader.parseUri(file_uri)
    xpath = "input/inputdoc"
    docs = Evaluate(xpath,doc)

    docs_tel = 0
    aantal_docs = len(docs)
    while docs_tel < aantal_docs:
        doc_nr = docs[docs_tel]
        elems =  doc_nr
        xpath = "attribute::*"
        eval_tmp = Evaluate(xpath,docs[docs_tel])     
        print '*******************************************************'
        print eval_tmp
        print '*******************************************************'
        
        docs_tel += 1

ret_fields = find_Fields(xml_bestand)


This almost works, but I still can't get the value. What I get is --> [<cAttr at 0136E490: name u'dir', value u'D:/Python ontwikkel/xml_test/bestanden/tst.txt'>]

and the only thing I want is the value --> D:/Python ontwikkel/xml_test/bestanden/tst.txt.

Why must this be so difficult? I'm looking for something like -->

Code:
eval_tmp.getAttribute('dir')
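
The list printed above holds an attribute node, not a string. In the standard DOM interface an attribute node carries its text in a .value property, which the sketch below shows with the standard library's xml.dom.minidom under Python 3. Whether 4Suite's cAttr object exposes the same property is an assumption here (its printed form suggests it does), in which case something like eval_tmp[0].value would be the expression being looked for. The sample XML is shortened for illustration.

```python
from xml.dom.minidom import parseString

# Shortened sample based on the XML in this thread (illustrative only).
SAMPLE_XML = """<input>
<inputdoc dir="D:/Python ontwikkel/xml_test/bestanden/tst.txt">
<fields id="col01">seq : 0, fieldtype : s</fields>
</inputdoc>
</input>"""

document = parseString(SAMPLE_XML)
inputdoc = document.getElementsByTagName("inputdoc")[0]

# An attribute node is an object; its string lives in the .value property.
dir_node = inputdoc.getAttributeNode("dir")
dir_from_node = dir_node.value

# getAttribute() skips the node object and returns the string directly.
dir_direct = inputdoc.getAttribute("dir")

print(dir_from_node)  # prints D:/Python ontwikkel/xml_test/bestanden/tst.txt
```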

#8
June 17th, 2004, 08:42 AM
netytan
Well, we did tell you it would be much easier with regular expressions. I would do this for you, but I don't have Python on this machine, which makes testing hard. Attach the XML file to this thread and I will see what I can do, OK?

Mark.

#9
June 17th, 2004, 09:24 AM
TheLastHero
netytan,

I appreciate what you are offering to do, many thanks for that. The problem is that my boss wants to use XPath, so that's why I haven't tried it with regex yet.

But I'm almost there with this program -->
Code:
from xml.dom.ext.reader.Sax2 import Reader
from xml import xpath
from Ft.Xml.XPath import Evaluate
from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Lib import Uri

### Create the XML reader
xml_bestand = 'D:/Python ontwikkel/xml_test/inputfields.xml'
PyXMLReader = Reader()
xml_reader = PyXMLReader.fromStream(xml_bestand)


def find_Fields(xml_file):
    file_uri = Uri.OsPathToUri(xml_file, attemptAbsolute=1)
    doc = NonvalidatingReader.parseUri(file_uri)
    xpath = "input/inputdoc"
    docs = Evaluate(xpath,doc)


    docs_tel = 0
    aantal_docs = len(docs)
    while docs_tel < aantal_docs:
        doc_nr = docs[docs_tel]
        elems =  doc_nr
        xpath = "concat(attribute::*)"
        dir_waarde = Evaluate(xpath,docs[docs_tel])
        print dir_waarde

        xpath = "count(child::*)"
        count_fields = Evaluate(xpath,docs[docs_tel])
        print count_fields

        n = 1
        while n <= count_fields:
            xpath = "concat(child::*/text())"
            eval_tmp = Evaluate(xpath,docs[docs_tel])
            print eval_tmp
            n += 1
            
        docs_tel += 1

ret_fields = find_Fields(xml_bestand)


The only thing that still doesn't work is my loop -->
Code:
        n = 1
        while n <= count_fields:
            xpath = "concat(child::*/text())"
            eval_tmp = Evaluate(xpath,docs[docs_tel])
            print eval_tmp
            n += 1


I only get the text of the first fields tag, over and over, for every inputdoc.
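
A likely reason, offered as a sketch and not as a tested 4Suite fix: the loop evaluates the same expression on every pass because n never appears in the XPath string, and in XPath 1.0 a node-set converted to a string yields the string-value of its first node only. Selecting by position, for example with a predicate such as child::*[2], would presumably pick out one field per pass. The example below shows the same idea under Python 3 with the standard library's xml.etree.ElementTree, which supports positional predicates like fields[2]; the sample XML is shortened and the variable names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the XML in this thread (illustrative only).
SAMPLE_XML = """<input>
<inputdoc dir="D:/a/tst.txt">
<fields id="col01">seq : 0, fieldtype : s</fields>
<fields id="col02">seq : 1, fieldtype : s</fields>
</inputdoc>
<inputdoc dir="D:/a/bestanden/tst2.txt">
<fields id="col01">seq : 0, fieldtype : i</fields>
</inputdoc>
</input>"""

root = ET.fromstring(SAMPLE_XML)

# Variant 1: keep the counting loop, but use n inside the path expression.
texts_by_position = []
for inputdoc in root.findall("inputdoc"):
    count_fields = len(inputdoc.findall("fields"))
    for n in range(1, count_fields + 1):
        # XPath positions are 1-based: fields[1] is the first fields child.
        field = inputdoc.find("fields[%d]" % n)
        texts_by_position.append(field.text)

# Variant 2: skip the counter and iterate over the matched elements directly.
texts_by_iteration = [field.text for field in root.findall("inputdoc/fields")]

print(texts_by_position)
```

Both variants collect the same three strings here; the second is usually the simpler choice when every field is needed.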

#10
June 17th, 2004, 09:27 AM
TheLastHero
BTW, here is my XML:
Code:
<?xml version = "1.0"?>
<input>
	<inputdoc dir="D:/a/tst.txt">
    		<fields id="col01">seq : 0, fieldtype : s</fields>
		<fields id="col02">seq : 1, fieldtype : s</fields>
		<fields id="col03">seq : 2, fieldtype : s</fields>
		<fields id="col04">seq : 3, fieldtype : s</fields>
		<fields id="col05">seq : 4, fieldtype : s</fields>
		<fields id="col06">seq : 5, fieldtype : s</fields>
  	</inputdoc>
	<inputdoc dir="D:/a/bestanden/tst2.txt">
    		<fields id="col01">seq : 0, fieldtype : i</fields>
		<fields id="col02">seq : 1, fieldtype : i</fields>
		<fields id="col03">seq : 2, fieldtype : i</fields>
		<fields id="col04">seq : 3, fieldtype : i</fields>
		<fields id="col05">seq : 4, fieldtype : i</fields>
		<fields id="col06">seq : 5, fieldtype : f</fields>
		<fields id="col07">seq : 6, fieldtype : f</fields>
		<fields id="col08">seq : 7, fieldtype : f</fields>
		<fields id="col09">seq : 8, fieldtype : f</fields>
		<fields id="col010">seq : 9, fieldtype : s</fields>
		<fields id="col011">seq : 10, fieldtype : s</fields>
		<fields id="col012">seq : 11, fieldtype : s</fields>
		<fields id="col013">seq : 12, fieldtype : s</fields>
  	</inputdoc>
	<inputdoc dir="D:/a/tst3.txt">
    		<fields id="col01">seq : 0, fieldtype : f</fields>
		<fields id="col02">seq : 1, fieldtype : f</fields>
		<fields id="col03">seq : 2, fieldtype : f</fields>
		<fields id="col04">seq : 3, fieldtype : f</fields>
		<fields id="col05">seq : 4, fieldtype : f</fields>
		<fields id="col06">seq : 5, fieldtype : f</fields>
  	</inputdoc>
</input>

#11
June 21st, 2004, 02:23 AM
TheLastHero
Nobody? Is this really that hard to solve? If it is, then I have an excuse to stop with this.

#12
June 21st, 2004, 09:42 AM
netytan
It's probably not that the problem itself is hard; it's more that the modules you are using are not part of the standard library, which limits the number of people out there who can help. It also means that there is not as much documentation on the modules in question.

Personally, I would have tried it myself, but I only have remote access to Python right now, which makes working with file-sized data annoying to say the least, and I don't have this module installed!

If you attach the source file, then I could write a parser for you, though it wouldn't use XPath for sure.

Mark.
