#1
XML, DOM or XPATH parsing??
I have an XML file and want to parse it:
Quote:
I have the following program:
Code:
from xml.dom.ext.reader.Sax2 import Reader
from xml.dom import Element
from xml.dom import TreeWalker
from xml import xpath
from Ft.Xml.XPath import Evaluate
from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Lib import Uri

### Create the XML reader
xml_bestand = 'D:/Python ontwikkel/xml_test/inputfields.xml'
PyXMLReader = Reader()
xml_reader = PyXMLReader.fromStream(xml_bestand)

def find_Fields(kolomname, xml_file):
    # Return the text of the first fields element whose id matches kolomname
    file_uri = Uri.OsPathToUri(xml_file, attemptAbsolute=1)
    doc = NonvalidatingReader.parseUri(file_uri)
    xpath = "/input/inputdoc/fields[@id='" + kolomname + "']/text()"
    fields = Evaluate(xpath, doc)
    if fields:
        return fields[0].data

# Walk the inputdoc elements, then the fields elements inside each one
nodeList = xml_reader.documentElement.childNodes
teller = 0
bijhoud_tel = 0
while teller < len(nodeList):
    a = nodeList[teller]
    if a.hasChildNodes():
        bijhoud_tel += 1
        input_files = a.getAttribute('dir').encode('ascii')
        child_nodes = a.childNodes
        print "Attributes --> " + str(input_files)
        print ''
        child_tel = 0
        bijhoud_child_tel = 0
        while child_tel < len(child_nodes):
            b = child_nodes[child_tel]
            if b.hasChildNodes():
                bijhoud_child_tel += 1
                input_fields = b.getAttribute('id').encode('ascii')
                print input_fields
            child_tel += 1
    teller += 1
Now the question: what do I have to do to get the text between the fields tags? I want a dictionary like the following:
{1: {col01: {seq: 0, fieldtype: s}, col02: {seq: 0, fieldtype: s}, ...}}
Last edited by netytan : June 16th, 2004 at 12:06 PM.
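One way to build a dictionary of that kind is sketched below. It is not the poster's PyXML/4Suite approach: it swaps in the standard library's `xml.etree.ElementTree` and is written for current Python 3. The sample string, the helper names `parse_field_text` and `build_field_dict`, and the reading of the wanted result as "inputdoc position, then field id, then key/value pairs" are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the same shape as the XML posted in #10.
SAMPLE = """<input>
  <inputdoc dir="D:/a/tst.txt">
    <fields id="col01">seq : 0, fieldtype : s</fields>
    <fields id="col02">seq : 1, fieldtype : s</fields>
  </inputdoc>
  <inputdoc dir="D:/a/tst3.txt">
    <fields id="col01">seq : 0, fieldtype : f</fields>
  </inputdoc>
</input>"""


def parse_field_text(text):
    """Turn 'seq : 0, fieldtype : s' into {'seq': '0', 'fieldtype': 's'}."""
    result = {}
    for part in text.split(","):
        if ":" not in part:
            continue  # skip empty or malformed pieces
        key, _, value = part.partition(":")
        result[key.strip()] = value.strip()
    return result


def build_field_dict(root):
    """Map 1-based inputdoc position -> {field id -> parsed field text}."""
    docs = {}
    for number, doc in enumerate(root.findall("inputdoc"), start=1):
        docs[number] = {
            field.get("id"): parse_field_text(field.text or "")
            for field in doc.findall("fields")
        }
    return docs


fields_by_doc = build_field_dict(ET.fromstring(SAMPLE))
print(fields_by_doc[1]["col01"])  # {'seq': '0', 'fieldtype': 's'}
```

The values stay strings here; converting `seq` to an integer would be a separate step if the numbers are needed as numbers.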
#2
Please read the Stickies on posting code.
grim
#3
Maybe I can make a suggestion here, though admittedly I have a lot to learn about XML. I really must find an excuse to use it sometime.
Anyway, with such a simple file you can probably parse this more easily using regex, and this way you don't have the overhead of a full XML parser, since what you want to do seems pretty simple. What do you think?
Mark.
#4
<quote>
parse this more easily using regex, </quote>
What is regex? Maybe it's a solution.
#5
Regex is short for regular expressions. Python's re module provides them, similar to Perl's.
http://www.amk.ca/python/howto/regex/
http://www.regular-expressions.info/python.html
You might find this XML link useful:
http://pyxml.sourceforge.net/topics/docs.html
grim
Last edited by Grim Archon : June 17th, 2004 at 05:14 AM.
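To make the regex suggestion concrete, here is a minimal sketch using the standard `re` module, written for current Python 3. The sample string, the pattern, and the helper name `extract_fields` are assumptions based on the fields elements posted in #10. A pattern like this only suits very regular input; it is not a general XML parser and would miss, for example, attributes in a different order or single-quoted values.

```python
import re

# Hypothetical sample in the same shape as the fields elements in this thread.
SAMPLE = '''<inputdoc dir="D:/a/tst.txt">
<fields id="col01">seq : 0, fieldtype : s</fields>
<fields id="col02">seq : 1, fieldtype : s</fields>
</inputdoc>'''

# Group 1 captures the id attribute, group 2 the text between the tags.
FIELD_PATTERN = re.compile(r'<fields id="([^"]*)">([^<]*)</fields>')


def extract_fields(xml_text):
    """Return a list of (id, text) pairs for every fields element found."""
    return FIELD_PATTERN.findall(xml_text)


pairs = extract_fields(SAMPLE)
for field_id, text in pairs:
    print(field_id, "->", text)
```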
#6
I have read a lot of docs, tutorials, etc., but I still can't find a solution to my problem. Can someone tell me how to make this program work?
As you can see, the only thing I can't get is the text from the fields tags.
#7
Code:
from xml.dom.ext.reader.Sax2 import Reader
from xml.dom import Element
from xml.dom import TreeWalker
from xml import xpath
from Ft.Xml.XPath import Evaluate
from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Lib import Uri

### Create the XML reader
xml_bestand = 'D:/Python ontwikkel/xml_test/inputfields.xml'
PyXMLReader = Reader()
xml_reader = PyXMLReader.fromStream(xml_bestand)

def find_Fields(xml_file):
    file_uri = Uri.OsPathToUri(xml_file, attemptAbsolute=1)
    doc = NonvalidatingReader.parseUri(file_uri)
    xpath = "input/inputdoc"
    docs = Evaluate(xpath, doc)
    docs_tel = 0
    aantal_docs = len(docs)
    # Print the attribute nodes of every inputdoc element
    while docs_tel < aantal_docs:
        doc_nr = docs[docs_tel]
        elems = doc_nr
        xpath = "attribute::*"
        eval_tmp = Evaluate(xpath, docs[docs_tel])
        print '*******************************************************'
        print eval_tmp
        print '*******************************************************'
        docs_tel += 1

ret_fields = find_Fields(xml_bestand)
This almost works, but I still can't get the value. What I get is:
[<cAttr at 0136E490: name u'dir', value u'D:/Python ontwikkel/xml_test/bestanden/tst.txt'>]
and the only thing I want is the value:
D:/Python ontwikkel/xml_test/bestanden/tst.txt
Why must this be so difficult? I'm looking for something like:
Code:
eval_tmp.getAttribute('dir')
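The output above is a list holding one attribute node, and its repr shows both a name and a value. In the standard DOM interface an attribute node exposes its string as `.value`, and an element returns it directly from `getAttribute()`. Whether the 4Suite objects behave the same way is an assumption that was not verified here, so the sketch below uses the standard library's `xml.dom.minidom` instead (current Python 3, with a made-up one-element sample).

```python
from xml.dom.minidom import parseString

# Hypothetical one-element sample modelled on the inputdoc elements in this thread.
doc = parseString('<input><inputdoc dir="D:/a/tst.txt"/></input>')
elem = doc.getElementsByTagName("inputdoc")[0]

# An attribute node carries a name and a value; .value is the plain string.
attr_node = elem.getAttributeNode("dir")
print(attr_node.name, "=", attr_node.value)

# Asking the element directly skips the attribute node altogether.
dir_value = elem.getAttribute("dir")
print(dir_value)
```

In XPath 1.0 terms, `string(@dir)` evaluated on the element would likewise yield the value as a string rather than a node-set.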
#8
Well, we did tell you it would be much easier with regular expressions. I would do this for you, but I don't have Python on this machine, which makes testing hard. Attach the XML file to this thread and I will see what I can do, OK?
Mark.
#9
netytan,
I appreciate what you are willing to do; many thanks for that. The problem is that my boss wants to use XPath, which is why I haven't tried it with regex yet. But I'm almost there with this program:
Code:
from xml.dom.ext.reader.Sax2 import Reader
from xml import xpath
from Ft.Xml.XPath import Evaluate
from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Lib import Uri

### Create the XML reader
xml_bestand = 'D:/Python ontwikkel/xml_test/inputfields.xml'
PyXMLReader = Reader()
xml_reader = PyXMLReader.fromStream(xml_bestand)

def find_Fields(xml_file):
    file_uri = Uri.OsPathToUri(xml_file, attemptAbsolute=1)
    doc = NonvalidatingReader.parseUri(file_uri)
    xpath = "input/inputdoc"
    docs = Evaluate(xpath, doc)
    docs_tel = 0
    aantal_docs = len(docs)
    while docs_tel < aantal_docs:
        doc_nr = docs[docs_tel]
        elems = doc_nr
        # Value of the dir attribute of this inputdoc
        xpath = "concat(attribute::*)"
        dir_waarde = Evaluate(xpath, docs[docs_tel])
        print dir_waarde
        # Number of fields elements in this inputdoc
        xpath = "count(child::*)"
        count_fields = Evaluate(xpath, docs[docs_tel])
        print count_fields
        n = 1
        while n <= count_fields:
            xpath = "concat(child::*/text())"
            eval_tmp = Evaluate(xpath, docs[docs_tel])
            print eval_tmp
            n += 1
        docs_tel += 1

ret_fields = find_Fields(xml_bestand)
The only thing that still doesn't work is my loop:
Code:
n = 1
while n <= count_fields:
    xpath = "concat(child::*/text())"
    eval_tmp = Evaluate(xpath, docs[docs_tel])
    print eval_tmp
    n += 1
I only get the text of the first fields tag in every inputdoc.
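A likely cause, offered as a reading of the code rather than a tested diagnosis of 4Suite: the expression inside the loop never uses `n`, so every pass evaluates the same thing, and in XPath 1.0 converting a node-set to a string yields only the first node in document order. Putting the counter into the expression as a position selects a different element on each pass. The sketch below shows the idea with the standard library's `xml.etree.ElementTree` (current Python 3), which supports positional predicates such as `fields[2]`; the sample string is made up to match the XML in #10.

```python
import xml.etree.ElementTree as ET

# Hypothetical single inputdoc in the same shape as the XML in this thread.
SAMPLE = """<inputdoc dir="D:/a/tst.txt">
  <fields id="col01">seq : 0, fieldtype : s</fields>
  <fields id="col02">seq : 1, fieldtype : s</fields>
  <fields id="col03">seq : 2, fieldtype : s</fields>
</inputdoc>"""

inputdoc = ET.fromstring(SAMPLE)
count_fields = len(inputdoc.findall("fields"))

texts = []
n = 1
while n <= count_fields:
    # The counter is part of the path, so each pass selects the n-th element.
    field = inputdoc.find("fields[%d]" % n)
    texts.append(field.text)
    n += 1

print(texts)
```

With a full XPath 1.0 engine the corresponding expression would be along the lines of `"string(child::*[%d])" % n`. Iterating directly over `inputdoc.findall("fields")` avoids the counter entirely and is usually simpler.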
#10
BTW, here is my XML:
Code:
<?xml version = "1.0"?>
<input>
    <inputdoc dir="D:/a/tst.txt">
        <fields id="col01">seq : 0, fieldtype : s</fields>
        <fields id="col02">seq : 1, fieldtype : s</fields>
        <fields id="col03">seq : 2, fieldtype : s</fields>
        <fields id="col04">seq : 3, fieldtype : s</fields>
        <fields id="col05">seq : 4, fieldtype : s</fields>
        <fields id="col06">seq : 5, fieldtype : s</fields>
    </inputdoc>
    <inputdoc dir="D:/a/bestanden/tst2.txt">
        <fields id="col01">seq : 0, fieldtype : i</fields>
        <fields id="col02">seq : 1, fieldtype : i</fields>
        <fields id="col03">seq : 2, fieldtype : i</fields>
        <fields id="col04">seq : 3, fieldtype : i</fields>
        <fields id="col05">seq : 4, fieldtype : i</fields>
        <fields id="col06">seq : 5, fieldtype : f</fields>
        <fields id="col07">seq : 6, fieldtype : f</fields>
        <fields id="col08">seq : 7, fieldtype : f</fields>
        <fields id="col09">seq : 8, fieldtype : f</fields>
        <fields id="col010">seq : 9, fieldtype : s</fields>
        <fields id="col011">seq : 10, fieldtype : s</fields>
        <fields id="col012">seq : 11, fieldtype : s</fields>
        <fields id="col013">seq : 12, fieldtype : s</fields>
    </inputdoc>
    <inputdoc dir="D:/a/tst3.txt">
        <fields id="col01">seq : 0, fieldtype : f</fields>
        <fields id="col02">seq : 1, fieldtype : f</fields>
        <fields id="col03">seq : 2, fieldtype : f</fields>
        <fields id="col04">seq : 3, fieldtype : f</fields>
        <fields id="col05">seq : 4, fieldtype : f</fields>
        <fields id="col06">seq : 5, fieldtype : f</fields>
    </inputdoc>
</input>
#11
Nobody? Is this really that hard to solve? If it is, then I have an excuse to stop with this.
#12
It's not that the problem itself is necessarily hard; it's more that the modules you are using are not part of the standard library, which limits the number of people out there who can help. It also means that there is not as much documentation on the modules in question.
Personally, I would have tried it myself, but I only have remote access to Python right now, which makes working with file-sized data annoying to say the least, and I don't have this module installed. If you attach the source file, I could write a parser for you, though it wouldn't use XPath.
Mark.