Discuss XML parsing with expat empty tags in the Python Programming forum on Dev Shed.
Posts: 1,075
Works correctly on my system.
An empty XML tag looks like either
<postal></postal>
or
<postal/>
A content-free tag can still carry attributes.
There aren't any attributes in your example.
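As a quick check, here is a minimal sketch using the standard-library `xml.parsers.expat` module (the `code` attribute is made up for illustration). It shows that an empty tag can still carry attributes and that expat hands them to the start handler as a dict, or an empty dict when there are none:

```python
import xml.parsers.expat

seen = []

def start(name, attrs):
    # attrs is a dict of the element's attributes ({} if there are none)
    seen.append((name, attrs))

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start
# Hypothetical "code" attribute on a content-free <postal/> tag
parser.Parse('<root><postal code="12345"/><postal/></root>', True)

for name, attrs in seen:
    print(name, attrs)
```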
OK, now that I've begun to catch up with you, we need to find out what the XML parser should do. Based on my three-minute (or shorter) scan of the XML project home page, I would think that expat should call your start and end handlers for both forms.
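A small sketch (standard-library `xml.parsers.expat` only; the `trace` helper is just for illustration) that records which handlers fire. Both spellings of the empty tag produce exactly one start and one end event, and neither produces a character-data event:

```python
import xml.parsers.expat

def trace(xml_text):
    """Return the list of handler events expat fires for xml_text."""
    events = []
    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = lambda name, attrs: events.append(("start", name))
    p.EndElementHandler = lambda name: events.append(("end", name))
    p.CharacterDataHandler = lambda data: events.append(("text", data))
    p.Parse(xml_text, True)
    return events

print(trace("<postal/>"))
print(trace("<postal></postal>"))
print(trace("<postal>12345</postal>"))
```

So at the handler level the two empty forms are indistinguishable; the only difference from a tag with content is that no CharacterDataHandler call arrives.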
Posts: 5
Thanks for your prompt response and help.
I fixed my issue after all: it wasn't a problem within the expat module
but in a class I had redefined that inherits from it.
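The subclass itself isn't shown, so the following is only a hypothetical illustration of one common way a handler class can drop empty elements: it creates a node only when character data arrives. Since expat never calls the character-data handler for `<phone/>`, such elements vanish; creating the node in the start handler keeps them. The class names and the sample document are invented for this sketch:

```python
import xml.parsers.expat

class TextOnlyCollector:
    """Buggy pattern: a node is stored only when text shows up."""
    def __init__(self):
        self.nodes = {}
        self._current = None

    def start(self, name, attrs):
        self._current = name

    def text(self, data):
        # Only non-whitespace text creates/extends a node, so empty
        # elements never get an entry
        if self._current is not None and data.strip():
            self.nodes[self._current] = self.nodes.get(self._current, "") + data

    def end(self, name):
        self._current = None

class StartRecordingCollector(TextOnlyCollector):
    """Fixed pattern: the node is created on start, text is filled in later."""
    def start(self, name, attrs):
        self._current = name
        self.nodes.setdefault(name, "")

def collect(collector, xml_text):
    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = collector.start
    p.EndElementHandler = collector.end
    p.CharacterDataHandler = collector.text
    p.Parse(xml_text, True)
    return collector.nodes

doc = "<contact><email>a@b.c</email><phone/></contact>"
print(collect(TextOnlyCollector(), doc))        # {'email': 'a@b.c'}
print(collect(StartRecordingCollector(), doc))  # {'contact': '', 'email': 'a@b.c', 'phone': ''}
```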
Posts: 1,075
Works. Cannot duplicate your trouble.
Hello! It does find the empty tags. I made the following (seemingly harmless) changes to your code:
Code:
...
name = 'self._%sEndTag' % (name,)
func = None
print('DWL: END ELEMENT ' + name)  # inserted this line into EndElement
try:
    func = eval(name)
...
# append to the end of your file which I named XML.py
with open('example.xml', 'w') as f:
    f.write('''<?xml version='1.0'?>
<parent id='top'><child1 name='paul'>Text goes here</child1>
<postal/>
<postal></postal>
<child2 name='fred'>More text</child2>
</parent>
''')
import logging
p = XMLParser(logging.Logger('dwl'))
p.ParseFile(open('example.xml', 'r'))
Then run XML.py; its output shows that both postal start tags and both postal end tags are found.
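As an aside on the `eval('self._%sEndTag' % name)` lookup in the snippet above, here is a minimal sketch of the same name-based dispatch done with `getattr`, which avoids evaluating a string built from tag names. Only the `_<tag>EndTag` naming appears in the snippet; the matching `_<tag>StartTag` naming is an assumption, and `DispatchingParser` with its `_postal...` methods is a hypothetical stand-in, not the poster's actual class:

```python
import xml.parsers.expat

class DispatchingParser:
    """Routes each element to _<tag>StartTag / _<tag>EndTag if such a method exists."""

    def __init__(self):
        self.log = []
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self.StartElement
        self._parser.EndElementHandler = self.EndElement

    def StartElement(self, name, attrs):
        # Look the handler up by name instead of eval()-ing a string
        func = getattr(self, "_%sStartTag" % (name,), None)
        if func is not None:
            func(attrs)

    def EndElement(self, name):
        func = getattr(self, "_%sEndTag" % (name,), None)
        if func is not None:
            func()

    # Hypothetical per-tag handlers
    def _postalStartTag(self, attrs):
        self.log.append("start postal")

    def _postalEndTag(self):
        self.log.append("end postal")

    def Parse(self, text):
        self._parser.Parse(text, True)

p = DispatchingParser()
p.Parse("<parent><postal/><postal></postal></parent>")
print(p.log)
```

Tags without a matching method (here `parent`) are simply skipped, which mirrors the `func = None` fallback in the snippet.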
Posts: 5
Your help is much appreciated.
That seems right, but for some reason my raw output is still missing the elements with empty tags.
Maybe you can take a look at the following ADF files and see if anything seems obvious?
## adf.py class
Code:
# -*- coding: latin-1 -*-
"""
ADF Object used to represent an ADF XML file
Uses ADFParser to parse the XML file (x.leads.adf.parser).
Uses ADFGenerator to output itself as an ADF XML (from x.leads.adf.generator)
"""
from x.leads.adf.classes import ADFNode, ADFBlock
from x.leads.adf.classes import ADFProspect, ADFVehicle, ADFCustomer
from x.leads.adf.classes import ADFVendor, ADFProvider
from x.leads.adf.classes import ADFContact, ADFContactAddress
from x.leads.adf.classes import ADFCustomerTimeframe
from x.leads.adf.classes import FIELDS_PRICE
from x.leads.adf.generator import ADFGenerator
from x.leads.adf.parser import ADFParser
class ADFValidationError(Exception):
    pass
class ADF(object):
    """ADF Object containing objects for the following nodes:
    Prospect, Vehicle, Customer, Vendor, Provider"""
    def __init__(self, fp):
        """Initialize the object.
        fp is a file-like object that has a read() method to consume
        the contents."""
        # Initialize the most basic portion of the object
        self.raw_object = None
        self.prospects = []
        if fp:
            self._parse(fp)
    def to_xml(self):
        """Convert the ADF object to XML.
        Returns XML (string)"""
        return ADFGenerator(self).to_xml()
    def to_dict(self, prospectidx=0):
        """Returns a dictionary representation of the ADF object.
        This is used strictly for webservices, so the representation
        is based on the requirements of the webservice."""
        return ADFGenerator(self).to_dict(prospectidx)
    @classmethod
    def _set_id_data(cls, obj, item):
        """Support Method
        Set the data related to the ID node.
        item must be an ADFNode object, or a list containing
        ADFNode objects.
        """
        # Only process the first id if we have a list
        # This may need fixing in the future as we may need to
        # better target the id we're trying to set
        if isinstance(item, list) and item:
            item = item[0]
        # Make sure this is an ADFNode
        if isinstance(item, ADFNode):
            obj.id = item.value
            obj.id_sequence = item.attributes.get("sequence")
            obj.id_source = item.attributes.get("source")
        return
    @classmethod
    def _treat_elem(cls, dst, src, property=None):
        """Support Method
        Transfers data from the src object to the destination object.
        The property value identifies which specific property we are
        looking to transfer.
        The src object must be an ADFNode or ADFBlock object
        (dst may be an object or a dictionary)"""
        # Is dst a dictionary?
        if isinstance(dst, dict):
            # Yes
            dest = dst
        else:
            # No, obtain the object's __dict__
            # An exception will bubble up if this is not an Object
            # with __dict__
            dest = dst.__dict__
        if not property:
            # The property was not specified.
            # This means that we must work on the src object itself
            # The src object must be an ADFNode or ADFBlock object
            if isinstance(src, (ADFNode, ADFBlock)):
                # Transfer the attributes from src to dest
                for k, v in src.attributes.iteritems():
                    key = "_" + k
                    if isinstance(key, unicode):
                        key = key.encode("latin-1")
                    dest[key] = v
        else:
            # The property was specified.
            # This means that we need to obtain the node (property)
            # from the src object
            node = src.__dict__.get(property)
            # Validate that this is an ADFNode object
            if isinstance(node, ADFNode):
                dest[property] = node.value
                # Transfer the attributes from the node to dest
                for k, v in node.attributes.iteritems():
                    key = property + "_" + k
                    if isinstance(key, unicode):
                        key = key.encode("latin-1")
                    dest[key] = v
        return
    def _parse(self, fp):
        """Support Method
        Reads the input file and parses the XML"""
        # Parse the incoming data (fp)
        self.raw_object = adf = ADFParser()
        adf.ParseFile(fp)
        # Validate the parsed data
        try:
            #adf._validate_adf()
            pass
        except Exception as e:
            raise ADFValidationError(str(e))
        # Process the prospects (first level)
        self._process_prospects(adf)
        return
    def _process_prospects(self, adf):
        """Support Method
        Processes the 'prospect' block"""
        # Run through the data and parse the elements we need to build our object's properties
        for adf_prospect in adf.prospect_list:
            if isinstance(adf_prospect, ADFBlock):
                prospect = ADFProspect()
                self.prospects.append(prospect)
                ADF._set_id_data(prospect, adf_prospect.id_list)
                ADF._treat_elem(prospect, adf_prospect)
                ADF._treat_elem(prospect, adf_prospect, "requestdate")
                # Process the Vehicles
                self._process_vehicles(prospect, adf_prospect.vehicle_list)
                # Process the vendor
                if 'vendor' in adf_prospect.__dict__:
                    self._process_vendor(prospect, adf_prospect.vendor)
                # Process the provider
                if 'provider' in adf_prospect.__dict__:
                    self._process_provider(prospect, adf_prospect.provider)
                # Process the customer
                if 'customer' in adf_prospect.__dict__:
                    self._process_customer(prospect, adf_prospect.customer)
        return
    def _process_vehicles(self, prospect, vehicle_list):
        """Support Method
        Processes the 'vehicle' block"""
        # This prospect begins with an empty list of vehicles
        prospect.vehicles = []
        for adf_vehicle in vehicle_list:
            vehicle = ADFVehicle()
            prospect.vehicles.append(vehicle)
            # Set the vehicle data
            ADF._set_id_data(vehicle, adf_vehicle.id_list)
            ADF._treat_elem(vehicle, adf_vehicle)
            ADF._treat_elem(vehicle, adf_vehicle, "year")
            ADF._treat_elem(vehicle, adf_vehicle, "make")
            ADF._treat_elem(vehicle, adf_vehicle, "model")
            ADF._treat_elem(vehicle, adf_vehicle, "vin")
            ADF._treat_elem(vehicle, adf_vehicle, "stock")
            ADF._treat_elem(vehicle, adf_vehicle, "trim")
            ADF._treat_elem(vehicle, adf_vehicle, "doors")
            ADF._treat_elem(vehicle, adf_vehicle, "bodystyle")
            ADF._treat_elem(vehicle, adf_vehicle, "transmission")
            ADF._treat_elem(vehicle, adf_vehicle, "odometer")
            ADF._treat_elem(vehicle, adf_vehicle, "condition")
            ADF._treat_elem(vehicle, adf_vehicle, "imagetag")
            ADF._treat_elem(vehicle, adf_vehicle, "price")
            ADF._treat_elem(vehicle, adf_vehicle, "pricecomments")
            ADF._treat_elem(vehicle, adf_vehicle, "comments")
            # Process colorcombinations
            self._process_vehicle_colorcombinations(vehicle, adf_vehicle.colorcombination_list)
            # Process options
            self._process_vehicle_options(vehicle, adf_vehicle.option_list)
            # Process finance
            finance = adf_vehicle.__dict__.get("finance")
            if finance:
                self._process_vehicle_finance(vehicle, finance)
        return
    def _process_vehicle_colorcombinations(self, vehicle, colorcombinations):
        """Support Method
        Processes the 'colorcombination' block"""
        # This vehicle is assigned a color combination list
        vehicle.colorcombinations = []
        for combination in colorcombinations:
            if isinstance(combination, ADFBlock):
                block = {}
                fields = ["preference", "interiorcolor", "exteriorcolor"]
                for field in fields:
                    block[field] = None
                for field in fields:
                    ADF._treat_elem(block, combination, field)
                # Add it to the combinations
                vehicle.colorcombinations.append(block)
        # Sort the combinations
        vehicle.colorcombinations.sort(ADF._process_vehicle_colorcombinations_sort)
        return
    @classmethod
    def _generic_int_sort(cls, a, b, field, reversed=False):
        """Support Method
        Generic method to sort based on an int property (field) that
        belongs to a and b. The results may be sorted in reverse
        using reversed=True"""
        # Obtain the ints to sort by.
        # The blocks are pre-filled with None, so treat None (or "") as 0
        # instead of letting int(None) raise TypeError.
        a_val = int(a.get(field) or 0)
        b_val = int(b.get(field) or 0)
        # Does the user want to reverse the order?
        if reversed:
            return b_val - a_val
        return a_val - b_val
    @classmethod
    def _process_vehicle_colorcombinations_sort(cls, a, b):
        """Support Method
        Sort method for the color combinations"""
        # color combinations are sorted via the "preference" field.
        # This is a required field so we know it will exist
        return cls._generic_int_sort(a, b, "preference")
    def _process_vehicle_options(self, vehicle, options):
        """Support Method
        Processes the vehicle's 'option' block"""
        # This vehicle is assigned an options list
        vehicle.options = []
        for option in options:
            if isinstance(option, ADFBlock):
                block = {}
                fields = ["optionname", "manufacturercode", "stock", "weighting"] + FIELDS_PRICE
                for field in fields:
                    block[field] = None
                for field in fields:
                    ADF._treat_elem(block, option, field)
                vehicle.options.append(block)
        # Sort the options
        vehicle.options.sort(ADF._process_vehicle_options_sort)
        return
    @classmethod
    def _process_vehicle_options_sort(cls, a, b):
        """Support Method
        Sort method for the vehicle options"""
        # The options are ordered using the weighting field
        return cls._generic_int_sort(a, b, "weighting", reversed=True)
    def _process_vehicle_finance(self, vehicle, finance):
        """Support Method
        Processes the vehicle's 'finance' block"""
        # Create a new amount list and transfer the finance data to the block
        amount_list = []
        block = dict(amounts=amount_list)
        ADF._treat_elem(block, finance)
        # Transfer the data from the 2 possible child options
        fields = ["method", "balance"]
        for field in fields:
            block[field] = None
        for field in fields:
            ADF._treat_elem(block, finance, field)
        # Transfer all the amounts (ADF has 1 or more)
        for amount in finance.amount_list:
            amt_block = dict(value=amount.value)
            ADF._treat_elem(amt_block, amount)
            amount_list.append(amt_block)
        # Set the vehicle's finance block
        vehicle.finance = block
        return
    def _process_customer(self, prospect, adf_customer):
        """Support Method
        Processes the 'customer' block"""
        # Create the customer object
        customer = prospect.customer = ADFCustomer()
        # Transfer the data to the object
        ADF._set_id_data(customer, adf_customer.id_list)
        ADF._treat_elem(customer, adf_customer)
        ADF._treat_elem(customer, adf_customer, "comments")
        # Create the contact object
        customer.contact = ADFContact()
        self._process_contact_block(customer.contact, adf_customer)
        # Is the timeframe block defined?
        if 'timeframe' in adf_customer.__dict__:
            # Create the timeframe object
            timeframe = customer.timeframe = ADFCustomerTimeframe()
            # Transfer the timeframe information
            ADF._treat_elem(timeframe, adf_customer.timeframe)
            ADF._treat_elem(timeframe, adf_customer.timeframe, "description")
            ADF._treat_elem(timeframe, adf_customer.timeframe, "earliestdate")
            ADF._treat_elem(timeframe, adf_customer.timeframe, "latestdate")
        return
    @classmethod
    def _address_street_sort(cls, a, b):
        """Support Method
        Sort method for the address' 'street' tag"""
        # The street data is sorted using the "line" attribute
        return cls._generic_int_sort(a, b, "_line")
    def _process_vendor(self, prospect, adf_vendor):
        """Support Method
        Processes the 'vendor' block"""
        # Create a new vendor object
        vendor = prospect.vendor = ADFVendor()
        # Set the data
        ADF._set_id_data(vendor, adf_vendor.id_list)
        ADF._treat_elem(vendor, adf_vendor, "vendorname")
        ADF._treat_elem(vendor, adf_vendor, "url")
        # Create the contact object and process it
        vendor.contact = ADFContact()
        self._process_contact_block(vendor.contact, adf_vendor)
        return
    def _process_provider(self, prospect, adf_provider):
        """Support Method
        Processes the 'provider' block"""
        # Create the provider object
        provider = prospect.provider = ADFProvider()
        # Set the data
        ADF._set_id_data(provider, adf_provider.id_list)
        ADF._treat_elem(provider, adf_provider, "name")
        ADF._treat_elem(provider, adf_provider, "service")
        ADF._treat_elem(provider, adf_provider, "url")
        ADF._treat_elem(provider, adf_provider, "email")
        ADF._treat_elem(provider, adf_provider, "phone")
        # Create the contact object and process it
        provider.contact = ADFContact()
        self._process_contact_block(provider.contact, adf_provider)
        return
    def _process_contact_block(self, contact, src):
        """Support Method
        Processes the 'contact' block"""
        # Make sure we have a contact to process
        if "contact" in src.__dict__:
            # Transfer the base details
            ADF._treat_elem(contact, src.contact)
            # Transfer the email
            ADF._treat_elem(contact, src.contact, "email")
            # Transfer the phone (if available)
            contact.phones = []
            for phone in src.contact.phone_list:
                block = dict(value=phone.value, _type=None, _time=None, _preferredcontact=None)
                contact.phones.append(block)
                ADF._treat_elem(block, phone)
            # Transfer the names (if available)
            contact.names = []
            for name in src.contact.name_list:
                block = dict(value=name.value, _part=None, _type=None)
                contact.names.append(block)
                ADF._treat_elem(block, name)
            # Is the address defined? If so, process it
            if 'address' in src.contact.__dict__:
                # Create the address object
                address = contact.address = ADFContactAddress()
                # Transfer the data to the address object
                ADF._treat_elem(address, src.contact.address)
                ADF._treat_elem(address, src.contact.address, "apartment")
                ADF._treat_elem(address, src.contact.address, "city")
                ADF._treat_elem(address, src.contact.address, "regioncode")
                ADF._treat_elem(address, src.contact.address, "postalcode")
                ADF._treat_elem(address, src.contact.address, "country")
                # Process the street details
                address.streets = []
                for street in src.contact.address.street_list:
                    block = dict(value=street.value, _line=street.attributes.get("line"))
                    address.streets.append(block)
                    ADF._treat_elem(block, street)
                # Sort the street data
                address.streets.sort(ADF._address_street_sort)
        return
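A note on the sorting above: the blocks are pre-filled with `None` for every field, so with the original `int(a.get(field, 0))` an option without a `weighting` (or a street without a `line` attribute) makes `int(None)` raise `TypeError`, because `.get()` finds the key and never falls back to 0. Passing a comparison function to `list.sort()` also only works in Python 2. A minimal Python 3 sketch of the same ordering with key functions, assuming a missing or empty value should count as 0 (the `int_key` helper and the sample data are hypothetical):

```python
def int_key(field):
    """Build a sort key that reads block[field] as an int, treating None/'' as 0."""
    def key(block):
        return int(block.get(field) or 0)
    return key

options = [
    {"optionname": "Sunroof", "weighting": "10"},
    {"optionname": "Floor mats", "weighting": None},
    {"optionname": "Nav", "weighting": "50"},
]
# Options: highest weighting first (the original used reversed=True)
options.sort(key=int_key("weighting"), reverse=True)
print([o["optionname"] for o in options])  # ['Nav', 'Sunroof', 'Floor mats']

combos = [{"preference": "2"}, {"preference": "1"}]
# Color combinations: ascending preference
combos.sort(key=int_key("preference"))
print([c["preference"] for c in combos])  # ['1', '2']
```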
## classes.py
Code:
# -*- coding: latin-1 -*-
FIELDS_PRICE = [
    "price", "price_type", "price_currency",
    "price_delta", "price_relativeto", "price_source"
]
FIELDS_ID = [
    "id", "id_sequence", "id_source"
]
# Objects for the XML to Python parser
class ADFNode(object):
    """Object containing details for a single XML node.
    Generated by the parser (x.leads.adf.parser).
    Used by the parser and the ADF object (x.leads.adf.ADF)"""
    def __init__(self, attributes=None, value=None):
        """Initialize the object by storing the attributes
        and value"""
        # Use a fresh dict per node; a {} default would be shared by every node
        self.attributes = attributes if attributes is not None else {}
        self.value = value
    def __repr__(self):
        """A textual representation of the object's contents"""
        attrs = []
        for k, v in self.attributes.iteritems():
            try:
                attrs.append("%s=%d" % (k, int(v)))
            except (TypeError, ValueError):
                attrs.append('%s="%s"' % (k, v))
        if attrs:
            attrs = " ".join(attrs)
        else:
            attrs = ""
        values = []
        if self.value:
            values.append(str(self.value))
        values = "".join(values)
        if attrs:
            return "'%s [%s]'" % (values, attrs)
        return "'%s'" % (values,)
class ADFBlock(object):
    """Object containing details for a XML parent block.
    This is a block that may have blocks (ADFBlock) or nodes (ADFNode)
    under it.
    Generated by the parser (x.leads.adf.parser).
    Used by the parser and the ADF object (x.leads.adf.ADF)"""
    def __init__(self, attributes):
        """Initialize the object by storing the attributes"""
        self.attributes = attributes
    def __repr__(self):
        """A textual representation of the object's contents"""
        return str(self.__dict__)
# Objects for the Python parser
class ADFElement(object):
    """Base element of the ADF objects.
    Used and generated by the ADF object (x.leads.adf.ADF)"""
    def _init_fields(self, fields):
        """Initialize the fields (store as attributes to the object)"""
        for field in fields:
            self.__dict__[field] = None
        return
    def __repr__(self):
        """A textual representation of the object"""
        return str(self.__dict__)
class ADFProspect(ADFElement):
    """Holds details concerning the 'prospect' block"""
    def __init__(self):
        self._init_fields([
            "_status",
            "requestdate"
        ] + FIELDS_ID)
class ADFVehicle(ADFElement):
    """Holds details concerning the 'vehicle' block"""
    def __init__(self):
        self._init_fields([
            "_interest", "_status",
            "year", "make", "model", "vin", "stock", "trim",
            "doors", "bodystyle", "transmission",
            "odometer", "odometer_status", "odometer_units",
            "condition",
            "imagetag", "imagetag_width", "imagetag_height", "imagetag_alttext",
            "pricecomments",
            "comments"
        ] + FIELDS_PRICE + FIELDS_ID)
class ADFCustomer(ADFElement):
    """Holds details concerning the 'customer' block"""
    def __init__(self):
        self._init_fields([
            "comments"
        ] + FIELDS_ID)
class ADFContact(ADFElement):
    """Holds details concerning the 'contact' block"""
    def __init__(self):
        self._init_fields([
            "_primarycontact",
            "email", "email_preferredcontact",
        ])
class ADFContactAddress(ADFElement):
    """Holds details concerning the Contact's 'address' block"""
    def __init__(self):
        self._init_fields([
            "apartment", "city", "regioncode", "postalcode", "country"
        ])
class ADFCustomerTimeframe(ADFElement):
    """Holds details concerning the Customer's 'timeframe' block"""
    def __init__(self):
        self._init_fields([
            "description", "earliestdate", "latestdate"
        ])
class ADFVendor(ADFElement):
    """Holds details concerning the 'vendor' block"""
    def __init__(self):
        self._init_fields([
            "vendorname", "url"
        ] + FIELDS_ID)
class ADFProvider(ADFElement):
    """Holds details concerning the 'provider' block"""
    def __init__(self):
        self._init_fields([
            "service", "url",
            "name", "name_part", "name_type"
        ] + FIELDS_ID)
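One thing in classes.py worth double-checking is a `{}` default for `ADFNode`'s `attributes` parameter. Python evaluates a default value once, when the function is defined, so every node built without explicit attributes would share one dict, and anything written into one node's attributes would show up on all of them. Whether the parser (not shown) ever mutates `node.attributes` in place is an assumption; the sketch below just demonstrates the pitfall and the usual `None`-default fix, using hypothetical class names:

```python
class SharedDefaultNode:
    def __init__(self, attributes={}, value=None):  # default dict is created once
        self.attributes = attributes
        self.value = value

class FreshDefaultNode:
    def __init__(self, attributes=None, value=None):
        # A new dict per instance when none is supplied
        self.attributes = {} if attributes is None else attributes
        self.value = value

a, b = SharedDefaultNode(), SharedDefaultNode()
a.attributes["type"] = "voice"
print(b.attributes)  # {'type': 'voice'}  (b was never touched)

c, d = FreshDefaultNode(), FreshDefaultNode()
c.attributes["type"] = "voice"
print(d.attributes)  # {}
```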
Posts: 1,075
ADF - Alliance Defense Fund : Defending Our First Liberty
American Dance Festival
Ár nDraíocht Féin: A Druid Fellowship
Perhaps you could provide a minimal example that demonstrates the problem, without all of us responders having to shell out forty thousand bucks to install Oracle?
Posts: 5
My apologies for the confusion.
ADF stands for Auto-lead Data Format:
an industry-standard data format for the export and import of automotive customer leads using XML.
I "inherited" this abandoned project at work and it's driving me nuts.
To put it more simply, you'll find at the bottom a simple, straightforward driver I made for testing.
The issue is simply this: if you look at the output of the raw object print in the last line of the driver program below, for some reason it doesn't contain the empty tags.
I hope this clarifies things; otherwise, please feel free to ask about anything you need me to clarify.