Python Programming
 
  #1  
Posted February 2nd, 2012, 08:10 PM by zambilo76
XML parsing with expat empty tags

Hi,

I'm working on some previously implemented code that uses the expat XML parser, which I'm new to.

My issue seems to be with parsing empty tags, i.e. <postal></postal>

The parser seems to ignore these tags entirely, as if they were not in the XML at all.

In my case, I need to differentiate between empty tags (the tag exists in the XML but has no data) and nonexistent ones.

Can anyone enlighten me on this behaviour, explain why it happens, and tell me how to get around it?

Much appreciated

  #2  
Posted February 2nd, 2012, 10:07 PM by b49P23TIvg
Works correctly on my system.

An empty xml tag looks like either

<postal></postal>
or
<postal/>

A content-free tag can still carry attributes.
There aren't any attributes in your example.

OK, now that I've begun to catch up with you, we need to find out what the parser should do. Based on my scan (three minutes or less) of the xml project home page, I would expect expat to call your start and end handlers:

yourParser.StartElementHandler(name, attributes)
yourParser.EndElementHandler(name)

You have to assign these functions to your parser, as in this example.

Code:
import xml.parsers.expat

# 3 handler functions
def start_element(name, attrs):
    print('Start element:', name, attrs)
def end_element(name):
    print('End element:', name)
def char_data(data):
    print('Character data:', repr(data))

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<postal/>
<postal></postal>
<child2 name="fred">More text</child2>
</parent>""", 1)


'''
This program finds the postal tags:
    
$ python3 p.py
Start element: parent {'id': 'top'}
Start element: child1 {'name': 'paul'}
Character data: 'Text goes here'
End element: child1
Character data: '\n'
Start element: postal {}
End element: postal
Character data: '\n'
Start element: postal {}
End element: postal
Character data: '\n'
Start element: child2 {'name': 'fred'}
Character data: 'More text'
End element: child2
Character data: '\n'
End element: parent
$
'''
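
To address the original question directly, here is a minimal sketch of how the three handlers can tell an empty tag from a missing one. It is written for Python 3; `tag_status` and the sample document are made up for illustration and are not part of expat or of the poster's code.

```python
import xml.parsers.expat

def tag_status(xml_text, tag):
    """Return 'missing', 'empty' or 'has data' for the element named `tag`."""
    state = {"seen": False, "has_text": False, "depth": 0}

    def start_element(name, attrs):
        if name == tag:
            state["seen"] = True      # the tag exists, whatever it contains
            state["depth"] += 1

    def end_element(name):
        if name == tag:
            state["depth"] -= 1

    def char_data(data):
        # Only non-whitespace text inside the tag of interest counts
        # (text of nested children counts as well).
        if state["depth"] > 0 and data.strip():
            state["has_text"] = True

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(xml_text, True)

    if not state["seen"]:
        return "missing"
    return "has data" if state["has_text"] else "empty"

doc = "<address><city>Ottawa</city><postal></postal></address>"
print(tag_status(doc, "postal"))    # tag present, no data
print(tag_status(doc, "country"))   # tag absent
print(tag_status(doc, "city"))      # tag present with data
```

The point of the sketch is that presence is recorded in the start handler, so it does not depend on the character data handler ever being called.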

  #3  
Posted February 3rd, 2012, 10:32 AM by zambilo76
Thanks for your prompt response and help.
I fixed my issue after all. It wasn't an issue within the expat module
but in a redefined class inheriting from it.

All it took was a good night's sleep and a clear mind.

  #4  
Posted February 6th, 2012, 09:25 AM by zambilo76
Sorry, I'm still having issues after all.

Can you see any reason in this code why it would skip the elements with an empty tag?

Code:
import sys
from xml.parsers import expat

class XMLParser:
  """

  XML to Object
  general purpose XML parser

  just subclass it and make functions for each tag you want to handle
  """

  def __init__(self, log=None):
    self.Parser = None
    self.usedcars = []
    self._tagStack = []
    self._transtype = None
    self._currentcar = None
    if log:
      self.log = log
    else:
      from x.usedcars.media import logger
      self.log = logger.Logger(sys.stdout, logger.logALL)

    return

  def StartElement(self, name, attributes):
    'SAX start element event handler'
    name = name.lower()
    #name = name.replace(':', '_')   #20070511NH  for 'xs:' styles parsing
    print ('Start element:', name, attributes)
    for k in attributes.iterkeys():
      attributes[k] = attributes[k].encode('latin-1', 'replace')  #20061018NH
    self._tagStack.append([name, attributes])
    
    name = 'self._%sStartTag'%(name,)
    #func = None
    try:
      func = eval(name)
      func(attributes)
      #print 'called %s()'%(name,)
    except (NameError, AttributeError):
      pass
    except Exception, e:
      self.log.W('%s StartTag err:"%s"'%(name, str(e)))

    return

  def EndElement(self, name):
    'SAX end element event handler'
    name = name.lower()
    name = name.replace(':', '_')  #20070511NH  for 'xs:' styles parsing
    #name, attributes = self._tagStack[-1]

    name = 'self._%sEndTag'%(name,)
    func = None
    try:
      func = eval(name)
      func(attributes)
      #print 'called %s()'%(name,)
    except (NameError, AttributeError):
      pass
    except Exception, e:
      self.log.W('%s EndTag err:"%s"'%(name, str(e)))

    self._tagStack = self._tagStack[:-1]

    return

  def CharacterData(self, cdata):
    'SAX character data event handler'
    print 'char', repr(cdata)
    
    print '-' * 80
    print 1000
    parentname = None
    if len(self._tagStack) >= 2:
      parentname = self._tagStack[-2][0]

    name, attributes = self._tagStack[-1]    

    #sif (cdata == '') or (cdata == "") or (cdata is None) or (cdata== 0) :  cdata =u"++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++CDATA EMPTY\n"
    #cdata = (cdata or "").encode('latin-1', 'replace')
    #print 'cdata: %s' %cdata
    #print type(cdata) 
    
    print 1111, name
    print (type(cdata), cdata)
    print parentname
    print 2222
    #print 'cdata: %s' %cdata
    if parentname:
      funcname = 'self._%s_%sData'%(parentname.lower(), name.lower(),)
      print funcname
      try:
        func = eval(funcname)
        func(attributes, cdata)
        print 'called %s()'%(funcname,)
      except (NameError, AttributeError):
        parentname = None
      except Exception, e:
        self.log.W('%s err:"%s", data:"%s"'%(funcname, str(e), repr(cdata)))
        parentname = None

    print 3333
    
    if not parentname:
      funcname = 'self._%sData'%(name.lower(),)
      print funcname
      try:
        func = eval(funcname)
        func(attributes, cdata)
        #print 'called %s()'%(funcname,)
      except (NameError, AttributeError):
        pass
        #self.log.W('%s not parsed, data:"%s"'%(funcname, repr(cdata)))
      except Exception, e:
        try:
          self.log.W('%s err:"%s" data:"%s"'%(funcname, str(e), repr(cdata)))
        except UnicodeDecodeError:
          self.log.W('%s err:"%s"'%(funcname, str(e)))
        pass

    print 4444

    return

  def ParseFile(self,file):
    # Create a SAX parser
    self.Parser = expat.ParserCreate()

    #self.Parser.buffer_text = 1

    # SAX event handlers
    self.Parser.StartElementHandler = self.StartElement
    self.Parser.EndElementHandler = self.EndElement
    self.Parser.CharacterDataHandler = self.CharacterData


    # Parse the XML File
    ParserStatus = self.Parser.ParseFile(file) #Parser.Parse(open(filename,'r').read(), 1)

    return #self.root
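
One detail in `EndElement` above may matter. The line that unpacks `attributes` from `self._tagStack[-1]` is commented out, so `func(attributes)` refers to a name that is never assigned in that method. Unless a global called `attributes` happens to exist, this raises a NameError, which the `except (NameError, AttributeError)` clause then discards, so any `_<tag>EndTag` handler defined in a subclass would be looked up but never actually called. The sketch below reproduces the pattern in isolation (Python 3; `Dispatcher` and its methods are hypothetical stand-ins, not the poster's class) and shows a `getattr`-based lookup that avoids both `eval` and the swallowed error.

```python
class Dispatcher:
    """Hypothetical stand-in for XMLParser, reduced to end-tag dispatch."""

    def __init__(self):
        self.calls = []
        self._tag_stack = [("postal", {})]

    def _postalEndTag(self, attributes):
        self.calls.append(("postal", attributes))

    def end_element_swallowing(self, name):
        # Mirrors the posted pattern: `attributes` is never assigned here.
        try:
            func = eval("self._%sEndTag" % name)
            func(attributes)  # NameError is raised before the call happens
        except (NameError, AttributeError):
            pass              # ...and is silently discarded

    def end_element_fixed(self, name):
        # Take the attributes from the stack and look the handler up by name.
        _, attributes = self._tag_stack[-1]
        func = getattr(self, "_%sEndTag" % name, None)
        if func is not None:
            func(attributes)
        self._tag_stack.pop()

d = Dispatcher()
d.end_element_swallowing("postal")
print(d.calls)   # the handler was never reached
d.end_element_fixed("postal")
print(d.calls)   # the handler ran once
```

Catching NameError around a dynamic lookup hides this kind of mistake; with `getattr` and a default of `None`, a missing handler is an ordinary case and a genuine NameError would surface.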

  #5  
Posted February 6th, 2012, 12:21 PM by b49P23TIvg
Works. Cannot duplicate your trouble.

Hello! It does find the empty tags. I made the seemingly passive changes to your code shown here:

Code:
...
    name = 'self._%sEndTag'%(name,)
    func = None
    print('DWL: END ELEMENT '+name)  #inserted this line into EndElement
    try:
      func = eval(name)

...

# append to the end of your file which I named XML.py

with open('example.xml','w') as f:
    f.write('''<?xml version='1.0'?>
<parent id='top'><child1 name='paul'>Text goes here</child1>
<postal/>
<postal></postal>
<child2 name='fred'>More text</child2>
</parent>
''')

import logging
p = XMLParser(logging.Logger('dwl'))
p.ParseFile(open('example.xml','r'))


Then run XML.py. The output shows that both postal start tags and both postal end tags are found.
Code:
$ python XML.py
('Start element:', u'parent', {u'id': u'top'})
('Start element:', u'child1', {u'name': u'paul'})
char u'Text goes here'
--------------------------------------------------------------------------------
1000
1111 child1
(<type 'unicode'>, u'Text goes here')
parent
2222
self._parent_child1Data
3333
self._child1Data
4444
DWL: END ELEMENT self._child1EndTag
char u'\n'
--------------------------------------------------------------------------------
1000
1111 parent
(<type 'unicode'>, u'\n')
None
2222
3333
self._parentData
4444
('Start element:', u'postal', {})
DWL: END ELEMENT self._postalEndTag
char u'\n'
--------------------------------------------------------------------------------
1000
1111 parent
(<type 'unicode'>, u'\n')
None
2222
3333
self._parentData
4444
('Start element:', u'postal', {})
DWL: END ELEMENT self._postalEndTag
char u'\n'
--------------------------------------------------------------------------------
1000
1111 parent
(<type 'unicode'>, u'\n')
None
2222
3333
self._parentData
4444
('Start element:', u'child2', {u'name': u'fred'})
char u'More text'
--------------------------------------------------------------------------------
1000
1111 child2
(<type 'unicode'>, u'More text')
parent
2222
self._parent_child2Data
3333
self._child2Data
4444
DWL: END ELEMENT self._child2EndTag
char u'\n'
--------------------------------------------------------------------------------
1000
1111 parent
(<type 'unicode'>, u'\n')
None
2222
3333
self._parentData
4444
DWL: END ELEMENT self._parentEndTag
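
In the output above, each postal start line is followed directly by its END ELEMENT line, with no `char` line in between: expat has no text to report for an empty element, so the character data handler is not called for it. In the posted class, `CharacterData` is the only place that dispatches to the `_<tag>Data` handlers, so those handlers would not run for an empty element, which may be why the raw object lacks them. A minimal sketch of one way around this, assuming the same `_<tag>Data(attributes, text)` naming convention, is to buffer text and dispatch at the end tag. It is written for Python 3, and `TextAtEndParser` and `PostalParser` are hypothetical names.

```python
import xml.parsers.expat

class TextAtEndParser:
    """Hypothetical variant of the posted XMLParser: data handlers fire at the end tag."""

    def __init__(self):
        self._stack = []   # [name, attributes, text chunks] per open element

    def _start(self, name, attributes):
        self._stack.append([name.lower(), attributes, []])

    def _chars(self, data):
        if self._stack:
            self._stack[-1][2].append(data)

    def _end(self, name):
        tag, attributes, chunks = self._stack.pop()
        handler = getattr(self, "_%sData" % tag, None)
        if handler is not None:
            # Runs for empty elements too; their text is simply ''.
            handler(attributes, "".join(chunks))

    def parse(self, text):
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = self._start
        parser.EndElementHandler = self._end
        parser.CharacterDataHandler = self._chars
        parser.Parse(text, True)

class PostalParser(TextAtEndParser):
    def __init__(self):
        super().__init__()
        self.postals = []

    def _postalData(self, attributes, text):
        self.postals.append(text)

p = PostalParser()
p.parse("<parent><postal/><postal></postal><postal>K1A 0B1</postal></parent>")
print(p.postals)   # one entry per postal element, empty ones included
```

Buffering also sidesteps the fact that expat may deliver one element's text in several chunks, since the handler sees the joined text once.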

  #6  
Posted February 6th, 2012, 01:08 PM by zambilo76
Your help is much appreciated.

That seems right, but for some reason my raw output is still missing the elements with the empty tags.

Maybe you can take a look at the following ADF files and see if anything obvious stands out?

## adf.py class

Code:
# -*- coding: latin-1 -*-

"""
ADF Object used to represent an ADF XML file
Uses ADFParser to parse the XML file (x.leads.adf.parser).
Uses ADFGenerator to output itself as an ADF XML (from x.leads.adf.generator)
"""

from x.leads.adf.classes import ADFNode, ADFBlock
from x.leads.adf.classes import ADFProspect, ADFVehicle, ADFCustomer
from x.leads.adf.classes import ADFVendor, ADFProvider
from x.leads.adf.classes import ADFContact, ADFContactAddress
from x.leads.adf.classes import ADFCustomerTimeframe
from x.leads.adf.classes import FIELDS_PRICE
from x.leads.adf.generator import ADFGenerator
from x.leads.adf.parser import ADFParser

class ADFValidationError(Exception):
  pass

class ADF(object):
  """ADF Object containing objects for the following nodes:
  Prospect, Vehicle, Customer, Vendor, Provider"""
  
  def __init__(self, fp):
    """Initialize the object.
    fp is a file-like object that has a read() method to consume
    the contents."""
    
    # Initialize the most basic portion of the object
    self.raw_object = None
    self.prospects = []
    
    if fp:
      self._parse(fp)
  
  def to_xml(self):
    """Convert the ADF object to XML.
    Returns XML (string)"""
    
    return ADFGenerator(self).to_xml()
  
  def to_dict(self, prospectidx=0):
    """Returns a dictionary representation of the ADF object.
    This is used strictly for webservices, so the representation
    is based on the requirements of the webservice."""
    
    return ADFGenerator(self).to_dict(prospectidx)
  
  @classmethod
  def _set_id_data(cls, obj, item):
    """Support Method
    Set the data related to the ID node.
    
    item must be an ADFNode object, or a list containing
    ADFNode objects.
    """
    
    # Only process the first id if we have a list
    # This may need fixing in the future as we may need to
    # better target the id we're trying to set
    if isinstance(item, list) and item:
      item = item[0]
    
    # Make sure this is an ADFNode
    if isinstance(item, ADFNode):
      obj.id = item.value
      obj.id_sequence = item.attributes.get("sequence")
      obj.id_source = item.attributes.get("source")
    
    return
  
  @classmethod
  def _treat_elem(self, dst, src, property=None):
    """Support Method
    Transfers data from the src object to the destination object.
    The property value identifies which specific property we are
    looking to transfer.
    
    The src object may be an ADFNode object or a dictionary"""
    
    # Is dst a dictionary?
    if isinstance(dst, dict):
      # Yes
      dest = dst
    else:
      # No, obtain the object's __dict__
      # An exception will bubble up if this is not an Object
      # with __dict__ 
      dest = dst.__dict__
  
    if not property:
      # The property was not specified.
      # This means that we must work on the src object itself
      # The src object must be an ADFNode or ADFBlock object
      if isinstance(src, (ADFNode, ADFBlock)):
        # Transfer the attributes from src to dest
        for k, v in src.attributes.iteritems():
          key = "_" + k
          if isinstance(key, unicode):
            key = key.encode("latin-1")
          dest[key] = v
    else:
      # The property was specified.
      # This means that we need to obtain the node (property)
      # from the src object
      node = src.__dict__.get(property)
      
      # Validate that this is an ADFNode object
      if isinstance(node, ADFNode):
        dest[property] = node.value
        
        # Transfer the attributes from src to dest
        for k, v in node.attributes.iteritems():
          key = property + "_" + k
          if isinstance(key, unicode):
            key = key.encode("latin-1")
          dest[key] = v
    
    return
  
  def _parse(self, fp):
    """Support Method
    Reads the input file and parses the XML"""
    
    # Parse the incoming data (fp)
    self.raw_object = adf = ADFParser()
    adf.ParseFile(fp)
    
    # Validate the parsed data
    try:
      #adf._validate_adf()
      pass
    except Exception, e:
      raise ADFValidationError(str(e))
    
    # Process the prospects (first level)
    self._process_prospects(adf)
    
    return
  
  def _process_prospects(self, adf):
    """Support Method
    Processes the 'prospect' block"""
    
    # Run through the data and parse the elements we need to build our object's properties
    for adf_prospect in adf.prospect_list:
      if isinstance(adf_prospect, ADFBlock):
        prospect = ADFProspect()
        self.prospects.append(prospect)
        
        ADF._set_id_data(prospect, adf_prospect.id_list)
        ADF._treat_elem(prospect, adf_prospect)
        ADF._treat_elem(prospect, adf_prospect, "requestdate")
        
        # Process the Vehicles
        self._process_vehicles(prospect, adf_prospect.vehicle_list)
        
        # Process the vendor
        if 'vendor' in adf_prospect.__dict__:
          self._process_vendor(prospect, adf_prospect.vendor)
        
        # Process the provider
        if 'provider' in adf_prospect.__dict__:
          self._process_provider(prospect, adf_prospect.provider)
        
        # Process the customer
        if 'customer' in adf_prospect.__dict__:
          self._process_customer(prospect, adf_prospect.customer)
    
    return
  
  def _process_vehicles(self, prospect, vehicle_list):
    """Support Method
    Processes the 'vehicle' block"""
    
    # This prospect begins with an empty list of vehicles
    prospect.vehicles = []
    
    for adf_vehicle in vehicle_list:
      vehicle = ADFVehicle()
      prospect.vehicles.append(vehicle)
      
      # Set the vehicle data
      ADF._set_id_data(vehicle, adf_vehicle.id_list)
      ADF._treat_elem(vehicle, adf_vehicle)
      ADF._treat_elem(vehicle, adf_vehicle, "year")
      ADF._treat_elem(vehicle, adf_vehicle, "make")
      ADF._treat_elem(vehicle, adf_vehicle, "model")
      ADF._treat_elem(vehicle, adf_vehicle, "vin")
      ADF._treat_elem(vehicle, adf_vehicle, "stock")
      ADF._treat_elem(vehicle, adf_vehicle, "trim")
      ADF._treat_elem(vehicle, adf_vehicle, "doors")
      ADF._treat_elem(vehicle, adf_vehicle, "bodystyle")
      ADF._treat_elem(vehicle, adf_vehicle, "transmission")
      ADF._treat_elem(vehicle, adf_vehicle, "odometer")
      ADF._treat_elem(vehicle, adf_vehicle, "condition")
      ADF._treat_elem(vehicle, adf_vehicle, "imagetag")
      ADF._treat_elem(vehicle, adf_vehicle, "price")
      ADF._treat_elem(vehicle, adf_vehicle, "pricecomments")
      ADF._treat_elem(vehicle, adf_vehicle, "comments")
      
      # Process colorcombinations
      self._process_vehicle_colorcombinations(vehicle, adf_vehicle.colorcombination_list)
      
      # Process options
      self._process_vehicle_options(vehicle, adf_vehicle.option_list)
      
      # Process finance
      finance = adf_vehicle.__dict__.get("finance")
      if finance:
        self._process_vehicle_finance(vehicle, adf_vehicle.finance)
    
    return
  
  def _process_vehicle_colorcombinations(self, vehicle, colorcombinations):
    """Support Method
    Processes the 'colorcombination' block"""
    
    # This vehicle is assigned a color combination list
    vehicle.colorcombinations = []
    
    for combination in colorcombinations:
      if isinstance(combination, ADFBlock):
        block = {}
        fields = ["preference", "interiorcolor", "exteriorcolor"]
        
        for field in fields:
          block[field] = None
          
        for field in fields:
          ADF._treat_elem(block, combination, field)
        
        # Add it to the combinations
        vehicle.colorcombinations.append(block)
    
    # Sort the combinations
    vehicle.colorcombinations.sort(ADF._process_vehicle_colorcombinations_sort)
    
    return
  
  @classmethod
  def _generic_int_sort(cls, a, b, field, reversed=False):
    """Support Method
    Generic method to sort based on an int property (field) that
    belongs to a and b. The results may be sorted in reverse
    using reversed=True"""
    
    # Obtain the ints to sort by
    a_val = int(a.get(field, 0))
    b_val = int(b.get(field, 0))
    
    # Does the user want to reverse the order?
    if reversed:
      return b_val - a_val
    
    return a_val - b_val
  
  @classmethod
  def _process_vehicle_colorcombinations_sort(cls, a, b):
    """Support Method
    Sort method for the color combinations"""
    
    # color combinations are sorted via the "preference" field.
    # This is a required field so we know it will exist
    return cls._generic_int_sort(a, b, "preference")
  
  def _process_vehicle_options(self, vehicle, options):
    """Support Method
    Processes the vehicle's 'option' block"""
    
    # This vehicle is assigned an options list
    vehicle.options = []
    
    for option in options:
      if isinstance(option, ADFBlock):
        block = {}
        fields = ["optionname", "manufacturercode", "stock", "weighting"] + FIELDS_PRICE
        
        for field in fields:
          block[field] = None
          
        for field in fields:
          ADF._treat_elem(block, option, field)
        
        vehicle.options.append(block)
    
    # Sort the options
    vehicle.options.sort(ADF._process_vehicle_options_sort)
    
    return
  
  @classmethod
  def _process_vehicle_options_sort(cls, a, b):
    """Support Method
    Sort method for the vehicle options"""
    
    # The options are ordered using the weighting field
    return cls._generic_int_sort(a, b, "weighting", reversed=True)
  
  def _process_vehicle_finance(self, vehicle, finance):
    """Support Method
    Processes the vehicle's 'finance' block"""
    
    # Create a new amount list and transfer the finance data to the block
    amount_list = []
    block = dict(amounts=amount_list)
    ADF._treat_elem(block, finance)
    
    # Transfer the data from the 2 possible child options
    fields = ["method", "balance"]
    for field in fields:
      block[field] = None
    
    for field in fields:
      ADF._treat_elem(block, finance, field)
    
    # Transfer all the amounts (ADF has 1 or more)
    for amount in finance.amount_list:
      amt_block = dict(value=amount.value)
      ADF._treat_elem(amt_block, amount)
      amount_list.append(amt_block)
    
    # Set the vehicle's finance block
    vehicle.finance = block
    
    return
  
  def _process_customer(self, prospect, adf_customer):
    """Support Method
    Processes the 'customer' block"""
    
    # Create the customer object
    customer = prospect.customer = ADFCustomer()
    
    # Transfer the data to the object
    ADF._set_id_data(customer, adf_customer.id_list)
    ADF._treat_elem(customer, adf_customer)
    ADF._treat_elem(customer, adf_customer, "comments")
    
    # Create the contact object
    customer.contact = ADFContact()
    self._process_contact_block(customer.contact, adf_customer)
    
    # Is the timeframe block defined?
    if adf_customer.__dict__.has_key('timeframe'):
      # Create the timeframe object
      timeframe = customer.timeframe = ADFCustomerTimeframe()
      
      # Transfer the timeframe information
      ADF._treat_elem(timeframe, adf_customer.timeframe)
      ADF._treat_elem(timeframe, adf_customer.timeframe, "description")
      ADF._treat_elem(timeframe, adf_customer.timeframe, "earliestdate")
      ADF._treat_elem(timeframe, adf_customer.timeframe, "latestdate")
    
    return
  
  @classmethod
  def _address_street_sort(cls, a, b):
    """Support Method
    Sort method for the address' 'street' tag"""
    
    # The street data is sorted using the "line" attribute
    return cls._generic_int_sort(a, b, "_line")
  
  def _process_vendor(self, prospect, adf_vendor):
    """Support Method
    Processes the 'vendor' block"""
    
    # Create a new vendor object
    vendor = prospect.vendor = ADFVendor()
    
    # Set the data
    ADF._set_id_data(vendor, adf_vendor.id_list)
    ADF._treat_elem(vendor, adf_vendor, "vendorname")
    ADF._treat_elem(vendor, adf_vendor, "url")
    
    # Create the contact object and process it
    vendor.contact = ADFContact()
    self._process_contact_block(vendor.contact, adf_vendor)
    
    return
  
  def _process_provider(self, prospect, adf_provider):
    """Support Method
    Processes the 'provider' block"""
    
    # Create the provider object
    provider = prospect.provider = ADFProvider()
    
    # Set the data
    ADF._set_id_data(provider, adf_provider.id_list)
    ADF._treat_elem(provider, adf_provider, "name")
    ADF._treat_elem(provider, adf_provider, "service")
    ADF._treat_elem(provider, adf_provider, "url")
    ADF._treat_elem(provider, adf_provider, "email")
    ADF._treat_elem(provider, adf_provider, "phone")
    
    # Create the contact object and process it
    provider.contact = ADFContact()
    self._process_contact_block(provider.contact, adf_provider)
    
    return
  
  def _process_contact_block(self, contact, src):
    """Support Method
    Processes the 'contact' block"""
    
    # Make sure we have a contact to process
    if src.__dict__.has_key("contact"):
      # Transfer the base details
      ADF._treat_elem(contact, src.contact)
      
      # Transfer the email
      ADF._treat_elem(contact, src.contact, "email")
      
      # Transfer the phone (if available)
      contact.phones = []
      for phone in src.contact.phone_list:
        block = dict(value=phone.value, _type=None, _time=None, _preferredcontact=None)
        contact.phones.append(block)
        ADF._treat_elem(block, phone)
      
      # Transfer the names (if available)
      contact.names = []
      for name in src.contact.name_list:
        block = dict(value=name.value, _part=None, _type=None)
        contact.names.append(block)
        ADF._treat_elem(block, name)
      
      # Is the address defined? If so, process it
      if src.contact.__dict__.has_key('address'):
        # Create the address object
        address = contact.address = ADFContactAddress()
        
        # Transfer the data to the address object
        ADF._treat_elem(address, src.contact.address)
        ADF._treat_elem(address, src.contact.address, "apartment")
        ADF._treat_elem(address, src.contact.address, "city")
        ADF._treat_elem(address, src.contact.address, "regioncode")
        ADF._treat_elem(address, src.contact.address, "postalcode")
        ADF._treat_elem(address, src.contact.address, "country")
      
        # Process the street details
        address.streets = []
        for street in src.contact.address.street_list:
          block = dict(value=street.value, _line=street.attributes.get("line"))
          address.streets.append(block)
          ADF._treat_elem(block, street)
        
        # Sort the street data
        address.streets.sort(ADF._address_street_sort)
    return
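
A side note on `_generic_int_sort` in the listing above: the blocks are pre-filled with `block[field] = None`, so `a.get(field, 0)` returns `None` rather than the default whenever the tag was absent, and `int(None)` raises a TypeError. If an empty tag yields an empty string as its value (an assumption, since the parser is not shown), `int('')` raises a ValueError as well. The sketch below shows a tolerant alternative using a sort key. It is written for Python 3, where `list.sort` no longer accepts a comparison function; `int_field` and the sample data are made up for illustration.

```python
def int_field(block, field, default=0):
    """Read block[field] as an int, falling back to `default` for None, '' or junk."""
    try:
        return int(block.get(field))
    except (TypeError, ValueError):
        return default

combos = [
    {"preference": "2", "exteriorcolor": "red"},
    {"preference": None, "exteriorcolor": "blue"},   # tag missing: pre-filled None
    {"preference": "", "exteriorcolor": "green"},    # tag present but empty (assumed '')
    {"preference": "1", "exteriorcolor": "black"},
]

# Sort by the integer value; entries without a usable value sort as 0.
combos.sort(key=lambda block: int_field(block, "preference"))
print([c["exteriorcolor"] for c in combos])
```

Whether a missing or empty value should sort first, last, or be rejected is a design decision for the ADF code; the default of 0 here is only a placeholder.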


### classes.py
Code:
# -*- coding: latin-1 -*-

FIELDS_PRICE = [
  "price", "price_type", "price_currency",
  "price_delta", "price_relativeto", "price_source"
]

FIELDS_ID = [
  "id", "id_sequence", "id_source"
]

# Objects for the XML to Python parser
class ADFNode(object):
  """Object containing details for a single XML node.
  Generated by the parser (x.leads.adf.parser).
  Used by the parser and the ADF object (x.leads.adf.ADF)"""
  
  def __init__(self, attributes={}, value=None):
    """Initialize the object by storing the attributes
    and value"""
    
    self.attributes = attributes
    self.value = value
  
  def __repr__(self):
    """A textual representation of the object's contents"""
    
    attrs = []
    
    for k, v in self.attributes.iteritems():
      try:
        attrs.append("%s=%d" % ( k, int(v) ))
      except:
        attrs.append('%s="%s"' % ( k, v ))
    
    if attrs:
      attrs = " ".join(attrs)
    else:
      attrs = ""
    
    values = []
    if self.value:
      values.append(str(self.value))
    
    values = "".join(values)
    if attrs:
      return "'%s [%s]'" % ( values, attrs )
    
    return "'%s'" % ( values, )

class ADFBlock(object):
  """Object containing details for a XML parent block.
  This is a block that may have blocks (ADFBlock) or nodes (ADFNode)
  under it.
  Generated by the parser (x.leads.adf.parser).
  Used by the parser and the ADF object (x.leads.adf.ADF)"""
  
  def __init__(self, attributes):
    """Initialize the object by storing the attributes"""
    
    self.attributes = attributes
  
  def __repr__(self):
    """A textual representation of the object's contents"""
    
    return str(self.__dict__)

# Objects for the Python parser

class ADFElement(object):
  """Base element of the ADF objects.
  Used and generated by the ADF object (x.leads.adf.ADF)"""
  
  def _init_fields(self, fields):
    """Initialize the fields (store as attributes to the object)"""
    
    for field in fields:
      self.__dict__[field] = None
    return
  
  def __repr__(self):
    """A textual representation of the object"""
    return str(self.__dict__)

class ADFProspect(ADFElement):
  """Holds details concerning the 'prospect' block"""
  
  def __init__(self):
    self._init_fields([
      "_status",
      "requestdate"
    ] + FIELDS_ID)

class ADFVehicle(ADFElement):
  """Holds details concerning the 'vehicle' block"""
  
  def __init__(self):
    self._init_fields([
      "_interest", "_status",
      "year", "make", "model", "vin", "stock", "trim",
      "doors", "bodystyle", "transmission",
      "odometer", "odometer_status", "odometer_units",
      "condition",
      "imagetag", "imagetag_width", "imagetag_height", "imagetag_alttext",
      "pricecomments",
      "comments"
    ] + FIELDS_PRICE + FIELDS_ID)

class ADFCustomer(ADFElement):
  """Holds details concerning the 'customer' block"""
  
  def __init__(self):
    self._init_fields([
      "comments"
    ] + FIELDS_ID)

class ADFContact(ADFElement):
  """Holds details concerning the 'contact' block"""
  
  def __init__(self):
    self._init_fields([
      "_primarycontact",
      "email", "email_preferredcontact",
    ])

class ADFContactAddress(ADFElement):
  """Holds details concerning the Contact's 'address' block"""
  
  def __init__(self):
    self._init_fields([
      "apartment", "city", "regioncode", "postalcode", "country"
    ])

class ADFCustomerTimeframe(ADFElement):
  def __init__(self):
    self._init_fields([
      "description", "earliestdate", "latestdate"
    ])

class ADFVendor(ADFElement):
  """Holds details concerning the 'vendor' block"""
  
  def __init__(self):
    self._init_fields([
      "vendorname", "url"
    ] + FIELDS_ID)
    
class ADFProvider(ADFElement):
  """Holds details concerning the 'provider' block"""
  
  def __init__(self):
    self._init_fields([
      "service", "url",
      "name", "name_part", "name_type"
    ] + FIELDS_ID)


###
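
A further thing to check in classes.py: `ADFNode.__init__` declares `attributes={}` as a default argument. Python evaluates that default once, so every node created without explicit attributes shares the same dictionary, and a change made through one node shows up in the others. Whether this affects the empty-tag problem depends on how the parser (not shown) builds its nodes, so treat the following as a sketch of the pitfall only. `Node` and `SafeNode` are hypothetical stand-ins with the same constructor signature as `ADFNode`.

```python
class Node:
    """Hypothetical stand-in for ADFNode, keeping the mutable default."""

    def __init__(self, attributes={}, value=None):
        self.attributes = attributes
        self.value = value

class SafeNode:
    """Same interface, but each instance gets its own dictionary."""

    def __init__(self, attributes=None, value=None):
        # A fresh dict per instance unless the caller supplies one.
        self.attributes = {} if attributes is None else attributes
        self.value = value

a, b = Node(), Node()
a.attributes["line"] = "1"
print(b.attributes)   # b sees a's change: both share the default dict

c, d = SafeNode(), SafeNode()
c.attributes["line"] = "1"
print(d.attributes)   # d is unaffected
```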

Reply With Quote
  #7  
Old February 6th, 2012, 01:44 PM
b49P23TIvg's Avatar
b49P23TIvg b49P23TIvg is offline
Contributing User
Join Date: Aug 2011
Posts: 1,075 b49P23TIvg User rank is Second Lieutenant (5000 - 10000 Reputation Level)
Time spent in forums: 4 Weeks 1 Day 4 h 41 m 27 sec
Reputation Power: 98
ADF - Alliance Defense Fund : Defending Our First Liberty

American Dance Festival

Ár nDraíocht Féin: A Druid Fellowship



Perhaps you could provide a minimal example that demonstrates the problem, without all of us responders having to shell out forty thousand bucks to install Oracle?

  #8  
Old February 6th, 2012, 03:41 PM
zambilo76 zambilo76 is offline
My apologies for the confusion.
ADF stands for Auto-lead Data Format:
an industry-standard data format for the export and import of automotive customer leads using XML.

I "inherited" this abandoned project at work and it's driving me nuts.

To put it more simply, below is a straightforward driver I made for testing.

The issue: if you look at the output of the raw object print in the last line of the driver program below, for some reason it doesn't contain the empty tags.

I hope this clarifies things; otherwise, please feel free to ask about whatever needs clarifying.

Cheers,

Code:
from pprint import pprint
import StringIO
from x.leads.adf import ADF

if __name__ == "__main__":
  adf_text = StringIO.StringIO("""<?xml version="1.0"?>
  <adf>
    <prospect status="new">
      <id sequence="1" source="MUSA">100001642</id>
      <requestdate timezone="gmt">23 Jan 2012 19:58:19 GMT</requestdate>
      <vehicle status="" interest="buy" kelounne="tata">
        <year></year>
        <make>Mazda</make>
        <model>CX-7</model>
        <vin>1234567899</vin>
        <trim>CX-7 Sport FWD</trim>
        <transmission>A</transmission>
        <odometer unit="km">96450</odometer>
        <colorcombination>
          <interiorcolor>BLACK</interiorcolor>
          <exteriorcolor>TRUE SILVER METALLIC</exteriorcolor>
          <preference>1</preference>
        </colorcombination>
        
        <option>
          <optionname>Sport</optionname>
          <manufacturercode>p394</manufacturercode>
          <weighting>65</weighting>
        </option>
        <option>
          <optionname>Keyless Entry</optionname>
          <manufacturercode>p395</manufacturercode>
          <weighting>100</weighting>
        </option>
        
        <price type="asking" currency="USD">$17,998</price>
        <finance>
          <method>Finance</method>
          <amount type="downpayment" currency="USD">5000</amount>
          <amount type="monthly" currency="USD">1000</amount>
          <amount type="total" currency="USD">50000</amount>
          <balance type="residual" currency="USD">2000</balance>
        </finance>
      </vehicle>
      <customer>
        <timeframe>
          <description>Within 1 month</description>
        </timeframe>
        <contact>
          <name part="first">john</name>
          <name part="last">bruning</name>
          <email preferredcontact="1">john.bruning@icrossing.com</email>
          <phone preferredcontact="0" time="day">(345) 345-3453</phone>
          <address>
            <street line="1">120 King Street</street>
            <street line="2">Suite 310</street>
            <city>Laval</city>
            <regioncode>Quebec</regioncode>
            <postalcode>HHH555</postalcode>
          </address>
        </contact>
        <comments>Great</comments>
      </customer>
      <vendor>
        <id>42159</id>
        <vendorname>MAZDA OF PUENTE HILLS</vendorname>
        <contact>
          <name>mon nom</name>
          <phone>12345566</phone>
        </contact>
      </vendor>
      <provider>
        <name>MazdaCPO</name>
        <service>A0MVLocator</service>
      </provider>
    </prospect>
  </adf>
  """)

  myADF = ADF(adf_text)
  request_dict = myADF.to_dict()
  pprint(request_dict, indent=2)
  #print myADF.to_xml()
  print (myADF.raw_object.__dict__)

  #9  
Old February 6th, 2012, 04:41 PM
b49P23TIvg b49P23TIvg is offline
I haven't got ADF, and I don't have a file named x.py. Look, given your data and the program you supplied:

1) The year tag is the only tag I see without data.
2) The program finds the start and end of the year tag.
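
To illustrate those two points in isolation, here is a minimal sketch. It is written for Python 3 so it runs on a current interpreter (the code in this thread is Python 2), it assumes only the standard xml.parsers.expat module, and the helper name trace_events is made up for this illustration; it is not part of the ADF code.

```python
from xml.parsers import expat

def trace_events(xml_text):
    """Return the (event, payload) tuples expat reports for xml_text."""
    events = []
    parser = expat.ParserCreate()
    # Deliver each run of character data as a single chunk.
    parser.buffer_text = True
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("data", data))
    parser.Parse(xml_text, True)
    return events

if __name__ == "__main__":
    # <year> is empty, <make> has text.
    for event in trace_events("<vehicle><year></year><make>Mazda</make></vehicle>"):
        print(event)
```

The empty year element produces a start event immediately followed by an end event, with no data event in between. Code that stores a field only when the character-data handler fires will therefore never record an empty tag, even though expat did see it.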

Run this with a recent installation of Python 2; maybe you have a bad version of expat.

Code:
import sys
import StringIO
from xml.parsers import expat

class XMLParser:
  """

  XML to Object
  general purpose XML parser

  just subclass it and make functions for each tag you want to handle
  """

  def __init__(self, log=None):
    self.Parser = None
    self.usedcars = []
    self._tagStack = []
    self._transtype = None
    self._currentcar = None
    if log:
      self.log = log
    else:
      from x.usedcars.media import logger
      self.log = logger.Logger(sys.stdout, logger.logALL)

    return

  def StartElement(self, name, attributes):
    'SAX start element event handler'
    name = name.lower()
    #name = name.replace(':', '_')   #20070511NH  for 'xs:' styles parsing
    print ('Start element:', name, attributes)
    for k in attributes.iterkeys():
      attributes[k] = attributes[k].encode('latin-1', 'replace')  #20061018NH
    self._tagStack.append([name, attributes])
    
    name = 'self._%sStartTag'%(name,)
    #func = None
    try:
      func = eval(name)
      func(attributes)
      #print 'called %s()'%(name,)
    except (NameError, AttributeError):
      pass
    except Exception, e:
      self.log.W('%s StartTag err:"%s"'%(name, str(e)))

    return

  def EndElement(self, name):
    'SAX end element event handler'
    name = name.lower()
    name = name.replace(':', '_')  #20070511NH  for 'xs:' styles parsing
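    # Attributes of the element being closed; without this, func(attributes)
    # below raises a NameError that the except clause silently swallows.
    attributes = self._tagStack[-1][1]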

    name = 'self._%sEndTag'%(name,)
    func = None
    print('DWL: END ELEMENT '+name)
    try:
      func = eval(name)
      func(attributes)
      #print 'called %s()'%(name,)
    except (NameError, AttributeError):
      pass
    except Exception, e:
      self.log.W('%s EndTag err:"%s"'%(name, str(e)))

    self._tagStack = self._tagStack[:-1]

    return

  def CharacterData(self, cdata):
    'SAX character data event handler'
    print 'char', repr(cdata)
    
    print '-' * 80
    print 1000
    parentname = None
    if len(self._tagStack) >= 2:
      parentname = self._tagStack[-2][0]

    name, attributes = self._tagStack[-1]    

    #sif (cdata == '') or (cdata == "") or (cdata is None) or (cdata== 0) :  cdata =u"++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++CDATA EMPTY\n"
    #cdata = (cdata or "").encode('latin-1', 'replace')
    #print 'cdata: %s' %cdata
    #print type(cdata) 
    
    print 1111, name
    print (type(cdata), cdata)
    print parentname
    print 2222
    #print 'cdata: %s' %cdata
    if parentname:
      funcname = 'self._%s_%sData'%(parentname.lower(), name.lower(),)
      print funcname
      try:
        func = eval(funcname)
        func(attributes, cdata)
        print 'called %s()'%(funcname,)
      except (NameError, AttributeError):
        parentname = None
      except Exception, e:
        self.log.W('%s err:"%s", data:"%s"'%(funcname, str(e), repr(cdata)))
        parentname = None

    print 3333
    
    if not parentname:
      funcname = 'self._%sData'%(name.lower(),)
      print funcname
      try:
        func = eval(funcname)
        func(attributes, cdata)
        #print 'called %s()'%(funcname,)
      except (NameError, AttributeError):
        pass
        #self.log.W('%s not parsed, data:"%s"'%(funcname, repr(cdata)))
      except Exception, e:
        try:
          self.log.W('%s err:"%s" data:"%s"'%(funcname, str(e), repr(cdata)))
        except UnicodeDecodeError:
          self.log.W('%s err:"%s"'%(funcname, str(e)))
        pass

    print 4444

    return

  def ParseFile(self,file):
    # Create a SAX parser
    self.Parser = expat.ParserCreate()

    #self.Parser.buffer_text = 1

    # SAX event handlers
    self.Parser.StartElementHandler = self.StartElement
    self.Parser.EndElementHandler = self.EndElement
    self.Parser.CharacterDataHandler = self.CharacterData


    # Parse the XML File
    ParserStatus = self.Parser.ParseFile(file) #Parser.Parse(open(filename,'r').read(), 1)

    return #self.root

adf_text = StringIO.StringIO("""<?xml version="1.0"?>
  <adf>
    <prospect status="new">
      <id sequence="1" source="MUSA">100001642</id>
      <requestdate timezone="gmt">23 Jan 2012 19:58:19 GMT</requestdate>
      <vehicle status="" interest="buy" kelounne="tata">
        <year></year>
        <make>Mazda</make>
        <model>CX-7</model>
        <vin>1234567899</vin>
        <trim>CX-7 Sport FWD</trim>
        <transmission>A</transmission>
        <odometer unit="km">96450</odometer>
        <colorcombination>
          <interiorcolor>BLACK</interiorcolor>
          <exteriorcolor>TRUE SILVER METALLIC</exteriorcolor>
          <preference>1</preference>
        </colorcombination>
        
        <option>
          <optionname>Sport</optionname>
          <manufacturercode>p394</manufacturercode>
          <weighting>65</weighting>
        </option>
        <option>
          <optionname>Keyless Entry</optionname>
          <manufacturercode>p395</manufacturercode>
          <weighting>100</weighting>
        </option>
        
        <price type="asking" currency="USD">$17,998</price>
        <finance>
          <method>Finance</method>
          <amount type="downpayment" currency="USD">5000</amount>
          <amount type="monthly" currency="USD">1000</amount>
          <amount type="total" currency="USD">50000</amount>
          <balance type="residual" currency="USD">2000</balance>
        </finance>
      </vehicle>
      <customer>
        <timeframe>
          <description>Within 1 month</description>
        </timeframe>
        <contact>
          <name part="first">john</name>
          <name part="last">bruning</name>
          <email preferredcontact="1">john.bruning@icrossing.com</email>
          <phone preferredcontact="0" time="day">(345) 345-3453</phone>
          <address>
            <street line="1">120 King Street</street>
            <street line="2">Suite 310</street>
            <city>Laval</city>
            <regioncode>Quebec</regioncode>
            <postalcode>HHH555</postalcode>
          </address>
        </contact>
        <comments>Great</comments>
      </customer>
      <vendor>
        <id>42159</id>
        <vendorname>MAZDA OF PUENTE HILLS</vendorname>
        <contact>
          <name>mon nom</name>
          <phone>12345566</phone>
        </contact>
      </vendor>
      <provider>
        <name>MazdaCPO</name>
        <service>A0MVLocator</service>
      </provider>
    </prospect>
  </adf>
""")

import logging
p = XMLParser(logging.Logger('dwl'))
p.ParseFile(adf_text)
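
One possible way to tell an empty tag from a missing one, sketched under assumptions: note the element in the start handler, collect text in the character-data handler, and store the value in the end handler, so a value is stored even when no character data arrived. This is Python 3 rather than the Python 2 used above, parse_fields is a hypothetical helper rather than anything in the ADF module, and it flattens the document to leaf tag names, which the real code would not want to do.

```python
from xml.parsers import expat

def parse_fields(xml_text):
    """Map each leaf tag name to its text; empty elements map to ''."""
    fields = {}
    stack = []  # one [name, text_chunks, has_children] entry per open element

    def start(name, attrs):
        if stack:
            stack[-1][2] = True  # the parent now has a child element
        stack.append([name, [], False])

    def data(text):
        # expat may deliver one text run in several chunks, so collect them.
        stack[-1][1].append(text)

    def end(name):
        _, chunks, has_children = stack.pop()
        if not has_children:
            # Stored even with no character data: '' means "present but empty".
            fields[name] = "".join(chunks)

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = data
    parser.Parse(xml_text, True)
    return fields

if __name__ == "__main__":
    fields = parse_fields("<vehicle><year></year><make>Mazda</make></vehicle>")
    print(repr(fields.get("year")))  # tag present but empty
    print(repr(fields.get("trim")))  # tag absent from the XML
```

With this shape, an empty string means the tag was in the XML with no data, and None from fields.get() means the tag was not there at all.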


But frankly, I'm about done with this given that you're paid and I'm jobless.

© 2003-2012 by Developer Shed. All rights reserved.