Thread: Xml parsing

    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    27
    Rep Power
    0

    Xml parsing


    I need to parse an XML file with the ElementTree module and load all the data into a list of tuples, which should look something like this:

    Code:
    [(1, 'Customer#000000001', 'IVhzIApeRb ot,c,E', 15, '25-989-741-2988', 711.56, 'BUILDING', 'regular, regular platelets are fluffily according to the even attainments. blithely iron'), (2, 'Customer#000000002', 'XSTf4,NCwDVaWNe6tEgvwfmRchLXak', 13, '23-768-687-3665', 121.65, 'AUTOMOBILE', 'furiously special deposits solve slyly. furiously even foxes wake alongside of the furiously ironic ideas. pending'), ...]
    Right now, I can only get it to print each field on its own line:
    Code:
    1 
    Customer#000000001
    IVhzIApeRb ot,c,E
    15
    25-989-741-2988
    711.56
    BUILDING
    regular, regular platelets are fluffily according to the even attainments. blithely iron
    2
    Customer#000000002
    XSTf4,NCwDVaWNe6tEgvwfmRchLXak
    13
    23-768-687-3665
    121.65
    AUTOMOBILE
    furiously special deposits solve slyly. furiously even foxes wake alongside of the furiously ironic ideas. pending
    This is my code. I'm not sure whether I'm supposed to use append, because when I tried it I kept getting an error.
    Code:
    import xml.etree.ElementTree as ET
    
    tree = ET.parse('customer.xml')
    root = tree.getroot()
    
    for elem in root.findall('T'):
        # iterating an element yields its children; getchildren() is deprecated
        for child in elem:
            print(child.text)
    These are the first 3 customer records (1500 in total) from the XML file:
    Code:
    <table ID="customer">
      <T>
        <C_CUSTKEY>1</C_CUSTKEY>
        <C_NAME>Customer#000000001</C_NAME>
        <C_ADDRESS>IVhzIApeRb ot,c,E</C_ADDRESS>
        <C_NATIONKEY>15</C_NATIONKEY>
        <C_PHONE>25-989-741-2988</C_PHONE>
        <C_ACCTBAL>711.56</C_ACCTBAL>
        <C_MKTSEGMENT>BUILDING</C_MKTSEGMENT>
        <C_COMMENT>regular, regular platelets are fluffily according to the even attainments. blithely iron</C_COMMENT>
      </T>
      <T>
        <C_CUSTKEY>2</C_CUSTKEY>
        <C_NAME>Customer#000000002</C_NAME>
        <C_ADDRESS>XSTf4,NCwDVaWNe6tEgvwfmRchLXak</C_ADDRESS>
        <C_NATIONKEY>13</C_NATIONKEY>
        <C_PHONE>23-768-687-3665</C_PHONE>
        <C_ACCTBAL>121.65</C_ACCTBAL>
        <C_MKTSEGMENT>AUTOMOBILE</C_MKTSEGMENT>
        <C_COMMENT>furiously special deposits solve slyly. furiously even foxes wake alongside of the furiously ironic ideas. pending</C_COMMENT>
      </T>
      <T>
        <C_CUSTKEY>3</C_CUSTKEY>
        <C_NAME>Customer#000000003</C_NAME>
        <C_ADDRESS>MG9kdTD2WBHm</C_ADDRESS>
        <C_NATIONKEY>1</C_NATIONKEY>
        <C_PHONE>11-719-748-3364</C_PHONE>
        <C_ACCTBAL>7498.12</C_ACCTBAL>
        <C_MKTSEGMENT>AUTOMOBILE</C_MKTSEGMENT>
        <C_COMMENT>special packages wake. slyly reg</C_COMMENT>
      </T>
  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,699
    Rep Power
    480
    Who keeps giving you all these silly xml projects?

    Why do your programming attempts all look so similarly lame?
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    27
    Rep Power
    0
    Well sorry if you think these are silly xml projects and that my programming attempts are all similarly lame.
    I already know I'm slower than the average person, so thanks for pointing it out.
  6. #4
  7. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,699
    Rep Power
    480
    Well, at first I thought you were employed by a record store. I'm glad to help a little bit. In that case, you're employed and I'm not employed, so I'm not happy to help a lot.

    Next I thought you were a student because I'm sure I saw
    #fill in the rest of the function
    in a few of the routines. Doing homework for other people is an ethics violation. Although, again, I err on the side of answering homework questions because examples of working code seem better than examples of bad algorithms. And since I write doctests into many of my posts, and try them before I post, for the most part I'm posting somewhat functional algorithms. Sometimes I understand the questions and answer accordingly.

    Your xml isn't always about audio recordings. Maybe you're not a music store employee. The xml questions have been long running and have not changed much. It seems like a class would advance faster. So perhaps you're not a student, either. I'm baffled. But I also don't feel like answering more xml questions. I've now explored the xml module and have learned enough about it that I've lost personal motivation to answer more of these questions.
    Last edited by b49P23TIvg; February 13th, 2013 at 12:06 PM.
  8. #5
  9. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Feb 2005
    Posts
    588
    Rep Power
    64
    So you want a list of tuples ...
    Code:
    ''' xml_test107.py
    
    '''
    
    import pprint
    import xml.etree.ElementTree as ET
    
    tree = ET.parse('customer.xml')
    root = tree.getroot()
    
    mylist = []
    for elem in root.findall('T'):
        # each <T> element is one customer record, so build one tuple per <T>
        # (iterating elem yields its child elements in document order)
        mylist.append(tuple(child.text for child in elem))
    
    pprint.pprint(mylist)
    
    ''' result ...
    
    [('1',
      'Customer#000000001',
      'IVhzIApeRb ot,c,E',
      '15',
      '25-989-741-2988',
      '711.56',
      'BUILDING',
      'regular, regular platelets are fluffily according to the even attainments. blithely iron'),
     ('2',
      'Customer#000000002',
      'XSTf4,NCwDVaWNe6tEgvwfmRchLXak',
      '13',
      '23-768-687-3665',
      '121.65',
      'AUTOMOBILE',
      'furiously special deposits solve slyly. furiously even foxes wake alongside of the furiously ironic ideas. pending'),
     ('3',
      'Customer#000000003',
      'MG9kdTD2WBHm',
      '1',
      '11-719-748-3364',
      '7498.12',
      'AUTOMOBILE',
      'special packages wake. slyly reg')]
    
    '''
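The result above holds every field as a string, while the list requested in the first post has numbers for the key, nation key and account balance. Here is a minimal sketch of one way to convert them, assuming every record has the same eight fields in the same order and that none of them is empty. The names CONVERTERS and load_customers are made up for this example, and the column types are guessed from the desired output in the first post.

```python
import xml.etree.ElementTree as ET

# Two records copied from the sample in the first post, so the example
# runs without customer.xml being present.
SAMPLE = """<table ID="customer">
  <T>
    <C_CUSTKEY>1</C_CUSTKEY>
    <C_NAME>Customer#000000001</C_NAME>
    <C_ADDRESS>IVhzIApeRb ot,c,E</C_ADDRESS>
    <C_NATIONKEY>15</C_NATIONKEY>
    <C_PHONE>25-989-741-2988</C_PHONE>
    <C_ACCTBAL>711.56</C_ACCTBAL>
    <C_MKTSEGMENT>BUILDING</C_MKTSEGMENT>
    <C_COMMENT>regular, regular platelets are fluffily according to the even attainments. blithely iron</C_COMMENT>
  </T>
  <T>
    <C_CUSTKEY>2</C_CUSTKEY>
    <C_NAME>Customer#000000002</C_NAME>
    <C_ADDRESS>XSTf4,NCwDVaWNe6tEgvwfmRchLXak</C_ADDRESS>
    <C_NATIONKEY>13</C_NATIONKEY>
    <C_PHONE>23-768-687-3665</C_PHONE>
    <C_ACCTBAL>121.65</C_ACCTBAL>
    <C_MKTSEGMENT>AUTOMOBILE</C_MKTSEGMENT>
    <C_COMMENT>furiously special deposits solve slyly. furiously even foxes wake alongside of the furiously ironic ideas. pending</C_COMMENT>
  </T>
</table>"""

# Assumed type of each column, in document order (hypothetical mapping,
# inferred from the desired output, not from any schema).
CONVERTERS = (int, str, str, int, str, float, str, str)


def load_customers(root):
    """Return one typed tuple per <T> record found directly under root."""
    rows = []
    for record in root.findall('T'):
        # pair each child element with the converter for its column
        rows.append(tuple(conv(field.text)
                          for conv, field in zip(CONVERTERS, record)))
    return rows


customers = load_customers(ET.fromstring(SAMPLE))
print(customers)
```

For the real file, the same function should work with load_customers(ET.parse('customer.xml').getroot()). A record with a missing or empty numeric field would make int() or float() raise an error, so real data may need a check before converting.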
    Real Programmers always confuse Christmas and Halloween because Oct31 == Dec25
