#1
  1. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2009
    Posts
    58
    Rep Power
    5

    Handling non-ASCII characters in XML


    Hello everyone,
    I am new to XML; I just started yesterday.
    I have a problem with non-ASCII characters in XML:
    when such a character appears in the text, the XML parser errors out.
    I Googled the error and found that CDATA sections can handle this,
    but CDATA is not helping.
    Here is the XML content:

    Code:
    <?xml version="1.0"?>
    <add>
    <doc>
    <field name="primary_key">436_3_2</field>
    <field name="title"><![CDATA[RE: which is the best Touchscreen Phone right now? ]]></field>
    <field name="description"><![CDATA[^ 
    There's also a few new phones coming out in October that might be worth waiting for, the desire HD, g2, acer liquid metal and the Nokia n8 (worth a look if the price is right) 
    c:usersNeilAppDataRoamingLimeWirerowserxulrunnerchromecomm.manifest c:usersNeilAppDataRoamingLimeWirerowserxulrunnerchromeen-US.jar
    ]]></field>
    </doc>
    </add>
    Please take a look when you have some free time and help me out.
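To see why CDATA does not help, here is a minimal sketch using Python's standard xml.etree.ElementTree, which parses with expat. A CDATA section only switches off recognition of markup such as < and &; every character inside it must still be a legal XML character, and U+0008 (backspace) is not. The sample string is a hypothetical, shortened stand-in for the document above, and the helper name parses is made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if expat accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical, shortened stand-in for the posted document: a backspace
# character (U+0008) sits inside the CDATA section.
doc = (
    '<?xml version="1.0"?>\n'
    '<add><doc><field name="description"><![CDATA['
    'LimeWire\x08rowser'
    ']]></field></doc></add>'
)

print(parses(doc))                      # False: CDATA cannot contain U+0008
print(parses(doc.replace('\x08', '')))  # True once the character is removed
```

The CDATA wrapper makes no difference here: the same document fails with or without it, and only removing (or otherwise dealing with) the control character makes it parse.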
  2. #2
  3. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,965
    Rep Power
    9397
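    Set an encoding for the file and in the XML declaration, then make sure the contents really are in that encoding. A CDATA section won't help.

    Alternatively, encode the characters as numeric character references: U+0008 would be &#8;. (This only works in XML 1.1; XML 1.0 forbids control characters such as U+0008 even as character references.)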
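Both suggestions can be tried with Python's xml.etree.ElementTree, an XML 1.0 parser built on expat. This is a sketch with made-up sample strings, and parses is a hypothetical helper. It shows that declaring an encoding fixes real non-ASCII text only when the bytes actually match the declaration, and that a numeric reference such as &#233; is fine for "é", while &#8; is rejected because XML 1.0 does not allow U+0008 even as a reference (XML 1.1 does, but expat does not implement XML 1.1).

```python
import xml.etree.ElementTree as ET

def parses(xml_bytes):
    """Return True if expat accepts the bytes as well-formed XML."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False

text = '<?xml version="1.0" encoding="UTF-8"?><field>caf\u00e9</field>'

# Non-ASCII text is fine when the bytes match the declared encoding.
print(parses(text.encode('utf-8')))         # True

# Declared as UTF-8 but actually stored as Latin-1: the parse fails.
print(parses(text.encode('latin-1')))       # False

# A character reference works for a non-ASCII character...
print(parses(b'<field>caf&#233;</field>'))  # True
# ...but XML 1.0 rejects a reference to the control character U+0008.
print(parses(b'<field>&#8;</field>'))       # False
```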
  4. #3
    Thanks, requinix, for your time.
    But I didn't understand you.
    Please modify the XML from the first post and paste it in a reply.


    Originally Posted by requinix
    Set an encoding for the file and XML, then make sure the contents are in that encoding. A CDATA won't help.

    Alternatively, encode the characters with entities: U+0008 would be &#8;.
  6. #4
    No.

    The control characters (such as U+0008) in the code you posted are the characters I'm talking about. Wikipedia has an article on numeric character entities.
    How you encode the characters programmatically depends on what language(s) you're using.
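In Python, one way to do this (a sketch, not the only approach) is to escape the markup characters with xml.sax.saxutils.escape and then let the built-in 'xmlcharrefreplace' error handler turn each non-ASCII character into a decimal numeric character reference. The function name to_ascii_xml_text is hypothetical. Note that this does nothing for control characters such as U+0008: they are plain ASCII, so they pass through unchanged and still have to be removed or replaced separately.

```python
from xml.sax.saxutils import escape

def to_ascii_xml_text(text):
    """Escape markup characters, then turn every non-ASCII character
    into a decimal numeric character reference such as &#233;."""
    escaped = escape(text)  # & -> &amp;, < -> &lt;, > -> &gt;
    return escaped.encode('ascii', 'xmlcharrefreplace').decode('ascii')

print(to_ascii_xml_text('caf\u00e9 & cr\u00e8me <b>'))
# caf&#233; &amp; cr&#232;me &lt;b&gt;
```

The result is pure ASCII, so it is valid in a document declared as UTF-8, ISO-8859-1 or any other ASCII-compatible encoding.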
  8. #5
    Found a solution for handling control characters in XML:

    python Code:
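    # Delete the C0 control characters (U+0000-U+001F) that XML 1.0 forbids,
    # but keep tab, newline and carriage return, which XML allows.
    deletion_chars = ''.join(chr(i) for i in range(32) if chr(i) not in '\t\n\r')
    delete_table = str.maketrans('', '', deletion_chars)
    # xml_string holds the raw XML text to be cleaned
    new_xml_string = xml_string.translate(delete_table)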


    It's working fine now.
    Thanks, all.
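The translate-based fix only deals with U+0000 to U+001F. If the data might also contain other characters that XML 1.0 forbids (lone surrogates, U+FFFE, U+FFFF), a regular-expression filter built from the XML 1.0 Char production is one alternative. This is a sketch; the names _ILLEGAL_XML_CHARS and strip_illegal_xml_chars are hypothetical, and silently deleting characters may not suit every application (substituting a placeholder character may be preferable).

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r'[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)

def strip_illegal_xml_chars(text):
    """Delete every character that XML 1.0 does not allow."""
    return _ILLEGAL_XML_CHARS.sub('', text)

raw = '<field><![CDATA[LimeWire\x08rowser\x0c, a\ttab and a\nnewline]]></field>'
clean = strip_illegal_xml_chars(raw)
print(ET.fromstring(clean).text)  # parses now; tab and newline survive
```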
