#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    5
    Rep Power
    0

    XML as data in HTML form post: how to escape special characters (x-posted to XML Forum)


    Hi. I have an HTML form where the value of one of the input fields is an XML string.

    <htmlCode>
    <form action="xxx" method="post">
    <input type='hidden' id='myName' name='myName' value='joe'/>
    <input type='hidden' id='myXML' name='myXML' value='<myData><userid value="123"/><remarks text="age is &lt; 65"/></myData>'/>
    </form>
    </htmlCode>


    where xxx is some url.

    My question is: when I post this form, "&lt;" is URL-encoded and changed to %3C. However, on the destination side, the data sent over is not valid, since the receiver thinks that the opening "<" for the remarks tag is not closed.

    How can I correctly send data containing characters such as "<" in an XML string that is the value of an input element in a form post?

    Any help is appreciated.

    -Andrew

    P.S. I'm x-posting this to the XML Programming forum since part of it involves XML
  2. #2
  3. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,054
    Rep Power
    9398
    The &lt; you have is only HTML encoding it. It'll be decoded by the browser automatically. In fact you should be doing that for the other <s and >s too.

    So actually you need to double-encode it.
    Code:
    <myData>
        <userid value="123" />
        <remarks text="age is &lt; 65" />
    </myData>
    Code:
    <input ... value='&lt;myData&gt;&lt;userid value="123"/&gt;&lt;remarks text="age is &amp;lt; 65"/&gt;&lt;/myData&gt;' />
    For extra credit, simply HTML-encode everything. That'll turn "s into &quot;s too.
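    The point that the browser decodes one level of entities before submitting can be traced with a small Java sketch. This is an illustration under assumptions, not how any particular browser is implemented: `decodeAttribute` is a hypothetical stand-in for the browser's attribute decoding and handles only the five basic entities, while `URLEncoder` is the standard `java.net` class (the `Charset` overload needs Java 10 or later).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class BrowserSubmitSketch {
    // Hypothetical stand-in for the browser's attribute decoding.
    // Handles only the five basic entities; &amp; is decoded last so that
    // "&amp;lt;" becomes "&lt;" and not "<".
    static String decodeAttribute(String raw) {
        return raw.replace("&lt;", "<")
                  .replace("&gt;", ">")
                  .replace("&quot;", "\"")
                  .replace("&#39;", "'")
                  .replace("&amp;", "&");
    }

    // Roughly what ends up in the POST body for one field value:
    // the decoded attribute value, URL-encoded.
    static String submittedValue(String rawAttribute) {
        return URLEncoder.encode(decodeAttribute(rawAttribute), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Encoded once: the browser turns &lt; back into a bare <.
        String once = "<remarks text=\"age is &lt; 65\"/>";
        // Encoded twice: the XML-level &lt; survives the browser.
        String twice = "&lt;remarks text=&quot;age is &amp;lt; 65&quot;/&gt;";

        System.out.println(decodeAttribute(once));   // <remarks text="age is < 65"/>
        System.out.println(decodeAttribute(twice));  // <remarks text="age is &lt; 65"/>
        System.out.println(submittedValue(twice));
    }
}
```

    With one level of encoding the receiver gets a bare < inside the attribute, which is not well-formed XML. With two levels it gets &lt;, which is. The %3C in the request is just the URL-encoded form of whatever survived the first step.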
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    5
    Rep Power
    0
    Thanks for your help, requinix. I double-encoded the value as you suggested, and the data now reaches the destination intact. However, when the destination displays the remarks field in a textbox, the < shows up as its HTML entity, that is
    Code:
    &lt;
    instead of as <.

    How can I make it show < instead?

    -Andrew

    Last edited by ndrw_cheung; May 1st, 2013 at 01:55 PM. Reason: &lt; shows up as <
  6. #4
  7. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,054
    Rep Power
    9398
    This forum software is awkward with HTML entities. It shows up as &amp+lt?
    To confirm: there's one HTML-entitizing for the entire XML string, and a separate additional one for the <remarks>'s text?

    Using PHP as an example,
    PHP Code:
    <?php
    $text = "age is < 65";
    // encode once so the text stays literal inside the XML attribute
    $xml = "<myData>...<remarks text=\"" . htmlspecialchars($text) . "\" />...</myData>";
    ?>

    <!-- encode a second time so the whole XML string stays literal inside the HTML attribute -->
    <input ... value="<?=htmlspecialchars($xml)?>" />
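    For anyone doing this in Java rather than PHP, roughly the same two-step encoding might look like the sketch below. The Java standard library has no direct counterpart to `htmlspecialchars`, so `escapeHtml` is a hand-written, hypothetical helper that covers only the five basic characters. The class and method names, and the full XML with the `userid` element from the first post, are assumptions for illustration.

```java
final class DoubleEncodeSketch {
    // Hypothetical helper standing in for PHP's htmlspecialchars.
    // & must be replaced first; otherwise the entities produced by the
    // later replacements would themselves be escaped again.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // First encoding: keep the remark literal inside the XML attribute.
    static String buildXml(String remark) {
        return "<myData><userid value=\"123\"/><remarks text=\""
                + escapeHtml(remark) + "\"/></myData>";
    }

    // Second encoding: keep the XML literal inside the HTML attribute.
    static String buildHiddenInput(String xml) {
        return "<input type=\"hidden\" name=\"myXML\" value=\""
                + escapeHtml(xml) + "\"/>";
    }

    public static void main(String[] args) {
        String xml = buildXml("age is < 65");
        System.out.println(xml);
        System.out.println(buildHiddenInput(xml));
    }
}
```

    In real code a maintained escaping library is probably a safer choice than a hand-rolled helper, since it will also deal with characters this sketch ignores.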
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    5
    Rep Power
    0
    Yes. I did exactly what your PHP code does (only I did it in Java), but in the textbox at the destination page, it shows the < as the HTML-entitized value.

    Should I do anything on my side so that it shows up correctly as < or would it be on the destination that they have to do something about it?

    -Andrew


  10. #6
  11. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,054
    Rep Power
    9398
    That lone < needs to be entitied. You should literally see
    Code:
    <myData><userid value="123" /><remarks text="age is &lt; 65" /></myData>
    in the textbox.
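    Whether the destination ends up showing < or &lt; depends on whether it parses the XML or displays it raw, which this thread does not establish. As an illustration only, a receiver that parses the string with the standard Java DOM parser gets the plain < back from the attribute. The class and method names here are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class ParseReceivedXml {
    // Parses the received XML and returns the text attribute of <remarks>.
    // Parser failures are wrapped in an unchecked exception to keep the
    // sketch short.
    static String remarksText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element remarks = (Element) doc.getElementsByTagName("remarks").item(0);
            return remarks.getAttribute("text");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The raw XML string as it should arrive after URL-decoding.
        String received =
            "<myData><userid value=\"123\" /><remarks text=\"age is &lt; 65\" /></myData>";
        System.out.println(remarksText(received)); // age is < 65
    }
}
```

    So a textbox that shows the raw XML string will contain &lt;, while a textbox filled from the parsed attribute value will contain <.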
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    5
    Rep Power
    0
    The problem is that the destination is a third-party website, and I don't know how they process the data once it gets there before displaying it in a textbox.

    When you say that the lone < needs to be entitied, do you mean it should be html-DECODED at the destination side, since it shows up as an html-encoded version (ampersand-l-t-semicolon)?

    -Andrew

