January 28th, 2005, 03:15 AM
decomposing html entities?
Does anyone know of a way of decomposing HTML entities (e.g. &gt;) to text (UTF-8, Latin-1, whatever)?
I can't see anything in the manuals I have, and a Google search didn't turn up anything useful.
Or will I have to write a string replace system?
January 28th, 2005, 12:44 PM
A while back, when I first worked with HTML entities from Python (for my Net module, linked below), I wrote several functions for converting special characters to entities; however, I've never needed a function to convert them back again.
For anyone who's interested: http://forums.devshed.com/t129666/s.html&highlight=net+module
But now I have the excuse, here's a simple function that should do what you want. It lacks any error checking but should work with all valid [numeric] entities.
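A minimal sketch of that idea might look like the following. The names decode_entity and decode_numeric_entities are hypothetical and this is not the original post's code. It assumes Python 3 (where chr accepts any Unicode code point) and decimal entities of the form &#NN; only, with no error checking.

```python
import re

def decode_entity(entity):
    """Convert one decimal numeric entity such as '&#62;' to its character."""
    # Drop the leading '&#' and the trailing ';', then map the number to a character.
    return chr(int(entity[2:-1]))

def decode_numeric_entities(text):
    """Replace every decimal numeric entity found in text (hypothetical helper)."""
    # Only the decimal form &#NN; is matched; hex (&#xNN;) and named entities are left alone.
    return re.sub(r'&#(\d+);', lambda match: chr(int(match.group(1))), text)

print(decode_entity('&#62;'))
print(decode_numeric_entities('1 &#60; 2 &#38; 3 &#62; 2'))
```

Named entities such as &gt; are not covered by this sketch; in Python 3.4 and later the standard library's html.unescape handles both named and numeric forms.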
Here's the same thing as a list comprehension in the Python shell, just to show the conversion in action:
>>> entities = ('&#62;', '&#38;', '&#60;')
>>> [chr(int(entity[2:-1])) for entity in entities]
['>', '&', '<']
As you can see, it's surprisingly easy! And converting back again isn't much more difficult once you know how it all works.
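Converting back can be sketched the same way: ord gives the code point of a character, and string formatting wraps it in &# and ;. The names encode_entity and encode_text, and the default set of characters to escape, are assumptions for illustration rather than anything from the Net module.

```python
def encode_entity(char):
    """Convert a single character to a decimal numeric entity, e.g. '>' to '&#62;'."""
    return '&#%d;' % ord(char)

def encode_text(text, special='<>&"'):
    """Replace each character listed in special with its numeric entity (hypothetical helper)."""
    # Characters not listed in special are passed through unchanged.
    return ''.join(encode_entity(c) if c in special else c for c in text)

print(encode_text('<b>Tom & Jerry</b>'))
```

Because each character is handled independently, an ampersand that already begins an entity in the input would be escaped a second time, so this is best applied to plain text only.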
Hope this helps,
Last edited by netytan; January 28th, 2005 at 12:51 PM.
Reason: Added URL to my Net Module.
January 29th, 2005, 02:36 AM
Mental Note: search devshed before asking
Thanks Mark - this looks excellent.