January 28th, 2005, 02:15 AM
-
decomposing html entities?
Evening all,
Does anyone know of some way of decomposing HTML entities (e.g. &gt;)
to text (UTF-8/Latin-1, whatever)?
I can't see anything in the manuals I have and a google didn't turn up anything useful.
Or will I have to write a string replace system?
--Simon
January 28th, 2005, 11:44 AM
-
A while back, when I first worked with HTML entities from Python (for my Net module – below), I wrote several functions for converting special characters to entities; however, I've never needed a function to convert them back again.
For anyone who's interested: http://forums.devshed.com/t129666/s.html&highlight=net+module
But now I have the excuse, here's a simple function that should do what you want. It lacks any error checking but should work with all valid [numeric] entities.
Code:
def convert(entity):
    # Strip the leading "&#" and trailing ";", then map the code point to a character.
    return chr(int(entity[2:-1]))
Here's the same thing as a list comprehension in the Python shell, just to show the conversion in action:
Code:
>>> entities = ('&#62;', '&#38;', '&#60;')
>>> [chr(int(entity[2:-1])) for entity in entities]
['>', '&', '<']
>>>
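The function above takes one entity at a time and has no error checking. For a whole string, something along these lines might do; it is only a rough sketch, and the names NUMERIC_ENTITY and decode_numeric_entities are made up for illustration. It assumes decimal numeric entities only (no hex or named forms) and leaves anything it cannot convert untouched.

```python
import re

# Hypothetical helper names, chosen for this sketch only.
# Matches decimal numeric entities such as &#62; (hex and named forms are not handled).
NUMERIC_ENTITY = re.compile(r"&#([0-9]+);")

def decode_numeric_entities(text):
    """Replace every decimal numeric entity in text with its character."""
    def replace(match):
        code = int(match.group(1))
        try:
            return chr(code)
        except (ValueError, OverflowError):
            # Code point out of range: keep the original entity text.
            return match.group(0)
    return NUMERIC_ENTITY.sub(replace, text)

print(decode_numeric_entities("1 &#60; 2 &#38; 3 &#62; 2"))
```

If named entities such as &gt; also need handling, html.unescape in the standard library (Python 3.4 and later) decodes both named and numeric forms.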
As you can see, it's surprisingly easy! And converting back again isn't much more difficult once you know how it all works.
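For the reverse direction, a minimal sketch might look like this. It is not the code from the Net module; encode_char and encode_special are hypothetical names, and the default set of characters to escape is an assumption.

```python
def encode_char(char):
    """Return the decimal numeric entity for a single character (hypothetical helper)."""
    return "&#%d;" % ord(char)

def encode_special(text, special="<>&\"'"):
    """Encode only the characters listed in special; the default set is an assumption."""
    return "".join(encode_char(c) if c in special else c for c in text)

print(encode_special("a < b & c"))
```

The idea is simply ord() going one way and chr() going the other, so an entity produced by encode_char has the same &#NN; shape that the decoding function expects.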
Hope this helps,
Mark.
Last edited by netytan; January 28th, 2005 at 11:51 AM.
Reason: Added URL to my Net Module.
January 29th, 2005, 01:36 AM
-
Mental note: search devshed before asking.
Thanks Mark - this looks excellent.
-Simon