September 30th, 2013, 06:43 AM
Has anyone used HTMLPurifier?
So I'm starting to build a CMS from scratch - something I haven't done in a while (having adopted MODx, but clients, hey?). Since last working on my own CMS I have learnt about character encoding, XSS, XSRF, SQL injection, etc.
So I'm at the stage where I want to dump my users' HTML content (stored in MySQL and derived from the output of a web-based rich text editor) back into the editor for updating pages. My existing anti-XSS methods are overkill (they trash the actual HTML), so I was having a mental block until I found HTMLPurifier.
It seems comprehensive enough, but has anyone had any experience with it? Good/bad reviews?
September 30th, 2013, 08:55 AM
No, I haven't used HTMLPurifier. Can you tell me something about it?
September 30th, 2013, 09:27 AM
Why don't you (anupam) Google it?
September 30th, 2013, 01:17 PM
I wouldn't use HTML filters at all, because they're either damn complex or wrong (or both). This library seems to fall into the first category with around 350 classes.
The problem is that HTML itself is complex, especially if you do error correction. It's anything but easy to process.
To get an HTML document secure, you must first parse the whole thing, then remove the non-whitelisted elements and finally create a standards-compliant document from the remaining elements. HTML Purifier adds a lot of fancy features on top of that.
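To make the three steps concrete (parse, drop everything not whitelisted, re-emit clean markup), here is a minimal sketch in Python using only the standard library. The whitelist, the class name WhitelistFilter and the function sanitize are made up for illustration, and this is emphatically not a production filter: it ignores void elements, nesting rules, CSS, and many other things a real library has to handle.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, for illustration only.
ALLOWED_TAGS = {"p", "b", "i", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
# Elements whose text content is dropped along with the tag itself.
SKIP_CONTENT = {"script", "style"}
SAFE_URL_PREFIXES = ("http://", "https://", "/")


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the rebuilt document
        self.open_tags = []  # whitelisted tags currently open
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and so on
            if name == "href" and not value.lower().startswith(SAFE_URL_PREFIXES):
                continue  # drops javascript: and other schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return
        # Close any tags left open inside this one so the output stays balanced.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    # Close whatever the input left open.
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(sanitize('<p onclick="evil()">Hi <b>there<script>alert(1)</script></p>'))
```

Even this toy version needs a tag stack, attribute checks and URL scheme checks, which gives a feel for why a complete filter ends up so large.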
Why would you do this given that there are much simpler markup languages like BBCode? Parsing BBCode is trivial compared to parsing HTML. And it's more than enough if your users just wanna have some bold text and lists. If you're using a mainstream editor, it should already have a plugin for BBCode.
The only reason for using an HTML filter is when your users need the full power of HTML with all elements and attributes. You need that for writing complete HTML templates, but certainly not for a rich text editor.
October 1st, 2013, 05:02 AM
I'm not in control of this project, and despite my objections the client and project manager (both non-tech) want a TinyMCE-like rich text editor and just 'don't get' the alternatives like BBCode and (my favourite) Markdown.
So, I'm hacking it with DOMDocument: I can normalise the HTML, remove attributes (e.g. onclick) and tags (e.g. script, iframe) and save. I'm thinking about an onchange/live-edit feature so that the HTML behind the editor is always what I'll be saving. We'll see how much time the project allows, and the server load!
October 1st, 2013, 05:27 AM
You misunderstood me. TinyMCE is fine. In fact, I talked about it in the second-to-last paragraph.
What I'm saying is that your TinyMCE should output BBCode rather than HTML, because BBCode is much easier to parse and get secure than HTML.
The output generated by TinyMCE is just an intermediate format, anyway. Even if you use HTML, it has to be completely destroyed and rebuilt. So why not use a simple intermediate format in the first place?
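To illustrate why a simple intermediate format is easier to handle, here is a rough sketch of a BBCode-to-HTML converter in Python. It assumes a tiny, hypothetical tag set ([b], [i], [u]) and a made-up function name bbcode_to_html; lists, links and the editor integration are left out. The key point is that every piece of user text is escaped, and the only HTML in the output is what the converter itself writes.

```python
import re
from html import escape

# Hypothetical tag set for illustration: BBCode name -> HTML element.
SIMPLE_TAGS = {"b": "strong", "i": "em", "u": "u"}

# Matches [name] and [/name].
TOKEN = re.compile(r"\[(/?)([a-z]+)\]", re.IGNORECASE)


def bbcode_to_html(text):
    out = []    # pieces of the generated HTML
    stack = []  # BBCode tags currently open
    pos = 0
    for match in TOKEN.finditer(text):
        # Everything between tags is plain text and gets escaped.
        out.append(escape(text[pos:match.start()]))
        pos = match.end()
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if name not in SIMPLE_TAGS:
            out.append(escape(match.group(0)))  # unknown tag stays literal text
        elif not closing:
            stack.append(name)
            out.append(f"<{SIMPLE_TAGS[name]}>")
        elif stack and stack[-1] == name:
            stack.pop()
            out.append(f"</{SIMPLE_TAGS[name]}>")
        else:
            out.append(escape(match.group(0)))  # mismatched close stays literal
    out.append(escape(text[pos:]))
    # Close anything the user left open so the output is balanced.
    while stack:
        out.append(f"</{SIMPLE_TAGS[stack.pop()]}>")
    return "".join(out)


print(bbcode_to_html("[b]bold[/b] <script>alert(1)</script>"))
```

Any raw HTML the user types comes out as harmless escaped text, so there is nothing to whitelist or repair afterwards.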
Oh, I don't think hacking it with DOMDocument is a good idea. Screwing up the HTML filtering is very easy; getting it right is damn hard (just look at the HTMLPurifier source code).
If you insist on HTML (for whatever reason), then use HTMLPurifier. It's better than any home-made filter.
October 1st, 2013, 06:23 AM
Well, HTML Purifier is a standards-compliant HTML filter library written in PHP. It removes all malicious code using a thoroughly audited, secure yet permissive whitelist, and it ensures the output is standards-compliant.
But I have not tried it yet.
October 1st, 2013, 03:16 PM
I've used HTMLPurifier to great satisfaction in a WYSIWYG type of setup. It filters out all the nasty stuff and the output is ready for pasting into your final HTML.
It may take some finessing to get the filter right, but once it's set up it saves you a lot of work.
October 5th, 2013, 05:53 AM
Yeah, I have used it now and I found it a very useful tool for web development.