#1
  Mad Scientist
    Devshed Expert (3500 - 3999 posts)

    Join Date: Oct 2007
    Location: North Yorkshire, UK
    Posts: 3,660
    Rep Power: 4123

    Has anyone used HTMLPurifier?


    So I'm starting to build a CMS from scratch - something I haven't done in a while (having adopted MODx, but clients, hey?). Since last working on my own CMS I have learnt about character encoding, XSS, XSRF, SQL Injection, etc etc.

    So I'm at the stage where I want to dump my users' HTML content (stored in MySQL and derived from the output of a web-based rich text editor) back into the editor for updating pages... My existing anti-XSS methods are overkill (they trash the actual HTML), so I was having a mental block until I found HTMLPurifier.

    It seems comprehensive enough, but has anyone had any experience with it? Good/bad reviews?
    I said I didn't like ORM!!! <?php $this->model->update($this->request->resources[0])->set($this->request->getData())->getData('count'); ?>

    PDO vs mysql_* functions: Find a Migration Guide Here

    [ Xeneco - T'interweb Development ] - [ Are you a Help Vampire? ] - [ Read The manual! ] - [ W3 methods - GET, POST, etc ] - [ Web Design Hell ]
#2
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date: Sep 2013
    Posts: 12
    Rep Power: 0

    HTMLPurifier


    No, I have not used HTMLPurifier. Tell me something about it?

    Comments on this post

    • Northie disagrees : Then why bother answering?
#3
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date: Dec 2004
    Posts: 2,870
    Rep Power: 369
    Why don't you (anupam) GOOGLE it!
#4
  --
    Devshed Expert (3500 - 3999 posts)

    Join Date: Jul 2012
    Posts: 3,930
    Rep Power: 1045
    Hi,

    I wouldn't use HTML filters at all, because they're either damn complex or wrong (or both). This library seems to fall into the first category with around 350 classes.

    The problem is that HTML itself is complex, especially if you do error correction. It's anything but easy to process.

    To make an HTML document secure, you must first parse the whole thing, then remove the non-whitelisted elements and finally create a standards-compliant document from the remaining elements. HTML Purifier adds a lot of fancy features on top of that.
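    The three steps named above (parse, remove everything not whitelisted, rebuild) can be sketched with PHP's bundled DOM extension. This is a minimal illustration under stated assumptions: the function names `sanitizeHtml`/`filterChildren` and the `ALLOWED_TAGS`/`ALLOWED_ATTRS` lists are hypothetical choices, and only absolute http(s) links are kept. It covers far fewer cases than a library such as HTML Purifier.

```php
<?php
// Whitelist HTML filter sketch: parse -> keep only allowed elements and
// attributes -> re-serialise, using PHP's bundled DOM extension.
// ASSUMPTION: ALLOWED_TAGS / ALLOWED_ATTRS are illustrative, hypothetical
// choices. This is a teaching sketch, not a substitute for HTML Purifier.

const ALLOWED_TAGS = ['p', 'br', 'b', 'strong', 'i', 'em', 'ul', 'ol', 'li', 'a'];
const ALLOWED_ATTRS = ['a' => ['href']];

function filterChildren(DOMNode $parent): void
{
    // Copy the live child list first, because nodes are removed while looping.
    foreach (iterator_to_array($parent->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // text is kept; saveHTML() re-escapes it on output
        }
        if (!($child instanceof DOMElement)
            || !in_array(strtolower($child->nodeName), ALLOWED_TAGS, true)) {
            // Comments and non-whitelisted elements are dropped with their content.
            $parent->removeChild($child);
            continue;
        }
        $tag = strtolower($child->nodeName);
        $allowed = ALLOWED_ATTRS[$tag] ?? [];
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            if (!in_array(strtolower($attr->name), $allowed, true)) {
                $child->removeAttribute($attr->name);
            }
        }
        // Only keep absolute http(s) links, so "javascript:" URLs cannot survive.
        if ($child->hasAttribute('href')
            && !preg_match('#^https?://#i', $child->getAttribute('href'))) {
            $child->removeAttribute('href');
        }
        filterChildren($child);
    }
}

function sanitizeHtml(string $dirty): string
{
    // Step 1: parse. The meta tag tells libxml the input is UTF-8; warnings
    // about sloppy markup are collected instead of printed.
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" '
        . 'content="text/html; charset=utf-8"></head><body>' . $dirty . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    // Step 2: remove everything that is not whitelisted.
    filterChildren($body);

    // Step 3: re-serialise only the surviving body content.
    $clean = '';
    foreach ($body->childNodes as $node) {
        $clean .= $doc->saveHTML($node);
    }
    return $clean;
}
```

    Because the filter starts from "nothing is allowed", an element or attribute nobody thought about is removed by default, which is the main advantage of a whitelist over a blacklist.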

    Why would you do this given that there are much simpler markup languages like BBCode? Parsing BBCode is trivial compared to parsing HTML. And it's more than enough if your users just wanna have some bold text and lists. If you're using a mainstream editor, it should already have a plugin for BBCode.
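    To illustrate why BBCode is easier to get secure: since BBCode carries no raw HTML, a converter can escape the entire input first and then translate a small, fixed set of tags. The sketch below assumes a hypothetical subset ([b], [i], [url=...], [list]/[*]) and a hypothetical function name `bbcodeToHtml`; real BBCode dialects and editor plugins vary, and nested lists are not handled.

```php
<?php
// Minimal BBCode-to-HTML sketch: escape everything, then translate a small,
// fixed set of tags. ASSUMPTION: only [b], [i], [url=...] and [list]/[*] are
// supported (a hypothetical subset); nested lists are not handled.

function bbcodeToHtml(string $input): string
{
    // 1. Escape all user input, so no raw HTML can ever reach the output.
    $html = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 2. Simple inline tags. Non-greedy (.*?) keeps "[b]a[/b] [b]b[/b]" as two
    //    separate matches; the "s" flag lets a tag span several lines.
    $html = preg_replace('#\[b\](.*?)\[/b\]#is', '<strong>$1</strong>', $html);
    $html = preg_replace('#\[i\](.*?)\[/i\]#is', '<em>$1</em>', $html);

    // 3. Links: only http(s) URLs match; anything else stays as literal text.
    //    The URL was already escaped in step 1, so it cannot break the attribute.
    $html = preg_replace(
        '#\[url=(https?://[^\]\s]+)\](.*?)\[/url\]#is',
        '<a href="$1">$2</a>',
        $html
    );

    // 4. Lists: [list][*]one[*]two[/list] -> <ul><li>one</li><li>two</li></ul>
    return preg_replace_callback('#\[list\](.*?)\[/list\]#is', function (array $m): string {
        $items = array_filter(array_map('trim', explode('[*]', $m[1])), 'strlen');
        return '<ul><li>' . implode('</li><li>', $items) . '</li></ul>';
    }, $html);
}
```

    Anything the converter does not recognise, such as `[url=javascript:...]`, simply stays as escaped literal text, so an unknown construct fails safe instead of leaking through.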

    The only reason for using an HTML filter is when your users need the full power of HTML with all elements and attributes. You need that for writing complete HTML templates, but certainly not for a rich text editor.
    The 6 worst sins of security • How to (properly) access a MySQL database with PHP

    Why can't I use certain words like "drop" as part of my Security Question answers?
    There are certain words used by hackers to try to gain access to systems and manipulate data; therefore, the following words are restricted: "select," "delete," "update," "insert," "drop" and "null".
#5
  Mad Scientist
    Devshed Expert (3500 - 3999 posts)

    Join Date: Oct 2007
    Location: North Yorkshire, UK
    Posts: 3,660
    Rep Power: 4123
    I'm not in control of this project, and despite my objections the client and project manager (both non-tech) want a TinyMCE-like rich text editor and just 'don't get' the alternatives like BBCode and (my favourite) Markdown.

    So, I'm hacking it with DOMDocument - I can normalise the HTML, remove attributes (e.g. onclick) and tags (e.g. script, iframe) and save... I'm thinking about an onchange/live edit feature so that the HTML behind the editor is always what I'll be saving. We'll see how much time the project allows, and what the server load is like!
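    The DOMDocument approach described above (parse, remove attributes such as onclick and tags such as script/iframe, save) might look roughly like the sketch below. It is a blacklist, so it only removes what it lists: the `BLOCKED_TAGS` list and the function name `stripBlacklisted` are illustrative assumptions, and, for example, it leaves an `href="javascript:..."` link untouched. Re-serialising through DOMDocument also normalises the markup (unclosed tags come back closed).

```php
<?php
// Sketch of the blacklist approach: parse with DOMDocument, remove chosen
// tags and on* event attributes, then re-serialise (which also normalises
// the markup). ASSUMPTION: BLOCKED_TAGS is an illustrative, incomplete list.

const BLOCKED_TAGS = ['script', 'iframe', 'object', 'embed', 'style'];

function stripBlacklisted(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" '
        . 'content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Remove blocked elements together with everything inside them.
    foreach (BLOCKED_TAGS as $tag) {
        foreach (iterator_to_array($xpath->query('//body//' . $tag), false) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Remove event-handler attributes (onclick, onload, onerror, ...).
    $handlers = $xpath->query('//body//@*[starts-with(name(), "on")]');
    foreach (iterator_to_array($handlers, false) as $attr) {
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }

    // Re-serialise only the body content.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}
```

    The `javascript:` link surviving this filter is a concrete example of why a blacklist has to anticipate every attack vector, whereas a whitelist only has to describe what is allowed.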
#6
  --
    Devshed Expert (3500 - 3999 posts)

    Join Date: Jul 2012
    Posts: 3,930
    Rep Power: 1045
    Originally Posted by Northie
    I'm not in control of this project, and despite my objections the client and project manager (both non-tech) want a TinyMCE-like rich text editor and just 'don't get' the alternatives like BBCode and (my favourite) Markdown.
    You misunderstood me. TinyMCE is fine. In fact, I talked about it in the second-to-last paragraph.

    What I'm saying is that your TinyMCE should output BBCode rather than HTML, because BBCode is much easier to parse and get secure than HTML.

    The output generated by TinyMCE is just an intermediate format, anyway. Even if you use HTML, it has to be completely destroyed and rebuilt. So why not use a simple intermediate format in the first place?



    Originally Posted by Northie
    So, I'm hacking it with DOMDocument - I can normalise the HTML, remove attributes (e.g. onclick) and tags (e.g. script, iframe) and save... I'm thinking about an onchange/live edit feature so that the HTML behind the editor is always what I'll be saving. We'll see how much time the project allows, and what the server load is like!
    Oh, I don't think that's a good idea. Screwing up HTML filtering is very easy; getting it right is damn hard (just look at the HTMLPurifier source code).

    If you insist on HTML (for whatever reason), then use HTMLPurifier. It's better than any home-made filter.
#7
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date: Sep 2013
    Posts: 27
    Rep Power: 0
    Well, HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier removes all malicious code with a thoroughly audited, secure yet permissive whitelist, and ensures standards compliance.
    But I have not tried it yet.
#8
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date: Oct 2013
    Posts: 158
    Rep Power: 10
    I've used HTMLPurifier to great satisfaction in a WYSIWYG-type setup; it filters out all the nasty stuff and the output is ready for pasting into your final HTML.

    It may take some finessing to get the filter right but once it's setup it saves you a lot of work.
#9
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date: Sep 2013
    Posts: 12
    Rep Power: 0

    HTML Purifier


    Yeah, I've used it now and I found it a very useful tool for web development.
