HTML Purifier 1.3.1 review

Download
by rbytes.net on

HTML Purifier project is the premiere PHP solution for all your HTML filtering needs

License: LGPL (GNU Lesser General Public License)
File size: 226K
Developer: Edward Z. Yang
0 stars award from rbytes.net

HTML Purifier project is the premiere PHP solution for all your HTML filtering needs. Tired of forcing users to use BBCode or some other obscure custom markup language due to the current landscape of deficient or hole-ridden HTML filterers? Look no further: HTMLPurifier will not only remove all malicious code (the stuff of XSS), it will also make sure the HTML is standards compliant.

There are a number of ad hoc HTML filtering solutions out there on the web (some examples including PEAR's HTML_Safe, kses and SafeHtmlChecker.class.php) that claim to filter HTML properly, preventing malicious JavaScript and layout breaking HTML from getting through the parser. None of them, however, demonstrates a thorough knowledge of the DTD that defines HTML or the caveats of HTML that cannot be expressed by a DTD.

Configurable filters (such as kses or PHP's built-in striptags() function) have trouble validating the contents of attributes and can be subject to security attacks due to poor configuration. Other filters take the naive approach of blacklisting known threats and tags, failing to account for the introduction of new technologies, new tags, new attributes or quirky browser behavior.

However, HTML Purifier takes a different approach, one that doesn't use specification-ignorant regexes or narrow blacklists. HTML Purifier will decompose the whole document into tokens, and rigorously process the tokens by: removing non-whitelisted elements, transforming bad practice tags like font into span, properly checking the nesting of tags and their children and validating all attributes according to their RFCs.

To my knowledge, there is nothing like this on the web yet. Not even MediaWiki, which allows an amazingly diverse mix of HTML and wikitext in its documents, gets all the nesting quirks right. Existing solutions hope that no JavaScript will slip through, but either do not attempt to ensure that the resulting output is valid XHTML or send the HTML through a draconic XML parser (and yet still get the nesting wrong: SafeHtmlChecker.class.php does not prevent a tags from being nested within each other).

What's New in This Release:
This release fixes a bug that nuked all image tags when %Core.RemoveInvalidImg was set to true (which it is by default).
There is also a convenience file called HTMLPurifier.func.php, which gives you an easy to use HTMLPurifier() function to use for simple filtering needs.

HTML Purifier 1.3.1 search tags