Module HtmlFilterHelper
In: html_filter_helper.rb

Implements a view helper method that lets you conditionally sanitize HTML, whether it is provided directly by a user or generated by RedCloth or another markup/Markdown library.

Methods

Public Instance methods

This helper is a flexible HTML sanitizer/whitelist that allows you to easily configure "profiles" for allowed tags. You can also specify what tag attributes (and values for those attributes) you consider "safe."

Whitelisting some HTML without inspecting attributes is pointless - were I a hacker, I could throw in cookie-stealing onmouseover/onclick events to gain admin control. I could throw in CSS to load porn images as backgrounds. You need to filter out attributes if you’re going to allow user-specified HTML at all.

The idea behind supporting multiple "HTML profiles" is simple - sometimes you want to allow a wider range of HTML than at other times, yet you still want to maintain some control over your output. Admins get to put in a large subset of HTML tags and attributes, while anonymous commenters can only use simple formatting tags.

PROFILES:

A "profile" is an optional hash that defines a filtering profile to use. The default profile allows the following tags with all attributes stripped: strong, b, ul, li, ol, i, u, code, pre, p, div, br, table, tr, td, th, tbody, thead, span, h1, h2, h3, h4, h5, h6, dl, and dt.
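As a rough sketch, the default profile described above could be written out as a hash like the one below. The constant names DEFAULT_TAGS and DEFAULT_PROFILE are assumptions made for illustration; only the tag list comes from the text, and the helper's real internal representation may differ.

```ruby
# Hypothetical constant names; the tag list is copied from the description.
DEFAULT_TAGS = %w[
  strong b ul li ol i u code pre p div br table tr td th tbody thead
  span h1 h2 h3 h4 h5 h6 dl dt
].freeze

# Each tag maps to {'none' => 1}: keep the tag, strip all of its attributes.
DEFAULT_PROFILE = DEFAULT_TAGS.each_with_object({}) do |tag, profile|
  profile[tag] = { 'none' => 1 }
end.freeze
```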

If a tag doesn’t exist as a key in the profile, it will be "deactivated" by having its opening bracket replaced with the HTML entity representing an open bracket, and the attributes will be untouched. Tag attributes that aren’t allowed on OK tags are stripped altogether.
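The "deactivation" step can be pictured with a small regex-based sketch. This is not the helper's actual code: the method name deactivate_unlisted_tags is hypothetical, and a simple regex like this one does not cope with comments or with angle brackets inside attribute values.

```ruby
# Minimal sketch, NOT the helper's real implementation.
# Escapes the opening bracket of every tag whose name is not a profile key,
# leaving the rest of the tag (including its attributes) untouched.
def deactivate_unlisted_tags(html, profile)
  html.gsub(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)}) do
    slash = Regexp.last_match(1)
    tag   = Regexp.last_match(2)
    bracket = profile.key?(tag.downcase) ? '<' : '&lt;'
    "#{bracket}#{slash}#{tag}"
  end
end
```

With a profile that lists only 'b', a script tag in the input comes back with its brackets escaped and its attributes still in place, while b tags pass through unchanged.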

A profile consists of a hash of hashes, where the first-level keys are HTML tags. The second-level keys are attributes. There are two special attribute keywords - "none" and "any". "none" means no attributes are allowed on the tag, and "any" means every attribute is allowed. Otherwise, each attribute key maps to an array of the values that attribute may take.

Umm. . . Yeah. Maybe that’s confusing. Here are some examples:

Example:

class User < ActiveRecord::Base

 USER_PROFILE={
                 'b'=>{'none'=>1},
                 'strong'=>{'class'=>['foo','bar','blee']},
                 'img'=>{'any'=>1}
                }

end

... somewhere in a view ...

<%= filter_html(@html, User::USER_PROFILE) %>

will "deactivate" all HTML tags except <b>, <strong>, and <img>, leaving the attributes on the deactivated tags untouched. It will strip all attributes from <b>. It will allow the attribute "class" on <strong> when its value is "foo", "bar", or "blee". It will allow any attributes on <img> tags, which is HIGHLY UNSAFE, but here for demonstration purposes.
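The attribute rules in this example can be summarized as a small predicate. The method name attribute_allowed? is hypothetical and not part of the helper; the sketch also assumes the whole attribute value is compared against the list, which may not match how the helper treats multi-word values such as several classes.

```ruby
# Hypothetical predicate illustrating the profile rules; not the helper's API.
# Returns true when `attr` with `value` is permitted on `tag` by `profile`.
def attribute_allowed?(profile, tag, attr, value)
  rules = profile[tag]
  return false if rules.nil?          # tag is not in the profile at all
  return false if rules.key?('none')  # tag is allowed, attributes never are
  return true  if rules.key?('any')   # every attribute is accepted (unsafe)
  Array(rules[attr]).include?(value)  # otherwise the value must be listed
end
```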

More Examples:

filter_html(%Q|<p align="center" style="font-weight:bold;" class="body">Foo!</p>
               <p align="justify">Am I justified?</p>
               <ul type="fibble">
                 <li foo="bar">Item 1</li>
                 <li class="second">Item 2</li>
               </ul>|,
            {
              'p'  => {'align' => ['center','left','right'], 'class' => ['body']},
              'ul' => {'none' => 1},
              'li' => {'class' => ['first','second','third']}
            })

will give the following output:

<p class='body' align='center'>Foo!</p>
<p>Am I justified?</p>
<ul>
  <li>Item 1</li>
  <li class='second'>Item 2</li>
</ul>

The idea is that you’d define your profiles as constants in your models, and then pass the appropriate profile to filter_html() in your view right before displaying the content.
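One way this might look is sketched below. The Comment class, both constant names, and the profile_for method are assumptions made for illustration; in a Rails application the class would inherit from ActiveRecord::Base, which is left out here so the sketch runs on its own.

```ruby
# Hypothetical model; names and tag choices are illustrative only.
class Comment
  # Anonymous commenters: simple formatting tags, no attributes.
  ANONYMOUS_PROFILE = {
    'b'      => { 'none' => 1 },
    'i'      => { 'none' => 1 },
    'strong' => { 'none' => 1 },
    'p'      => { 'none' => 1 }
  }.freeze

  # Admins: the same tags plus tables, with a restricted align attribute.
  ADMIN_PROFILE = ANONYMOUS_PROFILE.merge(
    'p'     => { 'align' => %w[left center right] },
    'table' => { 'none' => 1 },
    'tr'    => { 'none' => 1 },
    'td'    => { 'align' => %w[left center right] }
  ).freeze

  # Picks the profile to hand to filter_html.
  def self.profile_for(admin)
    admin ? ADMIN_PROFILE : ANONYMOUS_PROFILE
  end
end
```

A view could then call filter_html(@comment_body, Comment.profile_for(is_admin)), where @comment_body and is_admin are placeholders for whatever your application provides.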

The neat thing about using this method is that you leave the user’s original data untouched, so you can relax or tighten the HTML tag profile in the future without borking the original content. The bad thing is that you’re parsing HTML right before each display. Fragment caching can help with that, though.
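To illustrate the caching idea outside of Rails, here is a plain-Ruby sketch that memoizes filtered output per input and profile. The FilteredHtmlCache class is hypothetical and stands in for fragment caching, which in a real Rails view would normally be the tool to reach for; the in-memory hash here never expires entries.

```ruby
require 'digest/sha2'

# Hypothetical in-memory cache; a stand-in for Rails fragment caching.
class FilteredHtmlCache
  def initialize
    @store = {}
  end

  # Returns the cached result for this html/profile pair, running the
  # block (the expensive filtering step) only on a cache miss.
  def fetch(html, profile)
    key = Digest::SHA256.hexdigest([html, profile].inspect)
    @store.fetch(key) { @store[key] = yield }
  end
end
```

Because the profile is part of the key, changing a profile later produces a fresh entry instead of serving output filtered under the old rules.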

SEE ALSO:

svn.techno-weenie.net/projects/plugins/white_list/

AUTHOR: Dan Collis-Puro - dan at endNOSPAMSUCKERpoint dot com

www.endpoint.com - work

www.kookdujour.com - blog

Based on the "Easy HTML Whitelists" recipe in "Rails Recipes".

The "profile" idea is loosely based on HTML::TagFilter on www.cpan.org
