Why is my function stripping the dir attribute despite it being in my list of allowed attributes?

I am using the bleach package to strip invalid HTML, and I am puzzled why the dir attribute is being stripped from my string. Is dir not an attribute, or does the package simply not support it?

I have included the entire script so you can run it yourself.

import bleach

string = """<p dir="rtl">asdasdasd <span>asdasdasd</span> asdsadasdsad .<br data-mce-bogus="1"></p>"""


def strip_invalid_html(html):
    """ strips invalid tags/attributes """

    allowed_tags = [
        'p', 'a', 'blockquote',
        'h1', 'h2', 'h3', 'h4', 'h5',
        'strong', 'em',
        'br',
        'span',
    ]

    allowed_attributes = {
        'a': ['href', 'title'],
        'dir': ['rtl', 'ltr']
    }

    cleaned_html = bleach.clean(
        html,
        attributes=allowed_attributes,
        strip=True,
        tags=allowed_tags
    )

    print(cleaned_html)

strip_invalid_html(string)

Answer

If you pass a dict for attributes, the dict should map tag names to allowed attribute names, not map attribute names to allowed attribute values.

If you want 'dir' to be an allowed attribute for p tags, you need a 'p': ['dir'] entry, not a 'dir': ['rtl', 'ltr'] entry:

allowed_attributes = {
    'a': ['href', 'title'],
    'p': ['dir'],
}
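
If you also want to restrict which values dir may take (the apparent intent of the original 'dir': ['rtl', 'ltr'] entry), bleach accepts a callable in place of the attribute list. The following is a minimal sketch, not a definitive implementation: the names allow_dir_value, ALLOWED_TAGS, and ALLOWED_ATTRIBUTES are made up for illustration, the tag list is shortened, and the tags are given as a set on the assumption that a recent bleach release is installed.

```python
import bleach

# Shortened tag list for illustration; a set is used here because
# recent bleach releases document `tags` as a set.
ALLOWED_TAGS = {"p", "a", "br", "span"}


def allow_dir_value(tag, name, value):
    """Hypothetical filter: keep only dir="rtl" or dir="ltr"."""
    return name == "dir" and value in ("rtl", "ltr")


# Keys are tag names; values are either a list of allowed attribute
# names or a callable taking (tag, name, value) and returning a bool.
ALLOWED_ATTRIBUTES = {
    "a": ["href", "title"],
    "p": allow_dir_value,
}


def strip_invalid_html(html):
    """Strip disallowed tags and attributes from an HTML fragment."""
    return bleach.clean(
        html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True,
    )


kept = strip_invalid_html('<p dir="rtl">text<br data-mce-bogus="1"></p>')
dropped = strip_invalid_html('<p dir="sideways">text</p>')
print(kept)     # dir="rtl" survives, data-mce-bogus is removed
print(dropped)  # the unsupported dir value is removed
```

To allow dir on every tag rather than only p, the same list or callable can be placed under the '*' key of the dict.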
