I am using the bleach package to strip away invalid HTML. I am puzzled why the dir attribute is being stripped from my string. Is dir not an attribute, or could it just be that the package does not support dir?
I have included the entire script so you can run it yourself.
    import bleach

    string = """<p dir="rtl">asdasdasd <span>asdasdasd</span> asdsadasdsad .<br data-mce-bogus="1"></p>"""


    def strip_invalid_html(html):
        """ strips invalid tags/attributes """
        allowed_tags = [
            'p', 'a', 'blockquote', 'h1', 'h2', 'h3', 'h4', 'h5',
            'strong', 'em', 'br', 'span',
        ]
        allowed_attributes = {
            'a': ['href', 'title'],
            'dir': ['rtl', 'ltr']
        }
        cleaned_html = bleach.clean(
            html,
            attributes=allowed_attributes,
            strip=True,
            tags=allowed_tags
        )
        print(cleaned_html)


    strip_invalid_html(string)
Answer
If you pass a dict for attributes, the dict should map tag names to allowed attribute names, not attribute names to allowed attribute values.

If you want 'dir' to be an allowed attribute for p tags, you need a 'p': ['dir'] entry, not a 'dir': ['rtl', 'ltr'] entry:
    allowed_attributes = {
        'a': ['href', 'title'],
        'p': ['dir'],
    }
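The original dict looks like an attempt to restrict dir to the values rtl and ltr. A 'p': ['dir'] entry allows any value. If restricting the values matters, one option is a callable in place of the list: as far as I know, bleach calls it with (tag, name, value) and keeps the attribute when it returns True. The following is a minimal sketch under that assumption. The names allow_dir_values, ALLOWED_TAGS and ALLOWED_ATTRIBUTES are mine, not part of bleach.

```python
ALLOWED_TAGS = [
    'p', 'a', 'blockquote', 'h1', 'h2', 'h3', 'h4', 'h5',
    'strong', 'em', 'br', 'span',
]


def allow_dir_values(tag, name, value):
    """Attribute filter (hypothetical helper): keep only dir="rtl" or dir="ltr"."""
    return name == 'dir' and value in ('rtl', 'ltr')


ALLOWED_ATTRIBUTES = {
    'a': ['href', 'title'],   # a plain list allows these names with any value
    'p': allow_dir_values,    # a callable decides per attribute name and value
}


def strip_invalid_html(html):
    """Strip disallowed tags and attributes, returning the cleaned HTML."""
    # Imported here so the filter above can be exercised without bleach installed.
    import bleach

    return bleach.clean(
        html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True,
    )
```

Calling print(strip_invalid_html(string)) with the string from the question should then keep dir="rtl" on the p tag while still dropping data-mce-bogus from the br tag, since br has no entry in the dict.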