I am using the bleach package to strip invalid HTML. I am puzzled why the dir attribute is being stripped from my string. Is dir not a valid attribute, or does the package simply not support dir?
I have included the entire script so you can run it yourself.
import bleach


string = """<p dir="rtl">asdasdasd <span>asdasdasd</span> asdsadasdsad .<br data-mce-bogus="1"></p>"""


def strip_invalid_html(html):
    """ strips invalid tags/attributes """

    allowed_tags = [
        'p', 'a', 'blockquote',
        'h1', 'h2', 'h3', 'h4', 'h5',
        'strong', 'em',
        'br',
        'span',
    ]

    allowed_attributes = {
        'a': ['href', 'title'],
        'dir': ['rtl', 'ltr']
    }

    cleaned_html = bleach.clean(
        html,
        attributes=allowed_attributes,
        strip=True,
        tags=allowed_tags
    )

    print(cleaned_html)


strip_invalid_html(string)
Answer
If you pass a dict for attributes, the dict should map tag names to allowed attribute names, not map attribute names to allowed attribute values.
If you want 'dir' to be an allowed attribute for p tags, you need a 'p': ['dir'] entry, not a 'dir': ['rtl', 'ltr'] entry:
allowed_attributes = {
    'a': ['href', 'title'],
    'p': ['dir'],
}
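The original dict looks like an attempt to restrict dir to the values rtl and ltr. A 'p': ['dir'] entry allows any value, so if that restriction matters, one option is to use a callable instead of a list: bleach accepts a function taking (tag, name, value) as a dict value and keeps the attribute only when it returns True. Below is a minimal sketch under that assumption; VALID_DIRECTIONS, dir_is_valid, ALLOWED_ATTRIBUTES and clean_html are hypothetical names chosen for illustration, not part of bleach.

```python
# Sketch: allow "dir" on <p>, but only with the values "rtl" or "ltr".
# All names defined here are illustrative, not part of the bleach API.

VALID_DIRECTIONS = {'rtl', 'ltr'}


def dir_is_valid(tag, name, value):
    """Attribute filter; bleach calls it once per attribute on the tag."""
    return name == 'dir' and value in VALID_DIRECTIONS


# Keys are tag names; values are a list of attribute names or a callable.
ALLOWED_ATTRIBUTES = {
    'a': ['href', 'title'],
    'p': dir_is_valid,
}


def clean_html(html, allowed_tags):
    """Run bleach.clean with the attribute rules defined above."""
    # Third-party import kept local so the filter can be tried on its own.
    import bleach

    return bleach.clean(
        html,
        tags=allowed_tags,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True,
    )
```

Called as clean_html(string, allowed_tags) with the values from the question, this should keep dir="rtl" on the p tag while dropping any other dir value, as well as attributes such as data-mce-bogus that no rule allows.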