I have an HTML source in which <div> elements serve as headings. Using BeautifulSoup and the aria-level attribute, I would like to replace all of these <div> elements with heading tags (<h1>, <h2>, <h3>, and so on) of the same level. My code works for my purpose, but it seems inelegant, and ideally the attributes of the former <div> elements would be removed as well.
```python
import bs4

html = '''<div id="container">
<div role="heading" aria-level="1">The main page heading</div>
<p>This article is about showing a page structure.</p>
<div role="heading" aria-level="2">Introduction</div>
<p>An introductory text.</p>
<div role="heading" aria-level="2">Chapter 1</div>
<p>Text</p>
<div role="heading" aria-level="3">Chapter 1.1</div>
<p>More text in a sub section.</p>
</div>'''

soup = bs4.BeautifulSoup(html, "html.parser")

for divheader in soup.find_all("div", {"aria-level": "1"}):
    divheader.name = "h1"
for divheader in soup.find_all("div", {"aria-level": "2"}):
    divheader.name = "h2"
for divheader in soup.find_all("div", {"aria-level": "3"}):
    divheader.name = "h3"

print(soup)
```
Output:
```html
<div id="container">
<h1 aria-level="1" role="heading">The main page heading</h1>
<p>This article is about showing a page structure.</p>
<h2 aria-level="2" role="heading">Introduction</h2>
<p>An introductory text.</p>
<h2 aria-level="2" role="heading">Chapter 1</h2>
<p>Text</p>
<h3 aria-level="3" role="heading">Chapter 1.1</h3>
<p>More text in a sub section.</p>
</div>
```
What it should look like:
```html
<div id="container">
<h1>The main page heading</h1>
<p>This article is about showing a page structure.</p>
<h2>Introduction</h2>
<p>An introductory text.</p>
<h2>Chapter 1</h2>
<p>Text</p>
<h3>Chapter 1.1</h3>
<p>More text in a sub section.</p>
</div>
```
Answer
You can build the new tag name directly from the aria-level value, and use `del div.attrs` to delete all attributes from the tag:
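```python
for div in soup.select("div[aria-level]"):
    # aria-level="2" -> <h2>, aria-level="3" -> <h3>, ...
    div.name = f'h{div["aria-level"]}'
    # drop every attribute of the former <div> (role, aria-level, ...)
    del div.attrs

print(soup)
```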
Prints:
```html
<div id="container">
<h1>The main page heading</h1>
<p>This article is about showing a page structure.</p>
<h2>Introduction</h2>
<p>An introductory text.</p>
<h2>Chapter 1</h2>
<p>Text</p>
<h3>Chapter 1.1</h3>
<p>More text in a sub section.</p>
</div>
```
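For reference, here is a self-contained sketch of the same approach that can be run as-is. It makes two assumptions that the question does not state: it only converts <div> elements that also carry role="heading", and it skips aria-level values outside 1 to 6, because HTML has no <h7> or higher. It also uses `div.attrs.clear()` instead of `del div.attrs`. Both remove the attributes from the printed output, but clearing the dict in place arguably makes the tag safer to reuse afterwards, since `attrs` stays an ordinary (empty) dict. The function name `divs_to_headings` is only an illustrative name.

```python
import bs4

html = '''<div id="container">
<div role="heading" aria-level="1">The main page heading</div>
<p>This article is about showing a page structure.</p>
<div role="heading" aria-level="2">Introduction</div>
<p>An introductory text.</p>
<div role="heading" aria-level="2">Chapter 1</div>
<p>Text</p>
<div role="heading" aria-level="3">Chapter 1.1</div>
<p>More text in a sub section.</p>
</div>'''


def divs_to_headings(soup):
    """Rename <div role="heading" aria-level="N"> to <hN> and strip its attributes."""
    # Assumption: only divs explicitly marked role="heading" count as headings.
    for div in soup.select('div[role="heading"][aria-level]'):
        level = div["aria-level"].strip()
        # Assumption: only levels 1-6 map to real HTML heading tags;
        # anything else is left untouched.
        if level.isdigit() and 1 <= int(level) <= 6:
            div.name = f"h{int(level)}"
            # Empty the attribute dict in place rather than deleting it.
            div.attrs.clear()
    return soup


soup = divs_to_headings(bs4.BeautifulSoup(html, "html.parser"))
print(soup)
```

If some attributes should survive (an id used for anchor links, for example), delete only the unwanted ones with `del div["role"]` and `del div["aria-level"]` instead of clearing the whole dict.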