Beautifulsoup: Replace all <div> with aria-level attributes with <h> tags of the same level

I have an HTML source in which <div> elements serve as headings. Using Beautifulsoup and the aria-level attribute, I would like to replace each of these <div> elements with the <h> tag of the same level. My code works for this purpose, but it seems inelegant, and ideally the attributes of the former <div> elements would be removed as well.

import bs4

html = '''<div id="container">
    <div role="heading" aria-level="1">The main page heading</div>
    <p>This article is about showing a page structure.</p>
    <div role="heading" aria-level="2">Introduction</div>
    <p>An introductory text.</p>
    <div role="heading" aria-level="2">Chapter 1</div>
    <p>Text</p>
    <div role="heading" aria-level="3">Chapter 1.1</div>
    <p>More text in a sub section.</p>
    </div>'''

soup = bs4.BeautifulSoup(html, "html.parser")

for divheader in soup.find_all("div", {"aria-level": "1"}):
    divheader.name = "h1"
for divheader in soup.find_all("div", {"aria-level": "2"}):
    divheader.name = "h2"
for divheader in soup.find_all("div", {"aria-level": "3"}):
    divheader.name = "h3"

print(soup)

Output:

<div id="container">
<h1 aria-level="1" role="heading">The main page heading</h1>
<p>This article is about showing a page structure.</p>
<h2 aria-level="2" role="heading">Introduction</h2>
<p>An introductory text.</p>
<h2 aria-level="2" role="heading">Chapter 1</h2>
<p>Text</p>
<h3 aria-level="3" role="heading">Chapter 1.1</h3>
<p>More text in a sub section.</p>
</div>

What it should look like:

<div id="container">
<h1>The main page heading</h1>
<p>This article is about showing a page structure.</p>
<h2>Introduction</h2>
<p>An introductory text.</p>
<h2>Chapter 1</h2>
<p>Text</p>
<h3>Chapter 1.1</h3>
<p>More text in a sub section.</p>
</div>

Answer

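You can select every <div> that has an aria-level attribute, build the new tag name from that attribute's value, and then use del div.attrs to delete all attributes from the tag: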

for div in soup.select("div[aria-level]"):
    div.name = f'h{div["aria-level"]}'
    del div.attrs

print(soup)

Prints:

<div id="container">
<h1>The main page heading</h1>
<p>This article is about showing a page structure.</p>
<h2>Introduction</h2>
<p>An introductory text.</p>
<h2>Chapter 1</h2>
<p>Text</p>
<h3>Chapter 1.1</h3>
<p>More text in a sub section.</p>
</div>
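
A more defensive variant of the same idea, as a rough sketch rather than a definitive implementation: it assumes you may want to keep unrelated attributes such as id or class, and that aria-level values outside 1 to 6 should be left alone because HTML only defines h1 through h6. The helper name divs_to_headings and its keep_other_attrs parameter are made up for illustration and are not part of Beautifulsoup. It empties the attribute dictionary with attrs.clear() instead of deleting it.

```python
import bs4

VALID_LEVELS = {"1", "2", "3", "4", "5", "6"}  # HTML defines only h1..h6


def divs_to_headings(soup, keep_other_attrs=False):
    """Rename <div aria-level="N"> to <hN> (hypothetical helper, not a bs4 API)."""
    for div in soup.select("div[aria-level]"):
        level = div["aria-level"].strip()
        if level not in VALID_LEVELS:
            continue  # leave divs with an unusable level untouched
        div.name = f"h{level}"
        if keep_other_attrs:
            # Drop only the ARIA heading markup; keep id, class, etc.
            del div["aria-level"]
            if div.get("role") == "heading":
                del div["role"]
        else:
            # Empty the attribute dict in place instead of deleting it.
            div.attrs.clear()
    return soup


html = '''<div id="container">
<div role="heading" aria-level="2" id="intro">Introduction</div>
<div aria-level="9">Level out of range</div>
</div>'''

soup = divs_to_headings(bs4.BeautifulSoup(html, "html.parser"), keep_other_attrs=True)
print(soup)
```

With keep_other_attrs=True the heading keeps its id but loses role and aria-level. With the default, all attributes of the renamed tag are removed, which matches the output shown in the answer above.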
User contributions licensed under: CC BY-SA