I have an HTML source where <div> elements serve as headings. Using BeautifulSoup and the aria-level attribute, I would like to replace each of these <div> elements with the heading tag (<h1>, <h2>, <h3>, and so on) of the same level. My code works for my purpose, but it seems inelegant, and ideally the attributes of the former <div> elements would be removed.
import bs4

html = '''<div id="container">
  <div role="heading" aria-level="1">The main page heading</div>
  <p>This article is about showing a page structure.</p>
  <div role="heading" aria-level="2">Introduction</div>
  <p>An introductory text.</p>
  <div role="heading" aria-level="2">Chapter 1</div>
  <p>Text</p>
  <div role="heading" aria-level="3">Chapter 1.1</div>
  <p>More text in a sub section.</p>
</div>'''

soup = bs4.BeautifulSoup(html, "html.parser")

for divheader in soup.find_all("div", {"aria-level": "1"}):
    divheader.name = "h1"

for divheader in soup.find_all("div", {"aria-level": "2"}):
    divheader.name = "h2"

for divheader in soup.find_all("div", {"aria-level": "3"}):
    divheader.name = "h3"

print(soup)
Output:
<div id="container">
  <h1 aria-level="1" role="heading">The main page heading</h1>
  <p>This article is about showing a page structure.</p>
  <h2 aria-level="2" role="heading">Introduction</h2>
  <p>An introductory text.</p>
  <h2 aria-level="2" role="heading">Chapter 1</h2>
  <p>Text</p>
  <h3 aria-level="3" role="heading">Chapter 1.1</h3>
  <p>More text in a sub section.</p>
</div>
What it should look like:
<div id="container">
  <h1>The main page heading</h1>
  <p>This article is about showing a page structure.</p>
  <h2>Introduction</h2>
  <p>An introductory text.</p>
  <h2>Chapter 1</h2>
  <p>Text</p>
  <h3>Chapter 1.1</h3>
  <p>More text in a sub section.</p>
</div>
Answer
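You can use del div.attrs to delete all attributes from a tag. The new tag name can be built directly from the aria-level value, so a single loop handles every level: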
for div in soup.select("div[aria-level]"):
    div.name = f'h{div["aria-level"]}'
    del div.attrs

print(soup)
Prints:
<div id="container">
  <h1>The main page heading</h1>
  <p>This article is about showing a page structure.</p>
  <h2>Introduction</h2>
  <p>An introductory text.</p>
  <h2>Chapter 1</h2>
  <p>Text</p>
  <h3>Chapter 1.1</h3>
  <p>More text in a sub section.</p>
</div>
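If you want something a bit more defensive, here is a minimal sketch of the same idea wrapped in a function. The function name divs_to_headings and the shortened sample HTML are my own and not from the question. It makes two assumptions that go beyond the original: an aria-level outside 1 to 6 should be left untouched (HTML only defines <h1> through <h6>), and emptying the attribute dict with attrs.clear() is an acceptable substitute for del div.attrs, so that attrs stays a normal dict for any later code that reads it.

```python
import bs4


def divs_to_headings(soup):
    """Rename <div aria-level="N"> elements to <hN> and strip their attributes.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    for div in soup.select("div[aria-level]"):
        level = div["aria-level"]
        # Assumption: only levels 1-6 map to real heading tags; skip the rest.
        if level not in {"1", "2", "3", "4", "5", "6"}:
            continue
        div.name = f"h{level}"
        # Empty the attribute dict in place rather than deleting it.
        div.attrs.clear()
    return soup


# Shortened sample input (made up for this sketch).
html = (
    '<div id="container">'
    '<div role="heading" aria-level="2">Introduction</div>'
    '<div role="heading" aria-level="9">Too deep</div>'
    "</div>"
)

soup = divs_to_headings(bs4.BeautifulSoup(html, "html.parser"))
result = str(soup)
print(result)
```

With this input, the level-2 <div> becomes <h2>Introduction</h2>, the outer container keeps its id because it has no aria-level, and the level-9 <div> is left as it was.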