BeautifulSoup sibling structure with br tags

Question

I’m trying to parse an HTML document using the BeautifulSoup Python library, but the structure is getting distorted by <br> tags. Let me just give you an example.

Input HTML:

HTML that BeautifulSoup interprets:

In the source, the spans could be considered siblings. After parsing (using the default …

Accepted Answer

Your best bet is to extract() the line breaks. It’s easier than you think :).

>>> from bs4 import BeautifulSoup as BS
>>> html = """
... some text
... some more text
... and more text
... """
>>> soup = BS(html)
>>> for linebreak in soup.find_all('br'):
...     linebreak.extract()
...
>>> print(soup.prettify())
some text some more text and more text
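For anyone who wants to try this as a script, here is a minimal sketch of the same idea. The div/span/br markup is an assumption, since the tags from the original example did not survive on this page, and the explicit "html.parser" argument is my choice rather than something the answer specifies. Other parsers may build a different tree before the line breaks are removed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the original tags were not preserved, so this
# div/span/br layout is assumed purely for illustration.
html = """<div>
some text <br>
<span> some more text </span> <br>
<span> and more text </span>
</div>"""

# "html.parser" ships with Python; this parser choice is an assumption.
soup = BeautifulSoup(html, "html.parser")

# extract() detaches a tag from the tree and returns it.
removed = [br.extract() for br in soup.find_all("br")]

# With the line breaks gone, check how the two spans relate to each other.
first, second = soup.find_all("span")
same_parent = first.parent is second.parent
next_span = first.find_next_sibling("span")

print(soup.prettify())
```

If you never need the removed tags again, decompose() also removes them and destroys them instead of returning them.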


Answer