Don’t put html, head and body tags automatically, beautifulsoup

Question

I'm using BeautifulSoup with html5lib, and it automatically adds the html, head and body tags. Is there an option I can set to turn off this behavior?

Accepted Answer

    In [35]: import bs4 as bs

    In [36]: bs.BeautifulSoup('<h1>FOO</h1>', "html.parser")
    Out[36]: <h1>FOO</h1>

This parses the HTML with Python's builtin HTML parser. Quoting the docs: "Unlike html5lib, this parser makes no attempt to create a well-formed HTML document by adding a <body> tag. Unlike lxml, it doesn't even bother to add an <html> tag."

Alternatively, you could use the html5lib parser and just select the element after <body>:

    In [61]: soup = bs.BeautifulSoup('<h1>FOO</h1>', 'html5lib')

    In [62]: soup.body.next
    Out[62]: <h1>FOO</h1>
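
The two approaches in this answer can be put side by side in a short script. This is only a sketch: the helper names `parse_fragment` and `body_inner_html` are made up for illustration and are not part of BeautifulSoup, and the second helper assumes the html5lib package is installed. It uses `decode_contents()` on the body rather than `soup.body.next`, on the assumption that you may want every top-level element of the fragment and not just the first one.

```python
from bs4 import BeautifulSoup


def parse_fragment(markup: str) -> str:
    """Parse with Python's builtin html.parser, which adds no wrapper tags."""
    soup = BeautifulSoup(markup, "html.parser")
    return str(soup)


def body_inner_html(markup: str, parser: str = "html5lib") -> str:
    """Parse with a parser that wraps the fragment in <html>/<head>/<body>,
    then return only the markup found inside <body>.

    Hypothetical helper; requires the chosen parser (html5lib by default)
    to be installed.
    """
    soup = BeautifulSoup(markup, parser)
    # decode_contents() serializes the children of a tag without the tag itself.
    return soup.body.decode_contents()
```

With `html.parser`, `parse_fragment('<h1>FOO</h1>')` returns the fragment unchanged. The second helper may be useful if you depend on html5lib's browser-like error recovery and only want to discard the wrapper tags afterwards.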
