
Stop BeautifulSoup from adding html, head and body tags automatically

I’m using BeautifulSoup with the html5lib parser, and it automatically wraps the parsed markup in html, head and body tags, even when the input is only a fragment.

Is there an option I can set to turn off this behavior?


Answer

Pass "html.parser" as the parser argument to BeautifulSoup instead of "html5lib". This parses the HTML with Python’s built-in HTML parser. Quoting the docs:

Unlike html5lib, this parser makes no attempt to create a well-formed HTML document by adding a <body> tag. Unlike lxml, it doesn’t even bother to add an <html> tag.
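
As a minimal sketch of the built-in parser's behavior described in the quote, the snippet below parses a small fragment with "html.parser" and shows that no wrapper tags are added. The sample markup and the variable names `fragment` and `result` are illustrative assumptions, not taken from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample fragment; any partial markup should behave the same way.
fragment = "<h1>FOO</h1>"

# "html.parser" selects Python's built-in parser, which leaves fragments unwrapped.
soup = BeautifulSoup(fragment, "html.parser")

result = str(soup)
print(result)  # <h1>FOO</h1>
```

Because no html or body element is created, `soup.html` and `soup.body` are both `None` here.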


Alternatively, you could keep the html5lib parser and just select the first element inside <body>.
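
One way the html5lib alternative might look, assuming the third-party html5lib package is installed alongside bs4. The sample fragment and the names `first` and `inner` are hypothetical; `next_element` and `decode_contents()` are standard BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

# Hypothetical sample fragment, used only for illustration.
fragment = "<h1>FOO</h1>"

# html5lib builds a complete document around the fragment.
soup = BeautifulSoup(fragment, "html5lib")
print(soup)  # <html><head></head><body><h1>FOO</h1></body></html>

# The first element inside <body> is the original <h1>.
first = soup.body.next_element
print(first)  # <h1>FOO</h1>

# For fragments with several top-level elements, serialize all of
# <body>'s children without the wrapper tags.
inner = soup.body.decode_contents()
print(inner)  # <h1>FOO</h1>
```

Selecting only the first element drops any siblings, so `decode_contents()` is probably the safer choice when the input may contain more than one top-level tag.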