Validate (X)HTML in Python

Question

What's the best way to go about validating that a document follows some version of HTML (preferably one that I can specify)? I'd like to be able to know where the failures occur, as in a web-based validator, except in a native Python app.

Accepted Answer

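XHTML is easy, use lxml:

```python
from io import StringIO  # Python 3; the original used Python 2's "from StringIO import StringIO"
from lxml import etree

html = "<html><body><p>Hello</p></body></html>"  # the markup to check
# recover=False makes lxml raise etree.XMLSyntaxError instead of silently repairing bad markup
etree.parse(StringIO(html), etree.HTMLParser(recover=False))
```

HTML is harder, since there's traditionally not been as much interest in validation among the HTML crowd (run StackOverflow itself through a validator, yikes). The easiest solution would be to execute external applications such as nsgmls or OpenJade, and then parse their output.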
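To get the "where did it fail" part, the parser's exception can be turned into a line/column location. Below is a minimal sketch of that idea for XHTML. It swaps in the standard library's `xml.etree.ElementTree` instead of lxml so it runs without extra packages, and `check_well_formed` is a hypothetical helper name. It only checks well-formedness and stops at the first error; it does not validate against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xhtml):
    """Return a list of (line, column, message) problems; empty if well-formed.

    Uses the standard library's expat-based parser, which reports only the
    first error and does not check validity against any DTD.
    """
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as err:
        line, column = err.position  # location of the error as reported by expat
        return [(line, column, str(err))]
    return []


good = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>ok</p></body></html>'
bad = '<html xmlns="http://www.w3.org/1999/xhtml">\n<body><p>unclosed</body></html>'
print(check_well_formed(good))  # no problems
print(check_well_formed(bad))   # one "mismatched tag" problem on line 2
```

With lxml itself, the caught `etree.XMLSyntaxError` carries an `error_log` whose entries expose `line`, `column` and `message`, so the same kind of list can be built from it.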
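For the external-tool route, here is a rough sketch of running nsgmls and turning its diagnostics into (line, column, kind, message) tuples. The message format it parses (`program:file:line:column:type: message`) is an assumption about SP/OpenSP output, so check it against your installation. The `-s` flag suppresses the normal output so only diagnostics are printed; you will still need an SGML catalog that maps the HTML DOCTYPE to its DTD. The helper names `parse_nsgmls_output` and `validate_with_nsgmls` are hypothetical.

```python
import re
import subprocess

# ASSUMED diagnostic shape: "program:file:line:column:type: message", where the
# optional single-letter type is e.g. E (error) or W (warning). Verify against
# the output of your own nsgmls/onsgmls build before relying on it.
_MESSAGE = re.compile(
    r"^(?P<prog>[^:]+):(?P<file>.*?):(?P<line>\d+):(?P<col>\d+):"
    r"(?:(?P<kind>[A-Z]):)?\s*(?P<msg>.*)$"
)


def parse_nsgmls_output(text):
    """Turn nsgmls diagnostics into (line, column, kind, message) tuples.

    Lines that do not look like diagnostics are skipped.
    """
    problems = []
    for raw in text.splitlines():
        m = _MESSAGE.match(raw)
        if m:
            problems.append((int(m["line"]), int(m["col"]), m["kind"] or "", m["msg"]))
    return problems


def validate_with_nsgmls(path, exe="nsgmls"):
    """Run the external parser on a file and return its parsed diagnostics.

    The executable name is an assumption: SP calls it nsgmls, OpenSP onsgmls.
    Diagnostics are written to standard error, so that is what gets parsed.
    """
    result = subprocess.run([exe, "-s", path], capture_output=True, text=True)
    return parse_nsgmls_output(result.stderr)
```

Keeping the parsing separate from the subprocess call makes it easy to adapt the regular expression if your validator's messages look different.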


Answer