pandas `read_html` is a great and quick way of parsing tables; however, if it fails to find a table with the specified attributes, it raises an exception that, if unhandled, terminates the whole program.
I am trying to scrape thousands of web pages, and it is very annoying when the whole program terminates with an error just because one table was not found. Is there a way to capture these errors and let the code continue running?
```python
import requests
import pandas as pd

link = 'https://en.wikipedia.org/wiki/Barbados'
req = requests.get(link)
wiki_table = pd.read_html(req.text, attrs={"class": "infobox vcard"})
df = wiki_table[0]
```
This causes the whole program to fail. How can I deal with this? I think it involves exception handling or error capturing, but I am not familiar with Python and do not know how to do this.
Answer
Wrap the `pd.read_html` call in a `try ... except ...` exception handler. When no table matches the given attributes, `read_html` raises a `ValueError`, so catch that and let the rest of the code decide what to do.
```python
import requests
import pandas as pd

link = 'https://en.wikipedia.org/wiki/Barbados'
req = requests.get(link)

wiki_table = None
try:
    # read_html raises ValueError when no table matches the given attrs
    wiki_table = pd.read_html(req.text, attrs={"class": "infobox vcard"})
except ValueError as e:  # keep the caught exceptions to a minimum
    print(str(e))        # optional but useful

if wiki_table:
    df = wiki_table[0]
    # rest of your code
```
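Since the goal is to scrape thousands of pages, it can also help to move the parsing into a small helper that returns `None` on failure, so the surrounding loop simply skips pages without the table. A minimal sketch, where the helper name `parse_infobox` and the loop are illustrative, not part of the original answer:

```python
from io import StringIO

import pandas as pd

def parse_infobox(html):
    """Return the first 'infobox vcard' table as a DataFrame, or None if absent."""
    try:
        # read_html raises ValueError when no table matches the attrs;
        # StringIO avoids the deprecation of passing literal HTML strings
        tables = pd.read_html(StringIO(html), attrs={"class": "infobox vcard"})
    except ValueError:
        return None
    return tables[0]

# Example loop over many pages ('links' is a hypothetical list of URLs):
# for link in links:
#     df = parse_infobox(requests.get(link).text)
#     if df is None:
#         continue  # this page had no infobox; move on to the next one
#     ...  # process df
```

This keeps the error handling in one place and makes the scraping loop itself trivial to read.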