Some websites automatically decline requests that lack a User-Agent header, and it's a hassle using bs4 to scrape many different kinds of tables.
This issue was resolved before through this code:
import urllib2
import pandas as pd

url = 'http://finance.yahoo.com/quote/A/key-statistics?p=A'
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open(url)
tables = pd.read_html(response.read())
However, urllib2 has been deprecated, and urllib3 does not have a build_opener() attribute; I could not find an equivalent attribute either, even though I'm sure it has one.
Answer
read_html() accepts either a URL or a string, so you can set the headers on the request yourself and pandas will parse the response text:
import pandas as pd
import requests

url = 'http://finance.yahoo.com/quote/A/key-statistics?p=A'
response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})
tables = pd.read_html(response.text)
print(tables)
If you look at the signature of read_html(), none of its parameters accepts headers, so the only place to set them is on the request itself.
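One caveat worth noting: in newer pandas versions (2.1+), passing a literal HTML string to read_html() is deprecated and you are expected to wrap it in io.StringIO. A minimal sketch with an inline HTML table (the table contents here are made up for illustration):

import io

import pandas as pd

# Any HTML containing a <table>; in practice this would be response.text
html = """
<table>
  <tr><th>Metric</th><th>Value</th></tr>
  <tr><td>Market Cap</td><td>42B</td></tr>
</table>
"""

# Wrapping in StringIO avoids the deprecation warning on pandas >= 2.1
tables = pd.read_html(io.StringIO(html))
print(tables[0])

With a live request you would write pd.read_html(io.StringIO(response.text)) instead; the rest of the accepted answer stays the same.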