
How to resolve a NoneType error from soup.find on a table?

I'm trying to get a table using BeautifulSoup, and I get an error when using the find method.

I want to get the headers of a table from here: https://stooq.pl/t/?i=513&v=1&l=1

The id of the table I'm interested in is fth1, and the HTML looks like this:

<table class="fth1" id="fth1" width="100%" cellspacing="0" cellpadding="3" border="0">
    <thead style="background-color:e9e9e9">
        <tr align="center">
            <th id="f13">
                <a href="t/?i=513&amp;v=1&amp;o=1">Symbol</a>
            </th>
            <th id="f13">
                <a href="t/?i=513&amp;v=1&amp;o=2">Nazwa</a>
            </th>
        ...

My Python script:

from selenium import webdriver
import requests
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
page = requests.get('https://stooq.pl/t/?i=513&v=1&l=1')
soup = BeautifulSoup(page.text, 'lxml')

table1 = soup.find('table', {'id': "fth1"})

headers = []
for i in table1.find_all('th'):
    title = i.text
    headers.append(title)

print(headers)

I get this error:

Traceback (most recent call last):
  File "/home/…/script.py", line 25, in <module>
    for i in table1.find_all('th'):
AttributeError: 'NoneType' object has no attribute 'find_all'

I found that the variable table1 is None. I've tried using html.parser and html5lib instead of lxml, but with no success.
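For reference, soup.find returns None whenever no element matches, and the AttributeError only surfaces later at the find_all call. A minimal, self-contained sketch of guarding against this (using an inline HTML snippet rather than the live page, so it runs offline):

```python
from bs4 import BeautifulSoup

# A small stand-in for the page, with the table structure from the question:
html = """
<table id="fth1"><thead><tr align="center">
    <th><a href="#">Symbol</a></th>
    <th><a href="#">Nazwa</a></th>
</tr></thead></table>
"""

soup = BeautifulSoup(html, 'html.parser')

# find() returns None when nothing matches, e.g. a wrong or absent id:
missing = soup.find('table', {'id': 'no-such-id'})
print(missing)  # None

# Guard before calling find_all() to get a clear message instead of
# an AttributeError further down:
table = soup.find('table', {'id': 'fth1'})
if table is None:
    raise SystemExit('Table fth1 is not in the response -- inspect the raw HTML')

print([th.text.strip() for th in table.find_all('th')])  # ['Symbol', 'Nazwa']
```

Printing (a slice of) the raw response text, or saving it to a file, is the quickest way to confirm whether the table is present in what requests actually received.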

What is wrong, and why do I get this error?


Answer

The table is missing from the response because the site serves different content to a bare requests call that carries no browser cookies, which is why soup.find returns None. You can still scrape the site; to do so, copy the cookies and headers from your browser and inject them into the request. Open the Network tab in your browser's developer tools, find the request for the HTML document, and inspect it, or right-click it and choose "Copy as cURL"; you can then convert that command to Python.

Your request would then look something like this (but with your cookies):

import requests
from bs4 import BeautifulSoup

cookies = {
    'cookie_uu': '',
    'privacy': '',
    'PHPSESSID': '',
    'uid': '',
    'cookie_user': '',
    '_ga': '',
    '_gid': '',
    '__gads': '',
    'FCCDCF': '',
    'FCNEC': '',
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-GB,en;q=0.5',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}

params = {
    'i': '513',
    'v': '1',
    'l': '1',
}

response = requests.get('https://stooq.pl/t/', params=params, cookies=cookies, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')

table = soup.find('table', {'id': "fth1"})

headers = [i.text for i in table.find_all('th')]
print(headers)

This returns:

['Symbol', 'Nazwa', 'Otwarcie', 'Max', 'Min', 'Kurs', 'Zmiana', 'Wolumen', 'Obrót', 'Data', '']
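The trailing empty string comes from a header cell with no text; if you don't want it, filter falsy entries out after extraction. A small sketch using the list returned above:

```python
# Header list as extracted from the table (last <th> was empty):
headers = ['Symbol', 'Nazwa', 'Otwarcie', 'Max', 'Min', 'Kurs',
           'Zmiana', 'Wolumen', 'Obrót', 'Data', '']

# Drop empty strings left by blank <th> cells:
headers = [h for h in headers if h]
print(headers)  # same list without the trailing ''
```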