
How to resolve a NoneType error from soup.find on a table?

I'm trying to get a table using BeautifulSoup, and I get an error when using the find method.

I want to get the headers of a table from here: https://stooq.pl/t/?i=513&v=1&l=1

The id of the table I'm interested in is fth1, and the HTML looks like this:

<table class="fth1" id="fth1" width="100%" cellspacing="0" cellpadding="3" border="0">
    <thead style="background-color:e9e9e9">
        <tr align="center">
            <th id="f13">
                <a href="t/?i=513&amp;v=1&amp;o=1">Symbol</a>
            </th>
            <th id="f13">
                <a href="t/?i=513&amp;v=1&amp;o=2">Nazwa</a>
            </th>
        ...

My Python script:

from selenium import webdriver
import requests
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
page = requests.get('https://stooq.pl/t/?i=513&v=1&l=1')
soup = BeautifulSoup(page.text, 'lxml')

table1 = soup.find('table', {'id': "fth1"})

headers = []
for i in table1.find_all('th'):
    title = i.text
    headers.append(title)

print(headers)

I get this error:

Traceback (most recent call last):
  File "/home/…/script.py", line 25, in <module>
    for i in table1.find_all('th'):
AttributeError: 'NoneType' object has no attribute 'find_all'

I found that the variable table1 is None. I've tried using html.parser and html5lib instead of lxml, but with no success.
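For reference, soup.find returns None whenever no element matches, and the AttributeError only surfaces later at the find_all call. A minimal, self-contained sketch of guarding against this (using an inline HTML snippet rather than the live page, so it runs offline):

```python
from bs4 import BeautifulSoup

# A small stand-in for the page, with the table structure from the question:
html = """
<table id="fth1"><thead><tr align="center">
    <th><a href="#">Symbol</a></th>
    <th><a href="#">Nazwa</a></th>
</tr></thead></table>
"""

soup = BeautifulSoup(html, 'html.parser')

# find() returns None when nothing matches, e.g. a wrong or absent id:
missing = soup.find('table', {'id': 'no-such-id'})
print(missing)  # None

# Guard before calling find_all() to get a clear message instead of
# an AttributeError further down:
table = soup.find('table', {'id': 'fth1'})
if table is None:
    raise SystemExit('Table fth1 is not in the response -- inspect the raw HTML')

print([th.text.strip() for th in table.find_all('th')])  # ['Symbol', 'Nazwa']
```

Printing (a slice of) the raw response text, or saving it to a file, is the quickest way to confirm whether the table is present in what requests actually received.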

What is wrong, and why do I get this error?


Answer

The table is missing from the response because the site serves different content to a bare requests call that carries no browser cookies, which is why soup.find returns None. You can still scrape the site; to do so, copy the cookies and headers from your browser and inject them into the request. Open the Network tab in your browser's developer tools, find the request for the HTML document, and inspect it, or right-click it and choose "Copy as cURL"; you can then convert that command to Python.

Your request would then look something like this (but with your cookies):

import requests
from bs4 import BeautifulSoup

cookies = {
    'cookie_uu': '',
    'privacy': '',
    'PHPSESSID': '',
    'uid': '',
    'cookie_user': '',
    '_ga': '',
    '_gid': '',
    '__gads': '',
    'FCCDCF': '',
    'FCNEC': '',
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-GB,en;q=0.5',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}

params = {
    'i': '513',
    'v': '1',
    'l': '1',
}

response = requests.get('https://stooq.pl/t/', params=params, cookies=cookies, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')

table = soup.find('table', {'id': "fth1"})

headers = [i.text for i in table.find_all('th')]
print(headers)

This returns:

['Symbol', 'Nazwa', 'Otwarcie', 'Max', 'Min', 'Kurs', 'Zmiana', 'Wolumen', 'Obrót', 'Data', '']
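The trailing empty string comes from a header cell with no text; if you don't want it, filter falsy entries out after extraction. A small sketch using the list returned above:

```python
# Header list as extracted from the table (last <th> was empty):
headers = ['Symbol', 'Nazwa', 'Otwarcie', 'Max', 'Min', 'Kurs',
           'Zmiana', 'Wolumen', 'Obrót', 'Data', '']

# Drop empty strings left by blank <th> cells:
headers = [h for h in headers if h]
print(headers)  # same list without the trailing ''
```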