Can’t get tags when scraping data

I am trying to scrape all tr tags with BeautifulSoup, but it doesn't find any. Code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.pro-football-reference.com/years/2020/defense_advanced.htm'
html = urlopen(url)
stats_page = BeautifulSoup(html, "lxml")

column_headers = stats_page.findAll('tr')[0]  # findAll() returns an empty list here, so [0] raises IndexError
column_headers = [i.getText() for i in column_headers.findAll('th')]

Even though the page at this URL clearly contains tr tags, findAll('tr') finds none, so indexing the result raises an IndexError. Why is this happening?
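A minimal offline reproduction of the symptom may help. The SAMPLE_HTML string below is hypothetical markup, not the real page: it only assumes that the table is wrapped in an HTML comment. In that case find_all('tr') returns an empty list, and it is the [0] index on that empty list that raises the IndexError. The sketch uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the table is wrapped in an HTML comment,
# so the parser keeps it as one Comment string instead of building tags.
SAMPLE_HTML = """
<div id="all_advanced_defense">
<!--
<div class="table_outer_container">
  <table>
    <tr><th>Player</th><th>Tm</th></tr>
    <tr><td>Example Player</td><td>XYZ</td></tr>
  </table>
</div>
-->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find_all("tr")
print(len(rows))  # no <tr> elements exist outside the comment

try:
    first_row = rows[0]
except IndexError as exc:
    # Indexing the empty result list is the step that actually fails
    print("IndexError:", exc)
```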

Answer

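In the page source, the table is located inside an HTML comment (<!-- ... -->). BeautifulSoup keeps everything inside a comment as a single Comment string rather than as tags, so findAll('tr') on the page never sees those rows. You need to extract the comment's content and then parse it as HTML: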

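from urllib.request import urlopen
from bs4 import BeautifulSoup, Comment

url = 'https://www.pro-football-reference.com/years/2020/defense_advanced.htm'
html = urlopen(url)
soup = BeautifulSoup(html, "lxml")

# The stats table sits inside an HTML comment; find the comment that contains it
comment = soup.find(string=lambda text: isinstance(text, Comment) and 'class="table_outer_container"' in text)
if comment is None:
    raise ValueError("Commented-out stats table not found")

# Parse the comment's text as HTML so its <tr>/<th> markup becomes real tags
stats_page = BeautifulSoup(comment, "lxml")
column_headers = stats_page.find_all('tr')[0]
column_headers = [i.get_text() for i in column_headers.find_all('th')]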
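The answer's approach can be generalised to collect every table hidden in comments, instead of relying on one class attribute. The following is a sketch, not the answer's exact method: tables_from_comments is a hypothetical helper name, SAMPLE_HTML is made-up markup, and "html.parser" is used so the example runs without lxml.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: one visible table plus a second table
# hidden inside an HTML comment.
SAMPLE_HTML = """
<table id="visible"><tr><th>Rk</th><th>Team</th></tr></table>
<!--
<div class="table_outer_container">
  <table id="hidden">
    <tr><th>Player</th><th>Tm</th><th>Age</th></tr>
    <tr><td>Example Player</td><td>XYZ</td><td>25</td></tr>
  </table>
</div>
-->
"""


def tables_from_comments(soup):
    """Hypothetical helper: return every <table> found inside HTML comments."""
    tables = []
    # string= matches text nodes; Comment is a subclass of NavigableString
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Re-parse the comment's text so its markup becomes real tags
        inner = BeautifulSoup(comment, "html.parser")
        tables.extend(inner.find_all("table"))
    return tables


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
visible = soup.find_all("table")      # only the uncommented table is found
hidden = tables_from_comments(soup)   # tables recovered from comments

# Same header extraction as in the answer, applied to the recovered table
header_row = hidden[0].find("tr")
column_headers = [th.get_text() for th in header_row.find_all("th")]
print([t.get("id") for t in visible])
print(column_headers)
```

On the real page you would pass the downloaded HTML instead of SAMPLE_HTML. Re-parsing every comment does slightly more work than a single find() with a class filter, but it keeps working if the wrapper class inside the comment changes.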


Source: Stack Overflow