This is my first time web scraping with Beautiful Soup, and I wanted to do a small hockey project since I'm a huge fan of the sport. I'm a little stuck and wondering how to retrieve the header names of the stats for each player.
Here is my current code:
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
url = "http://www.espn.com/nhl/statistics/player/_/stat/points/year/2020/seasontype/2"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
allStats = []
players = soup.find_all('tr', attrs={'class':re.compile('row player')})
for player in players:
    stats = [stat.get_text() for stat in player.find_all('td')]
    allStats += stats
body = soup.find_all('div', {"class":"wrapper"})
print(allStats)
allColumns = []
headers = soup.find_all('tr', attrs = {'class': 'colhead'})
for col in headers:
    columns = [col.get_text() for col in headers.find_all('td')]
    allColumns += columns
print(allColumns)
I am currently getting an error that says “ResultSet object has no attribute ‘%s’” for the line
headers = soup.find_all('tr', attrs = {'class': 'colhead'})
Eventually, I want to get a list of all the stat names being tracked and use it as the columns of a pandas DataFrame that lists each player and their corresponding stats.
What’s the best way to achieve this?
Thanks for your help!
Answer
There's a typo in your headers loop: inside the loop you call find_all() on headers (the whole ResultSet) instead of on the individual row, which is why you get this error:
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
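To make the difference concrete, here is a minimal sketch that reproduces the error without hitting the network. The HTML string is a made-up stand-in for the page (an assumption, not ESPN's actual markup), and the names error_message and cells_per_row are only for illustration.

```python
from bs4 import BeautifulSoup

# Made-up two-row table, only for illustration (not ESPN's real markup).
html = """
<table>
  <tr class="colhead"><td>RK</td><td>PLAYER</td></tr>
  <tr class="colhead"><td>RK</td><td>PLAYER</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which behaves like a list of Tag objects.
headers = soup.find_all("tr", attrs={"class": "colhead"})

# Calling find_all() on the ResultSet itself raises AttributeError.
try:
    headers.find_all("td")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Calling find_all() on each Tag inside the ResultSet works.
cells_per_row = [[td.get_text() for td in row.find_all("td")] for row in headers]
```

In other words, the search methods belong to each row, not to the collection of rows.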
I suppose this is what you intended: loop over each header row and call find_all() on that row.
allColumns = []
headers = soup.find_all('tr', attrs = {'class': 'colhead'})
for header in headers:
    columns = [head.get_text() for head in header.find_all('td')]
    allColumns += columns
>>>
['', 'PP', 'SH', 'RK', 'PLAYER', 'TEAM', 'GP', 'G', 'A', 'PTS', '+/-', 'PIM', 'PTS/G', 'SOG', 'PCT', 'GWG', 'G', 'A', 'G', 'A', '', 'PP', 'SH', 'RK', 'PLAYER', 'TEAM', 'GP', 'G', 'A', 'PTS', '+/-', 'PIM', 'PTS/G', 'SOG', 'PCT', 'GWG', 'G', 'A', 'G', 'A', '', 'PP', 'SH', 'RK', 'PLAYER', 'TEAM', 'GP', 'G', 'A', 'PTS', '+/-', 'PIM', 'PTS/G', 'SOG', 'PCT', 'GWG', 'G', 'A', 'G', 'A', '', 'PP', 'SH', 'RK', 'PLAYER', 'TEAM', 'GP', 'G', 'A', 'PTS', '+/-', 'PIM', 'PTS/G', 'SOG', 'PCT', 'GWG', 'G', 'A', 'G', 'A']
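As for the DataFrame you mentioned, the output above shows the same header repeating several times, so one option is to read the header only once with find() and build one row per player. The following is a rough sketch under stated assumptions: the HTML is a made-up sample, the player names and numbers are placeholders, and the 'player' class pattern is a simplification of the one in your code.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Made-up sample for illustration only (not ESPN's real markup or data).
html = """
<table>
  <tr class="colhead"><td>RK</td><td>PLAYER</td><td>PTS</td></tr>
  <tr class="oddrow player-1"><td>1</td><td>Player A</td><td>110</td></tr>
  <tr class="evenrow player-2"><td>2</td><td>Player B</td><td>97</td></tr>
  <tr class="colhead"><td>RK</td><td>PLAYER</td><td>PTS</td></tr>
  <tr class="oddrow player-3"><td>3</td><td>Player C</td><td>95</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The header row repeats down the page, so take only the first occurrence.
first_header = soup.find("tr", attrs={"class": "colhead"})
columns = [td.get_text() for td in first_header.find_all("td")]

# Keep one list of cell texts per player row (instead of one flat list).
player_rows = soup.find_all("tr", attrs={"class": re.compile("player")})
rows = [[td.get_text() for td in row.find_all("td")] for row in player_rows]

df = pd.DataFrame(rows, columns=columns)
print(df)
```

Judging by the output above, the real page also reuses names such as G and A and includes a few extra header cells, so the column list would likely need trimming or renaming before it lines up with the player rows. The sketch does not handle that. All values come back as strings, so numeric columns would need converting, for example with pd.to_numeric.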