
Scraping Sports Data With Beautifulsoup

This is my first time web scraping with Beautiful Soup, and I wanted to do a little project with hockey since I am a huge fan of the sport. I am a little stuck and am wondering how to retrieve the header names of the stats for each player.

Here is my current code:

from bs4 import BeautifulSoup
import requests
import re
import pandas as pd

url = "http://www.espn.com/nhl/statistics/player/_/stat/points/year/2020/seasontype/2"

page = requests.get(url)

soup = BeautifulSoup(page.text, 'html.parser')

allStats = []
players = soup.find_all('tr', attrs={'class':re.compile('row player')})
for player in players:
    stats = [stat.get_text() for stat in player.find_all('td')]
    allStats += stats
body = soup.find_all('div', {"class":"wrapper"})

print(allStats)

allColumns = []
headers = soup.find_all('tr', attrs = {'class': 'colhead'})
for col in headers:
    columns = [col.get_text() for col in headers.find_all('td')]
    allColumns += columns

print(allColumns)

I am currently getting an error that says “ResultSet object has no attribute '%s'” for the line

headers = soup.find_all('tr', attrs = {'class': 'colhead'})

Eventually, I want to get a list of all of the stat names being tracked and use them as the columns of a pandas DataFrame that lists each player and their corresponding stats.

What’s the best way to achieve this?

Thanks for your help!


Answer

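There’s a typo in your headers iteration, and that’s why you’re getting the error. Inside the loop you call headers.find_all('td') on the whole ResultSet instead of col.find_all('td') on the current row, which raises: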

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
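To illustrate the difference, here is a minimal sketch using a small, hypothetical HTML snippet rather than ESPN's real markup. find_all() returns a ResultSet, which is a list of Tag objects, so find_all() has to be called on each element of it rather than on the ResultSet itself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the ESPN header rows
html = """
<table>
  <tr class="colhead"><td>RK</td><td>PLAYER</td></tr>
  <tr class="colhead"><td>GP</td><td>G</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A ResultSet: a list of <tr> Tag objects
rows = soup.find_all("tr", attrs={"class": "colhead"})

try:
    rows.find_all("td")  # wrong: a ResultSet has no find_all() method
except AttributeError as err:
    print("AttributeError:", err)

# right: loop over the ResultSet and call find_all() on each Tag
cells = [td.get_text() for row in rows for td in row.find_all("td")]
print(cells)  # ['RK', 'PLAYER', 'GP', 'G']
```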

Calling find_all() on each header row instead gives what I suppose is the expected result:

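allColumns = []
headers = soup.find_all('tr', attrs={'class': 'colhead'})
for header in headers:
    # headers is a ResultSet; call find_all() on each row (a Tag), not on headers itself
    columns = [head.get_text() for head in header.find_all('td')]
    allColumns += columns

print(allColumns)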
>>>
['', 'PP', 'SH', 'RK', 'PLAYER', 'TEAM', 'GP', 'G', 'A', 'PTS', '+/-', 'PIM', 'PTS/G', 'SOG', 'PCT', 'GWG', 'G', 'A', 'G', 'A', '', 'PP', 'SH', 'RK', 'PLAYER', 'TEAM', 'GP', 'G', 'A', 'PTS', '+/-', 'PIM', 'PTS/G', 'SOG', 'PCT', 'GWG', 'G', 'A', 'G', 'A', '', 'PP', 'SH', 'RK', 'PLAYER', 'TEAM', 'GP', 'G', 'A', 'PTS', '+/-', 'PIM', 'PTS/G', 'SOG', 'PCT', 'GWG', 'G', 'A', 'G', 'A', '', 'PP', 'SH', 'RK', 'PLAYER', 'TEAM', 'GP', 'G', 'A', 'PTS', '+/-', 'PIM', 'PTS/G', 'SOG', 'PCT', 'GWG', 'G', 'A', 'G', 'A']
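Judging by that output, the names repeat because the table contains several colhead rows: each header block is a group row ('', 'PP', 'SH') followed by the actual column-name row, and that pair appears four times. To get from there to the DataFrame you mentioned, one option is to keep only the first column-name row and collect each player's cells as a separate list, instead of extending one flat list the way allStats += stats does. The sketch below runs against a small hypothetical table that imitates that structure; the placeholder player names and numbers are made up, and treating the last four G/A columns as power-play and short-handed goals and assists is an assumption based on the PP/SH group row.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical, trimmed-down markup imitating the ESPN table: a PP/SH group
# row, a column-name row, player rows, and a repeated header block.
html = """
<table>
  <tr class="colhead"><td></td><td>PP</td><td>SH</td></tr>
  <tr class="colhead"><td>RK</td><td>PLAYER</td><td>G</td><td>A</td><td>PTS</td><td>G</td><td>A</td><td>G</td><td>A</td></tr>
  <tr class="oddrow player-1"><td>1</td><td>Player One</td><td>10</td><td>15</td><td>25</td><td>2</td><td>3</td><td>0</td><td>1</td></tr>
  <tr class="evenrow player-2"><td>2</td><td>Player Two</td><td>8</td><td>12</td><td>20</td><td>1</td><td>2</td><td>1</td><td>0</td></tr>
  <tr class="colhead"><td></td><td>PP</td><td>SH</td></tr>
  <tr class="colhead"><td>RK</td><td>PLAYER</td><td>G</td><td>A</td><td>PTS</td><td>G</td><td>A</td><td>G</td><td>A</td></tr>
  <tr class="oddrow player-3"><td>3</td><td>Player Three</td><td>5</td><td>9</td><td>14</td><td>0</td><td>1</td><td>0</td><td>0</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

header_cells = None
records = []
for row in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if "colhead" in row.get("class", []):
        # Keep only the first column-name row (it starts with 'RK'); skip the
        # PP/SH group rows and the header blocks repeated further down.
        if header_cells is None and cells and cells[0] == "RK":
            header_cells = cells
    else:
        records.append(cells)  # one list per player, not one flat list

# Assumption: the last four columns are power-play and short-handed G/A,
# so rename them to avoid duplicate column labels.
columns = header_cells[:-4] + ["PP G", "PP A", "SH G", "SH A"]

df = pd.DataFrame(records, columns=columns)
print(df)
```

Against the live page you would build soup from requests.get(url).text as in the question; if unrelated rows end up in records, filtering player rows with the question's re.compile('row player') pattern is one way to exclude them. The values come back as strings, so columns such as G or PTS can be converted with pd.to_numeric if you need to sort or do arithmetic on them.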
User contributions licensed under: CC BY-SA