
Web Scraping ESPN Data With Selenium

I’m trying to scrape some data off ESPN and run some calculations on the scraped data. Ideally, I would like to iterate through a dataframe, grab each player’s name, send it into the search box with Selenium, and tell Selenium to click the player’s name. I was able to do this successfully with one player, but I’m not sure how to iterate through all the players in my dataframe.

The second part of the code is where I’m struggling. For some reason I’m not able to get the data; Selenium can’t find any of the elements, so I don’t think I’m doing it properly. If I can scrape the required data, I would like to plug it into a calculation and append the calculated projected points to my dataframe, dfNBA.

Can someone please help me with my code and point me in the right direction? I’m trying to become more efficient at writing Python code, but right now I’m stuck.

Thanks

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#sample data
pp = {'Player Name':['Donovan Mitchell', 'Kawhi Leonard', 'Rudy Gobert', 'Paul George','Reggie Jackson', 'Jordan Clarkson'],
      'Fantasy Score': [46.0, 50.0, 40.0, 44.0, 25.0, 26.5]}

#Creating a dataframe from dictionary
dfNBA = pd.DataFrame(pp)

#Scraping ESPN
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.espn.com/")

#Clicking the search button
driver.find_element_by_xpath("//a[@id='global-search-trigger']").click() 

#sending data to the search button
driver.find_element_by_xpath("//input[@placeholder='Search Sports, Teams or Players...']").send_keys(dfNBA.iloc[0,:].values[0])
WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".search_results__details")))
playerPage = driver.find_element_by_css_selector(".search_results__details").click()

#Scraping data from last 10 games
points = driver.find_element_by_xpath(".//div[@class='Table__TD']")[13]
#rebs = driver.find_element_by_xpath("//*[@id='fittPageContainer'']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[7]")                                    
#asts = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[8]")
#blks = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[9]")
#stls = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[10]")
#tnvrs = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[12]")

#projectedPoints = points+(rebs*1.2)+(asts*1.5)+(blks*3)+(stls*3)-(tnvrs*1)
print(points)
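For the iteration part of the question, the single-player steps just need to be wrapped in a loop over the dataframe. A minimal sketch of that pattern, with a hypothetical `lookup_player` function standing in for the Selenium search-and-click:

```python
import pandas as pd

pp = {'Player Name': ['Donovan Mitchell', 'Kawhi Leonard', 'Rudy Gobert',
                      'Paul George', 'Reggie Jackson', 'Jordan Clarkson'],
      'Fantasy Score': [46.0, 50.0, 40.0, 44.0, 25.0, 26.5]}
dfNBA = pd.DataFrame(pp)

def lookup_player(name):
    # Hypothetical placeholder: this is where the per-player Selenium
    # search-and-click (or an API call) would go.
    return name

# iterrows() yields (index, row) pairs, so each player's name can be
# pulled out of the row and fed to the per-player scraping step.
for idx, row in dfNBA.iterrows():
    playerName = row['Player Name']
    lookup_player(playerName)
```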



Answer

I think Selenium is a bit overkill when there’s a viable API option.

Give this a try. Note that in the overview, “L10 games” refers to the last 10 regular-season games. My code here uses the last 10 games, which includes playoffs. If you only want the regular season, let me know and I can adjust it. I also added a variable so that if you want, for example, just the last 5 games or the last 15 games, you can do that too.
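The player lookup in the code below leans on the `uid` field in the search results, which for player entries ends with the athlete id after an `a:` marker. The uid below is an illustrative shape, not a verified value for any particular player; the split works like this:

```python
# Example uid shape from the search results (illustrative value):
uid = 's:40~l:46~a:3908809'

# Everything after the last 'a:' marker is the athlete id that goes
# into the gamelog endpoint URL.
playerID = uid.split('a:')[-1]
print(playerID)  # 3908809
```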

import requests
import pandas as pd

previous_games = 10

pp = {'Player Name':['Donovan Mitchell', 'Kawhi Leonard', 'Rudy Gobert', 'Paul George','Reggie Jackson', 'Jordan Clarkson'],
      'Fantasy Score': [46.0, 50.0, 40.0, 44.0, 25.0, 26.5]}


#Creating a dataframe from dictionary
dfNBA = pd.DataFrame(pp)

search_api = 'https://site.api.espn.com/apis/search/v2'
for idx, row in dfNBA.iterrows():
    playerName = row['Player Name']
    payload = {'query': playerName}

    results = requests.get(search_api, params=payload).json()['results']
    for each in results:
        if each['type'] == 'player':
            playerID = each['contents'][0]['uid'].split('a:')[-1]
            break
        
    player_api = 'https://site.web.api.espn.com/apis/common/v3/sports/basketball/nba/athletes/%s/gamelog' %playerID
    payload = {'season': '2021'}
    jsonData_player = requests.get(player_api, params=payload).json()
    
    #Scraping data from last x games
    last_x_gameIDs = list(jsonData_player['events'].keys())
    last_x_gameIDs.sort()
    last_x_gameIDs = last_x_gameIDs[-1*previous_games:]
    
    gamelog_dict = {}
    seasonTypes = jsonData_player['seasonTypes']
    for gameID in last_x_gameIDs:
        for each in seasonTypes:
            categories = each['categories']
            for category in categories:
                if category['type'] == 'total':
                    continue
                events = category['events']
                for event in events:
                    if gameID == event['eventId']:
                        gamelog_dict[gameID] = event['stats']
                    

    labels = jsonData_player['labels']
    
    # Map each game's stat values to their labels
    for k, v in gamelog_dict.items():
        v = dict(zip(labels, v))
        gamelog_dict[k] = v
        
    stats = pd.DataFrame(gamelog_dict.values())
    
    points = stats['PTS'].astype(float).sum() / previous_games
    rebs = stats['REB'].astype(float).sum() / previous_games
    asts = stats['AST'].astype(float).sum() / previous_games
    blks = stats['BLK'].astype(float).sum() / previous_games
    stls = stats['STL'].astype(float).sum() / previous_games
    tnvrs = stats['TO'].astype(float).sum() /previous_games

    projectedPoints = float(points)+(float(rebs)*1.2)+(float(asts)*1.5)+(float(blks)*3)+(float(stls)*3)-(float(tnvrs)*1)
    print('%s: %.02f' %(playerName,projectedPoints))

Output:

Donovan Mitchell: 42.72
Kawhi Leonard: 52.25
Rudy Gobert: 38.47
Paul George: 44.18
Reggie Jackson: 24.21
Jordan Clarkson: 25.88
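To feed the results back into the dataframe, as the question asked, the projected points can be stored by row index instead of printed. A small sketch using `DataFrame.at`, with the first two values from the output above standing in for the computed `projectedPoints`:

```python
import pandas as pd

pp = {'Player Name': ['Donovan Mitchell', 'Kawhi Leonard'],
      'Fantasy Score': [46.0, 50.0]}
dfNBA = pd.DataFrame(pp)

# Illustrative values; in the loop above these would be the computed
# projectedPoints for each player.
projections = {0: 42.72, 1: 52.25}

for idx, row in dfNBA.iterrows():
    projectedPoints = projections[idx]
    # .at writes a single scalar into the new column at this row's index
    dfNBA.at[idx, 'Projected Points'] = projectedPoints

print(dfNBA)
```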