I’m trying to scrape some data off ESPN and run some calculations on it. Ideally, I would like to iterate through a dataframe, grab each player’s name, send it into ESPN’s search box with Selenium, and have Selenium click the player’s name. I was able to do this successfully for one player, but I’m not sure how to iterate through all the players in my dataframe.
The second part of the code is where I’m struggling: for some reason Selenium isn’t able to find any of the elements, so I can’t get the data. I don’t think I’m doing it properly. Once I can scrape the required stats, I would like to plug them into a calculation and append the calculated projected points to my dataframe, dfNBA.
Can someone please help me with my code and point me in the right direction? I’m trying to get better at writing efficient Python code, but right now I’m stuck.
Thanks
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#sample data
pp = {'Player Name':['Donovan Mitchell', 'Kawhi Leonard', 'Rudy Gobert', 'Paul George','Reggie Jackson', 'Jordan Clarkson'],
      'Fantasy Score': [46.0, 50.0, 40.0, 44.0, 25.0, 26.5]}

#Creating a dataframe from dictionary
dfNBA = pd.DataFrame(pp)

#Scraping ESPN
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.espn.com/")

#Clicking the search button
driver.find_element_by_xpath("//a[@id='global-search-trigger']").click()

#sending data to the search button
driver.find_element_by_xpath("//input[@placeholder='Search Sports, Teams or Players...']").send_keys(dfNBA.iloc[0,:].values[0])

WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".search_results__details")))
playerPage = driver.find_element_by_css_selector(".search_results__details").click()

#Scraping data from last 10 games
points = driver.find_element_by_xpath(".//div[@class='Table__TD']")[13]
#rebs = driver.find_element_by_xpath("//*[@id='fittPageContainer'']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[7]")
#asts = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[8]")
#blks = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[9]")
#stls = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[10]")
#tnvrs = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[12]")

#projectedPoints = points+(rebs*1.2)+(asts*1.5)+(blks*3)+(stls*3)-(tnvrs*1)

print(points)
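For the looping part, one approach is to repeat the search steps above for each row and return to the ESPN home page before every search so the search box is in a known state. The following is only a rough, untested sketch that reuses the selectors from the code above; the home-page reset, the trimmed-down sample data, and the placeholder scraping comment are assumptions, not a working solution:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

# Smaller sample data just for the sketch
pp = {'Player Name': ['Donovan Mitchell', 'Kawhi Leonard'],
      'Fantasy Score': [46.0, 50.0]}
dfNBA = pd.DataFrame(pp)

driver = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe")

for name in dfNBA['Player Name']:
    # Start from the home page for every player so the search flow is identical each time
    driver.get("https://www.espn.com/")
    driver.find_element(By.XPATH, "//a[@id='global-search-trigger']").click()
    driver.find_element(By.XPATH, "//input[@placeholder='Search Sports, Teams or Players...']").send_keys(name)
    WebDriverWait(driver, 20).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".search_results__details")))
    driver.find_element(By.CSS_SELECTOR, ".search_results__details").click()
    # ...scrape the player's page here, e.g. with driver.find_elements(By.CSS_SELECTOR, "td.Table__TD")...

driver.quit()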
Answer
I think Selenium is a bit overkill when there’s a viable API option.
Give this a try. Note that in the overview, "L10 games" refers to the last 10 regular-season games, while my code here takes the last 10 games including playoffs. If you only want regular season, let me know and I can adjust it. I also added a previous_games variable, so if you want, for example, just the last 5 games or the last 15 games, you can do that too.
import requests
import pandas as pd

previous_games = 10

pp = {'Player Name':['Donovan Mitchell', 'Kawhi Leonard', 'Rudy Gobert', 'Paul George','Reggie Jackson', 'Jordan Clarkson'],
      'Fantasy Score': [46.0, 50.0, 40.0, 44.0, 25.0, 26.5]}

#Creating a dataframe from dictionary
dfNBA = pd.DataFrame(pp)

search_api = 'https://site.api.espn.com/apis/search/v2'

for idx, row in dfNBA.iterrows():
    playerName = row['Player Name']

    #Look up the player's ESPN ID through the search API
    payload = {'query': '%s' %playerName}
    results = requests.get(search_api, params=payload).json()['results']
    for each in results:
        if each['type'] == 'player':
            playerID = each['contents'][0]['uid'].split('a:')[-1]
            break

    #Pull the player's game log
    player_api = 'https://site.web.api.espn.com/apis/common/v3/sports/basketball/nba/athletes/%s/gamelog' %playerID
    payload = {'season': '2021'}
    jsonData_player = requests.get(player_api, params=payload).json()

    #Scraping data from last x games
    last_x_gameIDs = list(jsonData_player['events'].keys())
    last_x_gameIDs.sort()
    last_x_gameIDs = last_x_gameIDs[-1*previous_games:]

    gamelog_dict = {}
    seasonTypes = jsonData_player['seasonTypes']
    for gameID in last_x_gameIDs:
        for each in seasonTypes:
            categories = each['categories']
            for category in categories:
                if category['type'] == 'total':
                    continue
                events = category['events']
                for event in events:
                    if gameID == event['eventId']:
                        gamelog_dict[gameID] = event['stats']

    labels = jsonData_player['labels']

    #Aggregate totals: pair each game's raw stat list with its labels
    for k, v in gamelog_dict.items():
        v = dict(zip(labels, v))
        gamelog_dict[k] = v

    stats = pd.DataFrame(gamelog_dict.values())

    #Per-game averages over the last x games
    points = stats['PTS'].astype(float).sum() / previous_games
    rebs = stats['REB'].astype(float).sum() / previous_games
    asts = stats['AST'].astype(float).sum() / previous_games
    blks = stats['BLK'].astype(float).sum() / previous_games
    stls = stats['STL'].astype(float).sum() / previous_games
    tnvrs = stats['TO'].astype(float).sum() / previous_games

    projectedPoints = float(points)+(float(rebs)*1.2)+(float(asts)*1.5)+(float(blks)*3)+(float(stls)*3)-(float(tnvrs)*1)
    print('%s: %.02f' %(playerName,projectedPoints))
Output:
Donovan Mitchell: 42.72
Kawhi Leonard: 52.25
Rudy Gobert: 38.47
Paul George: 44.18
Reggie Jackson: 24.21
Jordan Clarkson: 25.88
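Since the question also asks to append the calculated projections to dfNBA rather than only print them, one way to do that is to wrap the same API calls into a function and assign the results as a new column. This is an untested refactor of the code above; the function name get_projected_points and the column name 'Projected Points' are my own choices:

import requests
import pandas as pd

search_api = 'https://site.api.espn.com/apis/search/v2'
gamelog_api = 'https://site.web.api.espn.com/apis/common/v3/sports/basketball/nba/athletes/%s/gamelog'

def get_projected_points(player_name, previous_games=10):
    #Look up the player's ESPN ID via the search API
    results = requests.get(search_api, params={'query': player_name}).json()['results']
    player_id = None
    for each in results:
        if each['type'] == 'player':
            player_id = each['contents'][0]['uid'].split('a:')[-1]
            break

    #Pull the game log and keep only the last `previous_games` games
    data = requests.get(gamelog_api % player_id, params={'season': '2021'}).json()
    game_ids = sorted(data['events'].keys())[-previous_games:]

    gamelog = {}
    for season_type in data['seasonTypes']:
        for category in season_type['categories']:
            if category['type'] == 'total':
                continue
            for event in category['events']:
                if event['eventId'] in game_ids:
                    gamelog[event['eventId']] = dict(zip(data['labels'], event['stats']))

    stats = pd.DataFrame(gamelog.values())
    avg = lambda col: stats[col].astype(float).sum() / previous_games
    return (avg('PTS') + avg('REB') * 1.2 + avg('AST') * 1.5
            + avg('BLK') * 3 + avg('STL') * 3 - avg('TO'))

pp = {'Player Name': ['Donovan Mitchell', 'Kawhi Leonard', 'Rudy Gobert', 'Paul George',
                      'Reggie Jackson', 'Jordan Clarkson'],
      'Fantasy Score': [46.0, 50.0, 40.0, 44.0, 25.0, 26.5]}
dfNBA = pd.DataFrame(pp)

#New column holding the calculated projection for each player, in row order
dfNBA['Projected Points'] = [get_projected_points(name) for name in dfNBA['Player Name']]
print(dfNBA)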