I'm a complete novice in Python. After many YouTube videos and tutorials, I'm trying to scrape basketball starting lineups from Flashscore. Here's an example link: https://www.flashscore.it/partita/6PN3pAhq/#informazioni-partita/formazioni
As you can see, there's a code in the middle (6PN3pAhq) that identifies a particular match; every match has a different one. I scraped all the results (144 matches at the moment) and stored them in an Excel file, but now I'm looking for the best way to loop through these different URLs, scrape every match's lineups, and append them to a single dataframe.
Here's my code for the URL above; any help is much appreciated!
    from selenium import webdriver
    from bs4 import BeautifulSoup
    from time import sleep
    import pandas as pd

    URL = "https://www.flashscore.it/partita/6PN3pAhq/#informazioni-partita/formazioni"

    driver = webdriver.Chrome(r"C:chromedriver.exe")
    driver.get(URL)
    sleep(5)
    driver.find_element_by_id('onetrust-accept-btn-handler').click()

    soup = BeautifulSoup(driver.page_source, "html.parser")

    start = []
    id = soup.find(class_="section")
    for id2 in id.find_all("a", {"class": "lf__participantName"}):
        start.append(id2.get('href'))

    df = pd.DataFrame(start)
    print(df)
Answer
If you need to keep all the matches in an Excel file, you can use any of several open-source tools to parse the file and extract the match codes (see http://www.python-excel.org/ for available options).
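Since the question already uses pandas, one way to pull the codes out of the spreadsheet is sketched below. The file name `matches.xlsx`, the column name `code`, and both helper functions are assumptions for illustration, not something taken from the question; reading `.xlsx` files with `pd.read_excel` also needs an engine such as openpyxl installed.

```python
def clean_codes(values):
    """Strip whitespace, drop blanks, and remove duplicates while keeping order."""
    seen = set()
    codes = []
    for value in values:
        code = str(value).strip()
        if code and code not in seen:
            seen.add(code)
            codes.append(code)
    return codes


def load_codes_from_excel(path, column="code"):
    """Read match codes from one spreadsheet column (column name is an assumption)."""
    import pandas as pd  # imported here so clean_codes works without pandas

    frame = pd.read_excel(path)
    # dropna() removes empty cells before the codes are cleaned
    return clean_codes(frame[column].dropna())


# Hypothetical usage:
# games = load_codes_from_excel("matches.xlsx", column="code")
```

Keeping the cleaning step separate means the same helper can be reused whatever the source of the codes turns out to be.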
However, the simplest way, if possible, is to bypass Excel entirely and store the codes in a plain text file or directly in your Python program:
    games = [
        '6PN3pAhq',
        'game2Code',
        'game3Code',
        # and more...
    ]
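If you go with the text-file option instead, a minimal sketch could look like this. It assumes a hypothetical file `game_codes.txt` with one code per line; the function names and the convention of skipping lines that start with `#` are my own choices for illustration.

```python
from pathlib import Path


def parse_codes(text):
    """Return the game codes found in text, one per line."""
    codes = []
    for line in text.splitlines():
        line = line.strip()
        # skip empty lines and comment lines
        if not line or line.startswith("#"):
            continue
        codes.append(line)
    return codes


def load_codes(path):
    """Read a text file (hypothetical name and layout) and return its game codes."""
    return parse_codes(Path(path).read_text(encoding="utf-8"))


# Hypothetical usage:
# games = load_codes("game_codes.txt")
```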
And in your core code, use a string template:
    # The url_template below contains a `{}` placeholder where we can put any value when we format the string
    # see: https://docs.python.org/3/tutorial/inputoutput.html#the-string-format-method
    _url_template = "https://www.flashscore.it/partita/{}/#informazioni-partita/formazioni"

    # keep the logic for getting one game's scores in its own function;
    # it must be defined before the loop below calls it
    def get_game_scores(game_code):
        formatted_url = _url_template.format(game_code)
        # do the stuff you did above

    for game_code in games:
        get_game_scores(game_code)
There are lots of ways to go about this, but the core idea of this simple implementation is to separate how you extract and parse the game codes from how you get the game scores. The parser should store the game codes in a final collection that you can simply loop over, and the logic for getting game scores can then focus on extracting a single game's score.
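To end up with a single dataframe, one option is to collect one row per player link across all games and build the dataframe once at the end. The sketch below is only an illustration of that separation: `collect_lineups`, the `scrape_one` callable, and the column names `game_code` and `player_href` are hypothetical. `scrape_one(url)` stands in for the Selenium and BeautifulSoup logic from the question and is expected to return the list of hrefs for one match.

```python
URL_TEMPLATE = "https://www.flashscore.it/partita/{}/#informazioni-partita/formazioni"


def collect_lineups(game_codes, scrape_one):
    """Call scrape_one(url) for every game code and gather the results.

    Returns (rows, failed): rows is a flat list of dicts, one per player link;
    failed lists the (code, error) pairs for games that raised an exception.
    """
    rows = []
    failed = []
    for code in game_codes:
        url = URL_TEMPLATE.format(code)
        try:
            hrefs = scrape_one(url)
        except Exception as exc:
            # keep going if a single match page fails to load or parse
            failed.append((code, repr(exc)))
            continue
        for href in hrefs:
            rows.append({"game_code": code, "player_href": href})
    return rows, failed
```

With real scraping logic passed in as `scrape_one`, `pd.DataFrame(rows)` then gives one dataframe covering every match, with the game code on each row so the lineups can still be told apart. Pausing briefly between requests is probably a good idea with 144 pages to visit.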