Appending elements of a list into a multi-dimensional list

Question

Hi, I'm doing some web scraping with NBA data in Python on this page. Some elements of basketball-reference are easy to scrape, but this one is giving me some trouble because of my lack of Python knowledge. I'm able to grab the data and column headers I want, but I end up with 2 lists of data that I need…

Accepted Answer

Let pandas parse the table for you.

    import pandas as pd

    url = "https://www.basketball-reference.com/friv/injuries.fcgi"
    # read_html returns a list of DataFrames, one per <table>; take the first
    injury_data = pd.read_html(url)[0]

Output:

    print(injury_data)
                  Player  ...                                        Description
    0     Onyeka Okongwu  ...  Out (Shoulder) - The Hawks announced that Okon...
    1       Jaylen Brown  ...  Out (Wrist) - The Celtics announced that Brown...
    2         Coby White  ...  Out (Shoulder) - The Bulls announced that Whit...
    3     Taurean Prince  ...  Out (Ankle) - The Cavaliers announced F Taurea...
    4       Jamal Murray  ...  Out (Knee) - Murray is recovering from a torn ...
    5      Klay Thompson  ...  Out (Right Achilles) - Thompson is on track to...
    6      James Wiseman  ...  Out (Knee) - Wiseman is on track to be ready b...
    7        T.J. Warren  ...  Out (Foot) - Warren underwent foot surgery and...
    8        Serge Ibaka  ...  Out (Back) - The Clippers announced Serge Ibak...
    9      Kawhi Leonard  ...  Out (Knee) - The Clippers announced Kawhi Leon...
    10    Victor Oladipo  ...  Out (Knee) - Oladipo could be cleared for full...
    11  Donte DiVincenzo  ...  Out (Foot) - DiVincenzo suffered a tendon inju...
    12    Jarrett Culver  ...  Out (Ankle) - The Timberwolves announced Culve...
    13    Markelle Fultz  ...  Out (Knee) - Fultz will miss the rest of the s...
    14    Jonathan Isaac  ...  Out (Knee) - Isaac is making progress with his...
    15       Dario Šarić  ...  Out (Knee) - The Suns announced that Sario has...
    16      Zach Collins  ...  Out (Ankle) - The Blazers announced that Colli...
    17     Pascal Siakam  ...  Out (Shoulder) - The Raptors announced Pascal ...
    18       Deni Avdija  ...  Out (Leg) - The Wizards announced that Avdija ...
    19     Thomas Bryant  ...  Out (Left knee) - The Wizards announced that B...

    [20 rows x 4 columns]

But if you were to iterate over it yourself, I'd simply get the rows (<tr> tags), then get the player name from the <a> tag in each row, and combine it with that row's <td> tags. Then create your DataFrame from the list of those:

    from urllib.request import urlopen

    from bs4 import BeautifulSoup
    import pandas as pd

    url = "https://www.basketball-reference.com/friv/injuries.fcgi"
    html = urlopen(url)
    # Name the parser explicitly so BeautifulSoup does not have to guess one
    soup = BeautifulSoup(html, 'html.parser')

    # Column names come from the <th> cells of the first row
    headers = [th.get_text() for th in soup.find_all('tr', limit=2)[0].find_all('th')]

    # Skip the header row; every remaining row is one injury entry
    trs = soup.find_all('tr')[1:]

    rows = []
    for tr in trs:
        # Player name from the row's <a> tag, followed by the row's <td> cells
        player_name = tr.find('a').text
        data = [player_name] + [x.text for x in tr.find_all('td')]
        rows.append(data)

    injury_data = pd.DataFrame(rows, columns=headers)
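
If you already have the two lists from your own scrape, the same "name plus cells" combination can be done without touching the HTML again. This is only a sketch, and it assumes one flat list of player names and one list of per-row cell lists, in the same order. The names `player_names`, `row_cells` and `combine_rows` are hypothetical, and the values are placeholders, not real injury data.

```python
# Placeholder stand-ins for the two scraped lists (not real data).
player_names = ["Player A", "Player B"]
row_cells = [
    ["Team X", "Date 1", "Out (Knee)"],
    ["Team Y", "Date 2", "Out (Ankle)"],
]


def combine_rows(names, cells):
    """Prepend each name to its matching list of cells and return the rows."""
    # zip() silently stops at the shorter list, so check the lengths first.
    if len(names) != len(cells):
        raise ValueError("names and cells must have the same length")
    return [[name, *row] for name, row in zip(names, cells)]


rows = combine_rows(player_names, row_cells)
```

Each element of `rows` is then one complete row, so the list can be passed to `pd.DataFrame(rows, columns=headers)` just like the list built in the loop above.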

Answer