I am learning web-scraping.
I succeeded in scraping a top YouTubers ranking with this as reference.
I am using the same logic to scrape the PL ranking, but I am running into three issues:
- it is only collecting up to 5th place.
- it is getting only the first place for the result.
- and then I get an AttributeError:
from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.premierleague.com/tables'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

standings = soup.find('div', attrs={'data-ui-tab': 'First Team'}).find_all('tr')[1:]
print(standings)

file = open("pl_standings.csv", 'w')
writer = csv.writer(file)
writer.writerow(['position', 'club_name', 'points'])

for standing in standings:
    position = standing.find('span', attrs={'class': 'value'}).text.strip()
    club_name = standing.find('span', {'class': 'long'}).text
    points = standing.find('td', {'class': 'points'}).text
    print(position, club_name, points)
    writer.writerow([position, club_name, points])

file.close()
Answer
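The issue is that html.parser doesn't parse the page correctly (try the lxml parser instead).
Also, take only every second <tr> to get correct results. When a row lacks one of the elements you look up, find() returns None, and calling .text on None raises the AttributeError: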
import requests
from bs4 import BeautifulSoup

url = "https://www.premierleague.com/tables"
page = requests.get(url)
soup = BeautifulSoup(page.content, "lxml")  # <-- use lxml

standings = soup.find("div", attrs={"data-ui-tab": "First Team"}).find_all(
    "tr"
)[1::2]  # <-- get every second <tr>

for standing in standings:
    position = standing.find("span", attrs={"class": "value"}).text.strip()
    club_name = standing.find("span", {"class": "long"}).text
    points = standing.find("td", {"class": "points"}).text
    print(position, club_name, points)
Prints:
1 Manchester City 77
2 Liverpool 76
3 Chelsea 62
4 Tottenham Hotspur 57
5 Arsenal 57
6 Manchester United 54
7 West Ham United 52
8 Wolverhampton Wanderers 49
9 Leicester City 41
10 Brighton and Hove Albion 40
11 Newcastle United 40
12 Brentford 39
13 Southampton 39
14 Crystal Palace 37
15 Aston Villa 36
16 Leeds United 33
17 Everton 29
18 Burnley 28
19 Watford 22
20 Norwich City 21
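
The code above only prints the rows, while the question also wanted a CSV file. Below is a minimal sketch of how the writing step might look, assuming `standings` is the list of `<tr>` tags collected above. The helper names `extract_row` and `write_standings` are made up for this example and are not part of BeautifulSoup. Rows where any `find()` call returns None are skipped instead of raising an AttributeError, and the file is opened in a `with` block so it is closed even if something fails midway.

```python
import csv


def extract_row(standing):
    """Return (position, club_name, points), or None if a cell is missing.

    `standing` is expected to behave like a BeautifulSoup <tr> tag,
    i.e. to offer a find(name, attrs=...) method.
    """
    cells = (
        standing.find("span", attrs={"class": "value"}),
        standing.find("span", attrs={"class": "long"}),
        standing.find("td", attrs={"class": "points"}),
    )
    # find() returns None when nothing matches; skip such rows.
    if any(cell is None for cell in cells):
        return None
    return tuple(cell.text.strip() for cell in cells)


def write_standings(standings, path="pl_standings.csv"):
    """Write the usable rows to a CSV file and return how many were written."""
    written = 0
    # newline="" is what the csv module recommends for files it writes to.
    with open(path, "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["position", "club_name", "points"])
        for standing in standings:
            row = extract_row(standing)
            if row is None:
                continue
            writer.writerow(row)
            written += 1
    return written
```

Calling `write_standings(standings)` after the scraping code should then produce pl_standings.csv with one line per club. With the None check in place, the `[1::2]` slice becomes a convenience rather than a requirement, since rows missing any of the three cells are dropped anyway.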