I am learning web scraping.
I successfully scraped a top YouTubers ranking, using this as a reference.
I am using the same logic to scrape the Premier League (PL) table, but I am running into the following issues:
- it only collects up to 5th place,
- it only gets the first place as the result,
- and then it raises an AttributeError:
from bs4 import BeautifulSoup
import requests
import csv


url = 'https://www.premierleague.com/tables'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

standings = soup.find('div', attrs={'data-ui-tab': 'First Team'}).find_all('tr')[1:]
print(standings)

file = open("pl_standings.csv", 'w')
writer = csv.writer(file)

writer.writerow(['position', 'club_name', 'points'])

for standing in standings:
    position = standing.find('span', attrs={'class': 'value'}).text.strip()
    club_name = standing.find('span', {'class': 'long'}).text
    points = standing.find('td', {'class': 'points'}).text

    print(position, club_name, points)

    writer.writerow([position, club_name, points])

file.close()
Answer
The issue is that html.parser doesn't parse the page correctly (try using the lxml parser). Also, take every second <tr> to get the correct results:
import requests
from bs4 import BeautifulSoup


url = "https://www.premierleague.com/tables"
page = requests.get(url)
soup = BeautifulSoup(page.content, "lxml")  # <-- use lxml

standings = soup.find("div", attrs={"data-ui-tab": "First Team"}).find_all(
    "tr"
)[1::2]  # <-- get every second <tr>

for standing in standings:
    position = standing.find("span", attrs={"class": "value"}).text.strip()
    club_name = standing.find("span", {"class": "long"}).text
    points = standing.find("td", {"class": "points"}).text
    print(position, club_name, points)
Prints:
1 Manchester City 77
2 Liverpool 76
3 Chelsea 62
4 Tottenham Hotspur 57
5 Arsenal 57
6 Manchester United 54
7 West Ham United 52
8 Wolverhampton Wanderers 49
9 Leicester City 41
10 Brighton and Hove Albion 40
11 Newcastle United 40
12 Brentford 39
13 Southampton 39
14 Crystal Palace 37
15 Aston Villa 36
16 Leeds United 33
17 Everton 29
18 Burnley 28
19 Watford 22
20 Norwich City 21
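As a complement to the [1::2] slicing, here is a minimal sketch of a more defensive variant that also writes the CSV file the question was after. Instead of relying on the row order, it skips any <tr> that lacks the expected elements, which is also what avoids the AttributeError (find() returns None when nothing matches, and None has no .text). The SAMPLE_HTML markup, the "expandable" class, and the helper names parse_standings and write_standings are assumptions made up for illustration; the real page is presumably more complex and may change over time. The sample is well-formed, so the built-in html.parser is enough here; for the live page, pass "lxml" as suggested above.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the table markup: each club row is
# followed by an extra row that has none of the expected cells.
SAMPLE_HTML = """
<div data-ui-tab="First Team">
  <table>
    <tr><th>Position</th><th>Club</th><th>Points</th></tr>
    <tr>
      <td><span class="value">1</span></td>
      <td><span class="long">Manchester City</span></td>
      <td class="points">77</td>
    </tr>
    <tr class="expandable"><td colspan="3">Extra detail row</td></tr>
    <tr>
      <td><span class="value">2</span></td>
      <td><span class="long">Liverpool</span></td>
      <td class="points">76</td>
    </tr>
    <tr class="expandable"><td colspan="3">Extra detail row</td></tr>
  </table>
</div>
"""


def parse_standings(html, parser="html.parser"):
    """Return (position, club_name, points) tuples, skipping non-club rows."""
    soup = BeautifulSoup(html, parser)
    container = soup.find("div", attrs={"data-ui-tab": "First Team"})
    if container is None:
        return []

    rows = []
    for tr in container.find_all("tr"):
        position = tr.find("span", class_="value")
        club_name = tr.find("span", class_="long")
        points = tr.find("td", class_="points")
        # find() returns None when nothing matches; calling .text on None is
        # what raises AttributeError, so skip rows missing any of the cells.
        if position is None or club_name is None or points is None:
            continue
        rows.append(
            (position.text.strip(), club_name.text.strip(), points.text.strip())
        )
    return rows


def write_standings(rows, file_obj):
    """Write the header and the parsed rows as CSV to an open file object."""
    writer = csv.writer(file_obj)
    writer.writerow(["position", "club_name", "points"])
    writer.writerows(rows)


standings = parse_standings(SAMPLE_HTML)
buffer = io.StringIO()  # in-memory file, so the sketch leaves nothing on disk
write_standings(standings, buffer)
print(buffer.getvalue())
```

To write a real file, replace the StringIO buffer with something like `with open("pl_standings.csv", "w", newline="") as f: write_standings(standings, f)`. The with block closes the file even if parsing fails partway, and newline="" is what the csv module documentation recommends to avoid blank lines between rows on Windows.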