I’m working on a web-scraping task and can already collect the data in a rudimentary way.
Basically, I need a function that collects a list of songs and artists from Allmusic.com and then adds the data to a DataFrame. In this example, I use this link: https://www.allmusic.com/mood/tender-xa0000001119/songs
So far I have managed to accomplish most of the objective; however, I had to write two separate functions (get_song() and get_performer()).
I would like, if possible, an alternative that joins these two functions into one.
The code I used is below:
```python
import requests
import pandas as pd  # needed for the DataFrame at the end
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'}
link = "https://www.allmusic.com/mood/tender-xa0000001119/songs"


# Function to collect songs (title)
songs = []

def get_song():
    url = link
    source_code = requests.get(url, headers=headers)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')  # explicit parser avoids a bs4 warning
    for td in soup.findAll('td', {'class': 'title'}):
        for a in td.findAll('a'):
            song = a.string
            songs.append(song)


# Function to collect performers
performers = []

def get_performer():
    url = link
    source_code = requests.get(url, headers=headers)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    for td in soup.findAll('td', {'class': 'performer'}):
        for a in td.findAll('a'):
            performer = a.string
            performers.append(performer)


get_song(), get_performer()  # Here I call the two functions, but the goal, if possible, is to use one function.

df = pd.DataFrame(list(zip(songs, performers)), columns=['song', 'performer'])  # df creation
```
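For reference, one way to merge the two functions is to parse the page once, collect both columns, and pair them with `zip`. This is only a sketch: the HTML snippet below is a made-up stand-in for the Allmusic table so the example is self-contained; on the real page you would pass `requests.get(link, headers=headers).text` instead of `sample`.

```python
from bs4 import BeautifulSoup
import pandas as pd

def get_songs_and_performers(html):
    """Parse the page once and pair each title cell with its performer cell."""
    soup = BeautifulSoup(html, 'html.parser')
    titles = [td.get_text(strip=True) for td in soup.find_all('td', {'class': 'title'})]
    performers = [td.get_text(strip=True) for td in soup.find_all('td', {'class': 'performer'})]
    # zip pairs the i-th title with the i-th performer, row by row
    return list(zip(titles, performers))

# Hypothetical inline sample mimicking the Allmusic table layout; the real page
# would be fetched with requests.get(link, headers=headers).text
sample = """
<table>
  <tr><td class="title"><a>Knock You Down</a></td><td class="performer"><a>Keri Hilson</a></td></tr>
  <tr><td class="title"><a>Still</a></td><td class="performer"><a>Tim McGraw</a></td></tr>
</table>
"""

df = pd.DataFrame(get_songs_and_performers(sample), columns=['song', 'performer'])
print(df)
```

This assumes the title and performer cells always come back in matching order, which holds as long as every table row has both cells.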
Answer
To get the titles and performers, you can use the following example:
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.allmusic.com/mood/tender-xa0000001119/songs"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

all_data = []
for td in soup.select("td.title"):
    title = td.get_text(strip=True)
    # the performer cell immediately follows the title cell in each row
    performer = td.find_next("td").get_text(strip=True)
    all_data.append((title, performer))

df = pd.DataFrame(all_data, columns=["title", "performer"])
print(df)
df.to_csv("data.csv", index=False)
```
Prints:
```
    title                              performer
 0  Knock You Down                     Keri Hilson
 1  Down Among the Wine and Spirits    Elvis Costello
 2  I Felt The Chill                   Elvis Costello
 3  She Handed Me A Mirror             Elvis Costello
 4  I Dreamed Of My Old Lover          Elvis Costello
 5  She Was No Good                    Elvis Costello
 6  The Crooked Line                   Elvis Costello
 7  Changing Partners                  Elvis Costello
 8  Small Town Southern Man            Alan Jackson
 9  Find Your Love                     Drake
10  Today Was a Fairytale              Taylor Swift
11  Need You Now                       Lady A
12  American Honey                     Lady A
13  Peace Dream                        Ringo Starr
14  If I Died Today                    Tim McGraw
15  Still                              Tim McGraw
16  I Need Love                        Ledisi
17  Uhh Ahh                            Boyz II Men
18  Shattered Heart                    Brandy
19  Right Here (Departed)              Brandy
20  Warm It Up (With Love)             Brandy
21  If I Were a Boy                    Beyoncé
22  Why Does She Stay                  Ne-Yo
23  Daddy Needs a Drink                Drive-By Truckers
24  Think About You                    Ringo Starr
25  Liverpool 8                        Ringo Starr
26  Nefertiti                          Herbie Hancock
27  River                              Herbie Hancock/Corinne Bailey Rae
28  Both Sides Now                     Herbie Hancock
29  Court and Spark                    Herbie Hancock/Norah Jones
30  I Taught Myself How to Grow Old    Ryan Adams
31  Ghetto                             Kelly Rowland/Snoop Dogg
32  Little Girl                        Enrique Iglesias
33  The Magdalene Laundries            Emmylou Harris
34  Because of You                     Ne-Yo
35  We Belong Together                 Mariah Carey
36  Thank You for Loving Me            Bon Jovi
37  He's Younger Than You Are          Sonny Rollins
```
and saves data.csv (a LibreOffice screenshot of the file followed in the original post; it is omitted here).
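One caveat worth noting about the question's `list(zip(songs, performers))` step: the built-in `zip` stops at the shorter sequence, so if the two column scrapes ever return different lengths, rows are dropped silently. The standard library's `itertools.zip_longest` pads the shorter side with `None` instead, which makes the mismatch visible in the resulting DataFrame. A small illustration with hypothetical data:

```python
from itertools import zip_longest

# hypothetical scrape results where one performer is missing
songs = ['Knock You Down', 'Still', 'Uhh Ahh']
performers = ['Keri Hilson', 'Tim McGraw']

# zip silently drops the unmatched song
print(list(zip(songs, performers)))
# → [('Knock You Down', 'Keri Hilson'), ('Still', 'Tim McGraw')]

# zip_longest pads with None, so the gap shows up instead of disappearing
print(list(zip_longest(songs, performers)))
# → [('Knock You Down', 'Keri Hilson'), ('Still', 'Tim McGraw'), ('Uhh Ahh', None)]
```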