I’m working on a web-scraping task and I can already collect the data in a very rudimentary way.
Basically, I need a function that collects a list of songs and artists from Allmusic.com and then loads the data into a DataFrame (df). In this example, I use this link: https://www.allmusic.com/mood/tender-xa0000001119/songs
So far I have managed to accomplish most of the objective. However, I had to write two separate functions (def get_song() and def get_performer()).
If possible, I would like to combine these two functions into one.
The code I used is below:
import requests
import pandas as pd  # needed for the DataFrame created at the end
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'}
link = "https://www.allmusic.com/mood/tender-xa0000001119/songs"

# Function to collect songs (title)
songs = []
def get_song():
    url = link
    source_code = requests.get(url, headers=headers)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for td in soup.findAll('td', {'class': 'title'}):
        for a in td.findAll('a')[0]:
            song = a.string
            songs.append(song)

# Function to collect performers
performers = []
def get_performer():
    url = link
    source_code = requests.get(url, headers=headers)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for td in soup.findAll('td', {'class': 'performer'}):
        for a in td.findAll('a'):
            performer = a.string
            performers.append(performer)

get_song(), get_performer() # Here, I call the two functions, but the goal, if possible, is to use one function.
df = pd.DataFrame(list(zip(songs,performers)), columns=['song', 'performer']) # df creation
Answer
To get the titles and performers in a single pass, you can use the following example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.allmusic.com/mood/tender-xa0000001119/songs"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

all_data = []
for td in soup.select("td.title"):
    title = td.get_text(strip=True)
    performer = td.find_next("td").get_text(strip=True)
    all_data.append((title, performer))

df = pd.DataFrame(all_data, columns=["title", "performer"])
print(df)
df.to_csv("data.csv", index=False)
Prints:
    title                            performer
 0  Knock You Down                   Keri Hilson
 1  Down Among the Wine and Spirits  Elvis Costello
 2  I Felt The Chill                 Elvis Costello
 3  She Handed Me A Mirror           Elvis Costello
 4  I Dreamed Of My Old Lover        Elvis Costello
 5  She Was No Good                  Elvis Costello
 6  The Crooked Line                 Elvis Costello
 7  Changing Partners                Elvis Costello
 8  Small Town Southern Man          Alan Jackson
 9  Find Your Love                   Drake
10  Today Was a Fairytale            Taylor Swift
11  Need You Now                     Lady A
12  American Honey                   Lady A
13  Peace Dream                      Ringo Starr
14  If I Died Today                  Tim McGraw
15  Still                            Tim McGraw
16  I Need Love                      Ledisi
17  Uhh Ahh                          Boyz II Men
18  Shattered Heart                  Brandy
19  Right Here (Departed)            Brandy
20  Warm It Up (With Love)           Brandy
21  If I Were a Boy                  Beyoncé
22  Why Does She Stay                Ne-Yo
23  Daddy Needs a Drink              Drive-By Truckers
24  Think About You                  Ringo Starr
25  Liverpool 8                      Ringo Starr
26  Nefertiti                        Herbie Hancock
27  River                            Herbie Hancock/Corinne Bailey Rae
28  Both Sides Now                   Herbie Hancock
29  Court and Spark                  Herbie Hancock/Norah Jones
30  I Taught Myself How to Grow Old  Ryan Adams
31  Ghetto                           Kelly Rowland/Snoop Dogg
32  Little Girl                      Enrique Iglesias
33  The Magdalene Laundries          Emmylou Harris
34  Because of You                   Ne-Yo
35  We Belong Together               Mariah Carey
36  Thank You for Loving Me          Bon Jovi
37  He's Younger Than You Are        Sonny Rollins
and saves the same rows to data.csv.
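Since the question asks for a single function, here is a minimal sketch of how the same idea could be wrapped up. The function name parse_songs and the SAMPLE_HTML markup are hypothetical; the sample only mimics the td.title and td.performer class names mentioned above, and it assumes both cells sit in the same table row, which may not match the live page exactly. Parsing is kept separate from downloading so the function can be tried without a network request.

```python
from bs4 import BeautifulSoup


def parse_songs(html):
    """Return a list of (title, performer) tuples found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Walk the table row by row so each title stays paired with its performer.
    for tr in soup.select("tr"):
        title_td = tr.select_one("td.title")
        performer_td = tr.select_one("td.performer")
        if title_td is None or performer_td is None:
            continue  # skip header rows or rows missing either cell
        rows.append(
            (title_td.get_text(strip=True), performer_td.get_text(strip=True))
        )
    return rows


# Hypothetical markup for illustration only; the real page structure is an assumption.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Performer</th></tr>
  <tr>
    <td class="title"><a href="#">Knock You Down</a></td>
    <td class="performer"><a href="#">Keri Hilson</a></td>
  </tr>
  <tr>
    <td class="title"><a href="#">River</a></td>
    <td class="performer"><a href="#">Herbie Hancock</a></td>
  </tr>
</table>
"""

print(parse_songs(SAMPLE_HTML))
```

To use it on the live page, you could pass requests.get(url, headers=headers).content to parse_songs and hand the returned list to pd.DataFrame(rows, columns=["title", "performer"]), as in the answer above. Pairing the cells per row avoids relying on the performer cell always being the very next td after the title.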
