I’m working on a web-scraping task and I can already collect the data in a very rudimentary way.
Basically, I need a function that collects a list of songs and artists from AllMusic.com and then adds the data to a DataFrame (df). In this example, I use this link: https://www.allmusic.com/mood/tender-xa0000001119/songs
So far I have managed to accomplish most of the objective, but I had to write two separate functions (get_song() and get_performer()).
I would like, if possible, a way to combine these two functions into one.
The code I used is below:
    import requests
    import pandas as pd  # needed for pd.DataFrame below
    from bs4 import BeautifulSoup

    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'}
    link = "https://www.allmusic.com/mood/tender-xa0000001119/songs"

    # Function to collect songs (title)
    songs = []
    def get_song():
        url = link
        source_code = requests.get(url, headers=headers)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        for td in soup.findAll('td', {'class': 'title'}):
            for a in td.findAll('a')[0]:
                song = a.string
                songs.append(song)

    # Function to collect performers
    performers = []
    def get_performer():
        url = link
        source_code = requests.get(url, headers=headers)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        for td in soup.findAll('td', {'class': 'performer'}):
            for a in td.findAll('a'):
                performer = a.string
                performers.append(performer)

    get_song(), get_performer()  # Here, I call the two functions, but the goal, if possible, is to use one function.

    df = pd.DataFrame(list(zip(songs, performers)), columns=['song', 'performer'])  # df creation
Answer
To get the titles and performers in a single pass, you can use the following example:
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    url = "https://www.allmusic.com/mood/tender-xa0000001119/songs"
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    }

    soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

    all_data = []
    for td in soup.select("td.title"):
        title = td.get_text(strip=True)
        performer = td.find_next("td").get_text(strip=True)
        all_data.append((title, performer))

    df = pd.DataFrame(all_data, columns=["title", "performer"])
    print(df)
    df.to_csv("data.csv", index=False)
Prints:
        title                            performer
     0  Knock You Down                   Keri Hilson
     1  Down Among the Wine and Spirits  Elvis Costello
     2  I Felt The Chill                 Elvis Costello
     3  She Handed Me A Mirror           Elvis Costello
     4  I Dreamed Of My Old Lover        Elvis Costello
     5  She Was No Good                  Elvis Costello
     6  The Crooked Line                 Elvis Costello
     7  Changing Partners                Elvis Costello
     8  Small Town Southern Man          Alan Jackson
     9  Find Your Love                   Drake
    10  Today Was a Fairytale            Taylor Swift
    11  Need You Now                     Lady A
    12  American Honey                   Lady A
    13  Peace Dream                      Ringo Starr
    14  If I Died Today                  Tim McGraw
    15  Still                            Tim McGraw
    16  I Need Love                      Ledisi
    17  Uhh Ahh                          Boyz II Men
    18  Shattered Heart                  Brandy
    19  Right Here (Departed)            Brandy
    20  Warm It Up (With Love)           Brandy
    21  If I Were a Boy                  Beyoncé
    22  Why Does She Stay                Ne-Yo
    23  Daddy Needs a Drink              Drive-By Truckers
    24  Think About You                  Ringo Starr
    25  Liverpool 8                      Ringo Starr
    26  Nefertiti                        Herbie Hancock
    27  River                            Herbie Hancock/Corinne Bailey Rae
    28  Both Sides Now                   Herbie Hancock
    29  Court and Spark                  Herbie Hancock/Norah Jones
    30  I Taught Myself How to Grow Old  Ryan Adams
    31  Ghetto                           Kelly Rowland/Snoop Dogg
    32  Little Girl                      Enrique Iglesias
    33  The Magdalene Laundries          Emmylou Harris
    34  Because of You                   Ne-Yo
    35  We Belong Together               Mariah Carey
    36  Thank You for Loving Me          Bon Jovi
    37  He's Younger Than You Are        Sonny Rollins
and saves the same table to data.csv.
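Since the question asks for a single function, the same idea could be wrapped up roughly as shown here. This is only a sketch under assumptions: the names parse_songs and get_songs are hypothetical, and it assumes each song sits in its own tr row containing a td.title cell and a td.performer cell (the class names come from the question; the row layout has not been verified against the live page). Parsing is kept separate from downloading so it can be tried on a saved HTML string.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def parse_songs(html):
    """Build a DataFrame with one (title, performer) row per song in the HTML.

    Assumption: every song is a <tr> holding <td class="title"> and
    <td class="performer">; rows without a title cell are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("tr"):
        title_td = tr.select_one("td.title")
        if title_td is None:
            continue  # header row or unrelated table row
        performer_td = tr.select_one("td.performer")
        title = title_td.get_text(strip=True)
        # Read title and performer from the same row so they stay paired,
        # even when a song lists several performers in one cell.
        performer = performer_td.get_text(strip=True) if performer_td else ""
        rows.append((title, performer))
    return pd.DataFrame(rows, columns=["title", "performer"])


def get_songs(url, headers=None):
    """Download the page once and return the parsed DataFrame."""
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail early on HTTP errors
    return parse_songs(response.text)
```

With this, df = get_songs(url, headers) would replace the separate get_song() and get_performer() calls and the zip() step, and the page is requested only once.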