Skip to content
Advertisement

How to scrape billboard using find_all?

I have a pretty specific question if anyone could help that would be awesome.

Screenshot of billboard100 webpage

I am trying to scrape the songs from https://www.billboard.com/charts/hot-100/ and I am stuck on this code.

What should I put for tag = soup.find_all('???') to get the title ‘About Damn Time.

import requests
import bs4

url = 'https://www.billboard.com/charts/hot-100/'

song_title = []


res = requests.get(url)
soup = bs4.BeautifulSoup(res.text, 'lxml')
tag = soup.find_all('???')
print(tag)

Advertisement

Answer

Try to select the row containers, iterate the ResultSet get your expected text from the <h3>:

for e in soup.find_all(attrs={'class':'o-chart-results-list-row-container'}):
    print(e.h3.get_text(strip=True))

Example

If you try to store different information, avoid a bunch of differnet lists. Instead use one list with dict of each item and its information.

import requests
import bs4

url = 'https://www.billboard.com/charts/hot-100/'
res = requests.get(url)
soup = bs4.BeautifulSoup(res.text)

data = []

for e in soup.find_all(attrs={'class':'o-chart-results-list-row-container'}):
    data.append({
        'title':e.h3.get_text(strip=True),
        'author':e.h3.find_next('span').get_text(strip=True)
    })

data
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement