Beautiful Soup: extract 2 different tags and append them together in text only output

I have been playing around with Beautiful Soup to learn it. So far I've learned a few things, but I'm struggling to put my use case together. How do I print the text of both movielist and moviescore, appended together? I appreciate any help and info. I'm really enjoying Python and some of its applications, like web scraping.

import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.rottentomatoes.com/browse/opening")
print("Checking Website")
print(result.status_code)
print("Gathering Website data and preparing it for presentation")
src = result.content
soup = BeautifulSoup(src, 'lxml')
movielist = soup.find_all("div",attrs={"class":"media-list__title"})
moviescore = soup.find_all("span",attrs={"class":"tMeterScore"})
for title in movielist:
    print(title.text)

Answer

The key here is to “zip” the two lists so that each title is paired with the score at the same position. Before zipping, get the text value from each element and strip the surrounding whitespace.
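
To see the idea in isolation, here is a minimal sketch that runs on a small inline HTML snippet instead of the live page. The markup and the names `SAMPLE_HTML` and `pair_titles_and_scores` are made up for illustration; only the two class names are borrowed from the question, and Python's built-in `html.parser` is used so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="media-list__title">  Movie A </div>
<span class="tMeterScore"> 79% </span>
<div class="media-list__title">  Movie B </div>
<span class="tMeterScore"> 50% </span>
"""


def pair_titles_and_scores(html):
    """Return a list of (title, score) tuples, paired by position."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(strip=True) returns the element's text without
    # leading or trailing whitespace
    titles = [
        tag.get_text(strip=True)
        for tag in soup.find_all("div", class_="media-list__title")
    ]
    scores = [
        tag.get_text(strip=True)
        for tag in soup.find_all("span", class_="tMeterScore")
    ]
    # zip pairs the first title with the first score, and so on
    return list(zip(titles, scores))


for title, score in pair_titles_and_scores(SAMPLE_HTML):
    print(f"{title}: {score}")
```

Pairing by position only works if both lists come out in the same order and with the same length, which is why the choice of selector for the scores matters.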

Here’s a slight modification of your code:

import requests
from bs4 import BeautifulSoup


result = requests.get("https://www.rottentomatoes.com/browse/opening")

print("Checking Website")
print(result.status_code)
print("Gathering Website data and preparing it for presentation")

soup = BeautifulSoup(result.content, 'lxml')

# get each movie title and remove any surrounding whitespace
movies = [
    title.get_text(strip=True) for title in
    soup.find_all("div", attrs={"class": "media-list__title"})
]
# get each movie score, remove any surrounding whitespace, and replace '- -'
# with a custom message -> No score yet. :(
movie_scores = [
    score.get_text(strip=True).replace("- -", "No score yet. :(") for score
    in soup.select(".media-list__meter-container")  # introducing css selectors :)
]

# zip yields one tuple per movie: (MOVIE_TITLE, MOVIE_SCORE)
for title, score in zip(movies, movie_scores):
    print(f"{title}: {score}")

Output:

Checking Website
200
Gathering Website data and preparing it for presentation
The Courier: 79%
The Heiress: No score yet. :(
The Stay: No score yet. :(
City of Lies: 50%
Happily: 70%
Doors: No score yet. :(
Last Call: No score yet. :(
Enforcement: 100%
Phobias: No score yet. :(
Dark State: No score yet. :(
Food Club: 83%
Wojnarowicz: 100%
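
One caveat about pairing with `zip`: it stops at the shorter input, so if a page ever yields fewer scores than titles, the trailing titles are dropped without any error. The sketch below shows one possible guard, using `itertools.zip_longest` from the standard library. The sample lists and the helper name `pair_with_fallback` are hypothetical, and this only pads the end of the shorter list; it cannot detect a score that is missing from the middle.

```python
from itertools import zip_longest


def pair_with_fallback(titles, scores, fallback="No score yet. :("):
    """Pair items by position, padding whichever list is shorter."""
    return list(zip_longest(titles, scores, fillvalue=fallback))


# Hypothetical sample data, not scraped from the page.
titles = ["Movie A", "Movie B", "Movie C"]
scores = ["79%", "50%"]

# Plain zip would yield only two pairs here; zip_longest keeps all three.
for title, score in pair_with_fallback(titles, scores):
    print(f"{title}: {score}")
```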
User contributions licensed under: CC BY-SA