I am trying to make a news scraper with BS4. I am able to get the HTML from the website (CNN), and this is my code:
from bs4 import BeautifulSoup
import requests

url = "https://www.cnn.com/"
topic = input("What kind of news are you looking for")
result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")

prices = doc.find_all(text = f"{topic}")
parent = prices[0].parent
print(parent)
but it's giving me this error:
xxx@xxx-xxx xxx % python3 news_scraper.py
What kind of news are you looking for?Coronavirus
Traceback (most recent call last):
  File "/xxx/xxx/xxx/xxx/news_scraper.py", line 10, in <module>
    parent = prices[0].parent
IndexError: list index out of range
I have no idea what is causing this. Thanks!
Answer
If the string topic is not found on the page, then prices will be an empty list, and indexing it with prices[0] raises the IndexError. To fix this, first check that the length of prices is not zero, like this:
from bs4 import BeautifulSoup
import requests

url = "https://www.cnn.com/"
topic = input("What kind of news are you looking for")
result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")

# find_all returns an empty list when no text node matches the topic
prices = doc.find_all(text = f"{topic}")
if len(prices) != 0:
    parent = prices[0].parent
    print(parent)
else:
    print("No news of that topic was found.")
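One likely reason the list comes back empty even for a topic that appears on the page: when find_all is given a plain string, it only matches text nodes that equal that string exactly, so "Coronavirus" will not match a headline that merely contains the word. A minimal sketch of a substring search follows, assuming a case-insensitive match is what you want. The helper name find_topic_parents and the sample HTML are made up for illustration and are not part of the original code; the sample stands in for result.text so the snippet runs without a network request.

```python
import re

from bs4 import BeautifulSoup


def find_topic_parents(html, topic):
    """Return the parent tags of text nodes that contain `topic` (case-insensitive)."""
    doc = BeautifulSoup(html, "html.parser")
    # re.escape keeps characters such as "." or "+" in the user's input literal.
    pattern = re.compile(re.escape(topic), re.IGNORECASE)
    # `string=` is the current name for the older `text=` argument (bs4 4.4.0+).
    matches = doc.find_all(string=pattern)
    return [node.parent for node in matches]


# Hypothetical sample markup standing in for a downloaded page.
sample_html = """
<html><body>
  <h2><a href="/a">Coronavirus cases rise again</a></h2>
  <h2><a href="/b">Markets close higher</a></h2>
</body></html>
"""

parents = find_topic_parents(sample_html, "coronavirus")
if parents:
    for tag in parents:
        print(tag)
else:
    print("No news of that topic was found.")
```

Passing a compiled regular expression makes Beautiful Soup search inside each text node instead of comparing the whole node, and the emptiness check still guards against topics that do not appear at all.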