I have a list of items that I scraped from GitHub, sitting in `df_actionname['ActionName']`. Each `'ActionName'` can then be converted into a `'Weblink'` column holding a website link. I am trying to loop through each weblink and scrape data from it.
My code:
```
# Code to create input data
import pandas as pd
import requests

actionnameListFinal = ['TruffleHog OSS', 'Metrics embed', 'Super-Linter']

# Create dataframe
df_actionname = pd.DataFrame(actionnameListFinal, columns=['ActionName'])

# Create new column for parsed action names
df_actionname['Parsed'] = df_actionname['ActionName'].str.replace(
    r'[^A-Za-z0-9]+', '-', regex=True)
df_actionname['Weblink'] = 'https://github.com/marketplace/actions/' + df_actionname['Parsed']

for website in df_actionname['Weblink']:
    URL = df_actionname['Weblink']
    detailpage = requests.get(URL)
```
My code is failing at `detailpage = requests.get(URL)`. The error message I am getting is:
```
in get_adapter
    raise InvalidSchema(f"No connection adapters were found for {url!r}")
requests.exceptions.InvalidSchema: No connection adapters were found for '0    https://github.com/marketplace/actions/Truffle…
1    https://github.com/marketplace/actions/Metrics…
2    https://github.com/marketplace/actions/Super-L…
3    https://github.com/marketplace/actions/Swift-Doc
Name: Weblink, dtype: object'
```
Answer
You need to pass a single valid URL. Changing your `for` loop to
```
# from bs4 import BeautifulSoup
for website in df_actionname['Weblink']:
    detailpage = requests.get(website)
    pageSoup = BeautifulSoup(detailpage.content, 'html.parser')
    print(f'scraped "{pageSoup.title.text}" from {website}')
```
gives me the output
scraped "TruffleHog OSS · Actions · GitHub Marketplace · GitHub" from https://github.com/marketplace/actions/TruffleHog-OSS scraped "Metrics embed · Actions · GitHub Marketplace · GitHub" from https://github.com/marketplace/actions/Metrics-embed scraped "Super-Linter · Actions · GitHub Marketplace · GitHub" from https://github.com/marketplace/actions/Super-Linter
As for the original code: not only was it trying to send the same GET request on every pass through the loop (since `URL` did not depend on `website` at all), the input to `requests.get` was not a single URL, as you can see if you add a `print` before the request:
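```
# your original loop, unchanged except for the added print
for website in df_actionname['Weblink']:
    URL = df_actionname['Weblink']
    print(URL)                       # prints the entire Weblink column, on every iteration
    detailpage = requests.get(URL)   # so requests receives the Series' string form, not one link
```

Each iteration, `print(URL)` dumps the whole Series in pandas' truncated repr, and that repr is exactly the string quoted in your `InvalidSchema` message: it starts with `0    https://…` rather than with a URL scheme, so `requests` cannot find a connection adapter for it.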