
Tag: beautifulsoup

Parsing JSON web scraper output

I am practicing web scraping using the requests and BeautifulSoup modules on the following website: https://www.imdb.com/title/tt0080684/. My code thus far properly outputs the JSON in question. I’d like help extracting only the name and description from the JSON into a response dictionary. Answer: You can parse the dictionary and then print a new JSON object using
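A minimal sketch of the idea in the answer excerpt, assuming the scraper output is a JSON string with top-level name and description keys. The sample string and the variable names raw, data and response are hypothetical stand-ins, not the asker's actual code or IMDb's actual payload.

```python
import json

# Hypothetical sample standing in for the JSON text the scraper prints.
raw = (
    '{"@type": "Movie", "name": "Example Title", '
    '"description": "Example description.", "url": "/title/tt0000000/"}'
)

# Parse the JSON text into a Python dictionary.
data = json.loads(raw)

# Keep only the two wanted keys; .get() yields None if a key is missing.
response = {key: data.get(key) for key in ("name", "description")}

# Serialize the reduced dictionary back to JSON for printing.
print(json.dumps(response, indent=2))
```

Using .get() instead of indexing is a design choice here: a missing key shows up as null in the output rather than raising KeyError.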

How do I make a crawler extracting information from relative paths?

I am trying to make a simple crawler that extracts links from the “See About” section from this link https://en.wikipedia.org/wiki/Web_scraping. That is 19 links in total, which I have managed to extract using Beautiful Soup. However, I get them as relative links in a list, and I need to convert them to absolute links. Intended result would
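One common way to turn relative links into absolute ones is urljoin from the standard library. The sketch below assumes the hrefs have already been collected into a list; the two sample paths and the names BASE_URL, relative_links and absolute_links are hypothetical and are not taken from the question.

```python
from urllib.parse import urljoin

# The page the links were scraped from (URL taken from the question).
BASE_URL = "https://en.wikipedia.org/wiki/Web_scraping"

# Hypothetical sample of relative hrefs, standing in for the scraped list.
relative_links = ["/wiki/Data_scraping", "/wiki/Web_crawler"]

# urljoin resolves each href against the base URL, as a browser would.
absolute_links = [urljoin(BASE_URL, href) for href in relative_links]

for link in absolute_links:
    print(link)
```

urljoin also leaves hrefs that are already absolute unchanged, so the same line works for mixed lists.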

Looping through pages of search result

I am trying to scrape Reuters image captions on certain pictures. I have searched with my parameters and have a search result with 182 pages. The ‘PN=X’ part at the end of each link is the page number. I have built a for loop to loop through the pages and scrape all captions: The code runs, but it returns the
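A rough sketch of generating one URL per results page, assuming the page number is appended as a PN query parameter and that numbering starts at 1. The base URL and the helper name page_urls are hypothetical; the question's actual search link is not shown in the excerpt.

```python
def page_urls(base_url, last_page):
    """Return one URL per results page, numbered 1 through last_page."""
    # range() excludes its upper bound, hence last_page + 1.
    return [f"{base_url}&PN={page}" for page in range(1, last_page + 1)]


# Hypothetical search URL; it already carries a query string, so "&" is used.
urls = page_urls("https://example.com/search?q=example", 182)

print(len(urls))
print(urls[0])
print(urls[-1])
```

Each URL in the list can then be fetched and parsed inside the loop, so that every page is requested rather than the same one repeatedly.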

BeautifulSoup trying to remove HTML data from list

As mentioned above, I am trying to remove HTML from the printed output so that only the text and my | and - separators remain. The output also includes span tags and other markup that I would like to remove. Because this part of the program runs in a loop, I cannot search for the individual text information of the page as
