Tag: beautifulsoup

Parsing JSON web scraper output

I am practicing web scraping using the requests and BeautifulSoup modules on the following website: https://www.imdb.com/title/tt0080684/. My code thus far properly outputs the JSON in question. I’d like help extracting only the name and description from the JSON into a response dictionary. Answer: You can parse the dictionary and then print a new JSON object using
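The answer's own code is not preserved in this excerpt. As a rough sketch of the idea it describes, the snippet below parses JSON text with the standard-library json module and keeps only two keys. The payload is a made-up stand-in for what the scraper prints (it is not the real page's data), and `extract_summary` is a hypothetical helper name.

```python
import json

# Hypothetical stand-in for the JSON text the scraper already outputs.
raw_json = """
{
  "@type": "Movie",
  "name": "Example Title",
  "description": "Example description text.",
  "contentRating": "PG"
}
"""

def extract_summary(text):
    """Parse JSON text and keep only the name and description keys."""
    data = json.loads(text)
    # .get() returns None instead of raising KeyError if a key is absent.
    return {key: data.get(key) for key in ("name", "description")}

response = extract_summary(raw_json)
print(json.dumps(response, indent=2))
```

Using `.get()` is a design choice for scraped data, where a field may be missing on some pages; indexing with `data["name"]` would fail loudly instead.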

How do I make a crawler extracting information from relative paths?

beautifulsoup python

I am trying to make a simple crawler that extracts links from the “See About” section of this page: https://en.wikipedia.org/wiki/Web_scraping. That is 19 links in total, which I have managed to extract using Beautiful Soup. However, I get them as relative links in a list, which I also need to convert into absolute links. Intended result would
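One common way to turn relative links into absolute ones is `urllib.parse.urljoin` from the standard library. The sketch below assumes the links have already been collected into a list; the three hrefs are invented examples rather than the 19 links from the page, and `to_absolute` is a hypothetical helper name.

```python
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org/wiki/Web_scraping"

# Hypothetical hrefs, standing in for the list extracted with Beautiful Soup.
relative_links = [
    "/wiki/Data_scraping",
    "/wiki/Web_crawler",
    "https://example.org/already-absolute",
]

def to_absolute(base, hrefs):
    """Resolve each href against the page URL it was found on."""
    # urljoin leaves hrefs that are already absolute unchanged.
    return [urljoin(base, href) for href in hrefs]

absolute_links = to_absolute(BASE_URL, relative_links)
for link in absolute_links:
    print(link)
```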

Printing same name, price, and link in BeautifulSoup python

beautifulsoup python python-requests selenium

How do I get the details of all products? My code prints the same things, but I want the details of the other products as well. Here is the link from which I want to fetch the data of all products: https://www.nike.com/gb/w/womens-lifestyle-shoes-13jrmz5e1x6zy7ok Answer: What happens? Your print is wrongly indented, and there is only one element with the class product-grid. How to fix? Check the indent
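To illustrate why indentation matters here, the sketch below uses plain dictionaries in place of parsed product cards. The data and the function names `last_only` and `every_card` are hypothetical; the original question's code is not shown in this excerpt.

```python
# Hypothetical stand-in for the product cards found on the page.
products = [
    {"name": "Shoe A", "price": "99.95", "link": "/t/shoe-a"},
    {"name": "Shoe B", "price": "84.95", "link": "/t/shoe-b"},
    {"name": "Shoe C", "price": "109.95", "link": "/t/shoe-c"},
]

def last_only(cards):
    """Buggy shape: the result is built after the loop has finished."""
    for card in cards:
        row = (card["name"], card["price"], card["link"])
    # Dedented: runs once, so only the final card's row is returned.
    return [row]

def every_card(cards):
    """Fixed shape: the row is recorded inside the loop body."""
    rows = []
    for card in cards:
        # Indented under the for statement: runs once per card.
        rows.append((card["name"], card["price"], card["link"]))
    return rows

for row in every_card(products):
    print(row)
```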

BeautifulSoup extract conditioned digit coloured by css

beautifulsoup css python web-scraping

I successfully get the data from this table from THRIVEN: But as you can see, in the Net% column, whether a value is negative or positive is determined by some CSS (or so I believe; I couldn’t find where it is located). How can I extract this data and put it into my Excel file as negative/positive numbers? Below is my current code

Looping through pages of search result

beautifulsoup python

I am trying to scrape Reuters image captions on certain pictures. I have searched with my parameters and have a search result with 182 pages. The ‘PN=X’ part at the end of each link is the page number. I have built a for loop to loop through the pages and scrape all captions: The code runs, but it returns the
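A minimal sketch of generating one URL per result page by rewriting the `PN` query parameter. The search URL is a placeholder on example.com (the real search link is not given in this excerpt), numbering is assumed to start at 1, and `page_urls` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def page_urls(search_url, last_page, param="PN"):
    """Yield search_url once per page with the page parameter replaced."""
    parts = urlsplit(search_url)
    # Keep every query parameter except the existing page number.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    for page in range(1, last_page + 1):
        new_query = urlencode(query + [(param, page)])
        yield urlunsplit(parts._replace(query=new_query))

# Placeholder URL; assumes pages are numbered 1 through 182.
urls = list(page_urls("https://example.com/search?q=caption&PN=1", 182))
print(urls[0])
print(urls[-1])
```

Building the full list of URLs first makes it easy to check that each request in the loop really targets a different page before any scraping happens.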

Extract data from Json: Error JSONDecodeError: Expecting value

beautifulsoup json python python-requests web-scraping

Error:
File "C:\Users\Admin\anaconda3\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
JSONDecodeError: Expecting value

Answer: This is how you do it: Output:
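The answer's code and output are not preserved in this excerpt. Separately, as a hedged illustration of when this error appears: `json.loads` raises `JSONDecodeError: Expecting value` when the text it receives is not JSON at all, for example an HTML page or an empty string. The sample bodies and the helper name `parse_json_or_none` below are hypothetical.

```python
import json

def parse_json_or_none(text):
    """Return the parsed JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Show where parsing failed and what the body actually starts with.
        print(f"Not valid JSON ({err.msg} at position {err.pos}): {text[:30]!r}")
        return None

# Hypothetical response bodies: an HTML page and a real JSON document.
html_body = "<!DOCTYPE html><html><body>Blocked</body></html>"
json_body = '{"status": "ok"}'

print(parse_json_or_none(html_body))
print(parse_json_or_none(json_body))
```

Printing the first few characters of the body is often enough to tell whether the server returned an HTML page instead of the expected JSON.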

convert website table to pandas df (beautifulsoup doesn’t recognize table)

beautifulsoup pandas python

I want to convert a website table to a pandas DataFrame, but BeautifulSoup doesn’t recognize the table. I tried two different pieces of code, both with no luck. Answer: Your table is not in a <table> tag but in multiple <span> tags. You can parse these into a DataFrame like so:

Export results to excel file title and link requests python [closed]

beautifulsoup csv excel python python-requests

Closed: this question needs to be more focused. I am practicing how to scrape some data in Python, and here’s my attempt: The code gets the links

BeautifulSoup trying to remove HTML data from list

beautifulsoup html python

As mentioned above, I am trying to remove the HTML from the printed output so that I get just the text and my dividing | and - characters. I get span information as well as other markup that I would like to remove. As this is part of a program that runs in a loop, I cannot search for the individual text information of the page as
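With BeautifulSoup itself, calling `get_text()` on each element is the usual way to drop markup. The sketch below swaps in the standard library's `html.parser` so that it runs without third-party packages; the sample strings and the names `TextExtractor` and `strip_tags` are hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_tags(fragment):
    """Return the text content of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks).strip()

# Hypothetical list items that still contain markup.
items = [
    '<span class="a">Alpha</span> | <b>1</b>',
    "<div>Beta - <span>2</span></div>",
]
cleaned = [strip_tags(item) for item in items]
print(cleaned)
```

Because the function works on one fragment at a time, it can be called inside a loop without knowing in advance what text each page contains.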

How to extract element from a webpage with special class name?

beautifulsoup python web-scraping

I have a txt file filled with multiple URLs; each URL is an article with text and its corresponding SDGs (example of one article 1). The text parts of an article are in ‘div.text.-normal.content’ tags and then in ‘p’, and the SDGs are in ‘div.tax-section.text.-normal.small’ and then in ‘span’. To extract them I use the following lines of code: