Tag: web-scraping

What is .contents in BeautifulSoup4, and what is .string?

I’m trying web scraping with bs4 and I don’t know what these are. Could someone please explain them to me? Thanks. Answer: The .contents attribute holds a list of the element's direct children (tags and text strings). The .string attribute holds the element's text when the element has exactly one child that is a string (or a single child tag that itself has a .string); otherwise it is None. Using this page as an example: output for elem.contents output for elem.contents[1].string
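The page and outputs the answer refers to are not reproduced in this excerpt, so here is a minimal sketch of the same two attributes on made-up markup. The HTML string and the variable names are assumptions for illustration only; the number in elem.contents[1] is just a list index into the children.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, standing in for the page the answer refers to.
html = "<ul><li>first</li><li>second</li><li><b>bold</b> and plain</li></ul>"

soup = BeautifulSoup(html, "html.parser")
elem = soup.ul

# .contents is a list of the direct children (tags and text strings).
children = elem.contents
print(len(children))

# Indexing the list picks one child; .string is that child's single text node.
second_text = elem.contents[1].string
print(second_text)

# A tag with more than one child has no single string, so .string is None.
mixed = elem.contents[2].string
print(mixed)
```

When .string comes back as None for a tag with mixed children, get_text() is the usual way to collect all of the nested text instead.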

BeautifulSoup: how to extract only the paragraph text from this page?

beautifulsoup html-parsing python web-scraping

I am unable to get the text inside the p tags. I want the text of all the p tags. I have tried this so far but am unable to extract the text. I am getting many p tags in my result; how do I remove those tags? Here is my output: Answer: Output:

Separating a list obtained with Selenium in Python

python python-3.x selenium selenium-chromedriver web-scraping

I am developing a script to scrape a hotels.com page with Selenium. It extracts the data as a list. I am trying to scrape the Amenities part, but I get the whole list run together. How could I separate it so that it looks like this? Tamaño del hotel: 331 habitaciones, Cuenta con 8 pisos Entrada y

How to find the attribute and element id with selenium.webdriver?

css-selectors python selenium web-scraping webdriver

I am learning web scraping since I need it for my work. I wrote the following code: However, it shows the following error: Then I inspected the table that I want to scrape from this page. What attribute needs to be passed to the get_attribute() function in the following line? What should I write in the

XMLFeedSpider not Producing an Output CSV

python scrapy web-scraping

Having an issue with XMLFeedSpider. I can get the parsing to work in the Scrapy shell, so it seems there is something going on with either the request or the spider’s engagement. Whether I add a start_requests() method or not, I seem to get the same error. No output_file.csv is produced after running the spider. I am able to get

How to click on a button on a webpage and iterate through contents after clicking on button using python selenium

onclick python selenium selenium-webdriver web-scraping

I am using Python Selenium to web scrape from https://finance.yahoo.com/quote/AAPL/balance-sheet?p=AAPL but I want to scrape the Quarterly data instead of the Annual after clicking on the “Quarterly” button on the top right. This is my code so far: I am able to get the button to click but when I iterate through the divs, I am still getting content that

Using Split method or Regex to separate string

python regex split string web-scraping

In my project I am scraping the UFC website to gather the total wins, total losses, and total draws of each UFC athlete. This is part of my code, as I wish to extract the total wins, total losses, and total draws separately: The result is the following: The problem is, I am unable to extract the total draws.
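The asker's code and output are not shown in this excerpt, so the following is only a sketch of one way to pull all three numbers out at once with a regular expression. It assumes the scraped text looks like "22-6-1 (W-L-D)"; that sample string and the parse_record helper are hypothetical.

```python
import re

# Hypothetical record text; the real string scraped from the site may differ.
record_text = "22-6-1 (W-L-D)"


def parse_record(text):
    """Return (wins, losses, draws) from text shaped like '22-6-1 (W-L-D)'."""
    match = re.search(r"(\d+)-(\d+)-(\d+)", text)
    if match is None:
        raise ValueError(f"no W-L-D record found in {text!r}")
    wins, losses, draws = (int(part) for part in match.groups())
    return wins, losses, draws


wins, losses, draws = parse_record(record_text)
print(wins, losses, draws)
```

Capturing the three groups in one pattern avoids the common split() pitfall where the last field still carries trailing text such as " (W-L-D)".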

SQL optimization to speed up batch inserts using Scrapy

mysql python scrapy web-scraping

In my previous post, I asked how I can record items in bulk using Scrapy. The topic is here: Buffered items and bulk insert to Mysql using scrapy. With the help of @Alexander, I can keep 1000 items in a cache. However, my problem is that the items in the cache are recorded one by one while they are being
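The asker's pipeline is not shown here, so this is a hedged sketch of the general idea: write each full buffer with a single executemany() call and a single commit. It uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs anywhere; the BufferedWriter class, the items table, and its columns are assumptions for illustration.

```python
import sqlite3

BATCH_SIZE = 1000  # buffer size mentioned in the question


class BufferedWriter:
    """Collect rows and write each full buffer with one executemany() call."""

    def __init__(self, connection, batch_size=BATCH_SIZE):
        self.connection = connection
        self.batch_size = batch_size
        self.buffer = []

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One statement and one commit for the whole batch,
        # instead of one of each per row.
        self.connection.executemany(
            "INSERT INTO items (name, price) VALUES (?, ?)", self.buffer
        )
        self.connection.commit()
        self.buffer.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL)")

writer = BufferedWriter(conn)
for i in range(2500):
    writer.add((f"item-{i}", i * 0.5))
writer.flush()  # write the final partial batch

row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(row_count)
```

In a Scrapy item pipeline, add() would typically be called from process_item() and the final flush() from close_spider(). Common MySQL drivers use %s placeholders rather than ?, so the SQL string would need adjusting.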

XHR Request Preview Shows Data That Isn't Present In Response

python scrapy web-scraping

I am trying to use Scrapy to grab some data from a public website. Thankfully, most of the data can be found in an XHR request here: But when I double-click to see the actual response, there is no data in the search_results item: I am just wondering what is going on with the request, how can I access

Auto-refresh page while element is not clickable (Python)

python selenium web-scraping

Objective: Find an appointment Code: While the code works, I would like to extend it to auto-refresh every 5 seconds until the element for 9 September 2022 is clickable. I am thinking of something like but the second part of the code does not work. An example of a clickable date is Nov 4. Update: