I am still learning how to web scrape and could use some help. I would like to load the MLB data into a Pandas DataFrame. The program does not appear to run correctly, but I did not receive an error. Any suggestions would be greatly appreciated. Thanks in advance for any help you may offer. Answer That page contains a text file in CSV format, so load it with pandas like this: And that should get you what you are looking for.
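The answer's approach can be sketched as follows. The URL is a placeholder for the MLB stats file from the question; `pandas.read_csv()` accepts a remote URL directly, and the in-memory sample below shows the same call on local CSV text.

```python
import io
import pandas as pd

# Minimal sketch: the page serves plain CSV text, so pandas can read it
# directly over HTTP. The URL is a placeholder, not the real file.
def load_stats(url: str) -> pd.DataFrame:
    return pd.read_csv(url)

# the same call works on any CSV source, e.g. an in-memory sample:
sample = io.StringIO("Team,Wins,Losses\nNYY,103,59\nLAD,106,56\n")
df = pd.read_csv(sample)
print(df)
# usage (live): df = load_stats("https://example.com/mlb_stats.txt")
```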
I'm trying to take the name and the price from an Amazon page; this is the code: The problem is that it works with URL but it doesn't work with URL2. How can I fix it? Thanks :) Answer Before getting the text you have to check whether the required element was found, and only then extract the text: Please NOTE Amazon has a few different page layouts, so if you want to make a generic crawler you will have to cover all of them.
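The None-check the answer describes looks like this. The `id` used here is one of Amazon's price-element ids on some layouts and should be treated as an assumption; the inline HTML stands in for the fetched page.

```python
from bs4 import BeautifulSoup

# Sketch: find() returns None when the element is missing (e.g. a different
# page layout), so check before calling .text to avoid an AttributeError.
html = '<span id="priceblock_ourprice">$19.99</span>'
soup = BeautifulSoup(html, "html.parser")

price_tag = soup.find(id="priceblock_ourprice")
if price_tag is not None:
    price = price_tag.text.strip()
    print(price)  # $19.99
else:
    print("price element not found on this layout")
```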
Hello, I'm trying to build a crawler using Scrapy. My crawler code is: But when I run the command scrapy crawl shopspider -o info.csv to see the output, I find only the information about the first product, not all the products on the page. So I removed the numbers between [ ] in the XPaths, for example the XPath of the title: //*[@id="content"]/div/div/ul/li/a/h3, but I still get the same result. The result is: <span class="amount">£40.00</span>,<h3>Halo Skincare Organic Gift Set</h3>,"<span class="amount">£40.00</span>","<span class="amount">£58.00</span>" Kindly help, please. Answer If you remove the indexes in your XPaths, they will find all matching elements.
My CSS selectors response.css('div.jhfizC') and response.css('a[itemprop="url"]') match 97 items on the web page, but my code is only scraping 35 items. Where is the fault? Here is my code: Answer At the end of the URL, set length=90 instead of 30; the length parameter controls how many items are returned per page.
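The answer's fix can be applied programmatically by rewriting the query string. The parameter name `length` comes from the answer; the URL itself is a placeholder, and `with_page_length` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

# Sketch: bump a paging parameter (here "length") so the server returns
# more items per response instead of the default 30.
def with_page_length(url: str, length: int) -> str:
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query["length"] = [str(length)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

new_url = with_page_length("https://example.com/list?length=30&page=1", 90)
print(new_url)  # https://example.com/list?length=90&page=1
```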
I want to scrape the price and status of a website. I am able to scrape the price but unable to scrape the status, and I couldn't find it in the JSON either. Here is the link: https://www.zoro.com/jonard-tools-diagonal-cutting-plier-8-l-jic-2488/i/G2736212/?recommended=true Answer You can use the JSON microformat embedded inside the page to obtain availability (price, images, description, etc.). For example: Prints: EDIT: You can observe all the product data that is embedded within the page: When the key isExpeditable is set to False, it means drop shipping (I think). When I tested it with a product that is in stock, it printed True. The output:
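Extracting such an embedded JSON-LD microformat typically looks like this. The inline HTML is a trimmed stand-in for the Zoro product page, with field values invented for illustration.

```python
import json
from bs4 import BeautifulSoup

# Sketch: product pages often carry a <script type="application/ld+json">
# block with price and availability; parse it with the json module.
html = '''
<script type="application/ld+json">
{"@type": "Product", "name": "Diagonal Cutting Plier",
 "offers": {"price": "26.69", "availability": "http://schema.org/InStock"}}
</script>
'''
soup = BeautifulSoup(html, "html.parser")
data = json.loads(soup.find("script", type="application/ld+json").string)
print(data["offers"]["price"])         # 26.69
print(data["offers"]["availability"])  # http://schema.org/InStock
```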
I'm trying to loop through two sets of links. Starting with https://cuetracker.net/seasons, I click through each season link (the last 5 seasons) and then click through each tournament link within each season, scraping the match data from each tournament. Using the code below I have managed to get the list of season links I want, but when I try to collect the tournament links into a list, I only get the last season's tournament links rather than each season's. I'd guess it's something to do with driver.get completing before the next lines run.
I want to scrape a dataframe from dropdown values with BeautifulSoup. I select a value in both dropdowns, submit my selection, and get a data table; I would like to capture this table with BS. Any idea of the process to achieve this? Example site: https://coinarbitragebot.com/arbitrage.php Thanks Answer You can issue simple POST requests with custom parameters (the parameters you will see in the Firefox/Chrome network tab when you click the Submit button). Then you can use the pandas.read_html() function to get your DataFrame. For example: Prints: EDIT: To select only binance, bitfinex and bittrex, you can set data like this: This will print:
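The POST-then-parse approach can be sketched as below. The payload field names are assumptions and should be copied from the browser's network tab after clicking Submit; `read_html()` itself is demonstrated on an inline table.

```python
import io
import pandas as pd
import requests

# Sketch: replicate the form submission as a POST request, then let
# pandas.read_html() parse the returned HTML table into a DataFrame.
def fetch_arbitrage_table(url: str, payload: dict) -> pd.DataFrame:
    resp = requests.post(url, data=payload)
    resp.raise_for_status()
    return pd.read_html(io.StringIO(resp.text))[0]

# read_html works on any HTML containing a <table>:
sample = ("<table><tr><th>pair</th><th>profit</th></tr>"
          "<tr><td>BTC/USD</td><td>1.2%</td></tr></table>")
df = pd.read_html(io.StringIO(sample))[0]
print(df)
# usage (live, hypothetical fields):
# fetch_arbitrage_table("https://coinarbitragebot.com/arbitrage.php",
#                       {"exchanges": "binance,bitfinex,bittrex"})
```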
I'm creating something to help me learn, but it is also useful to me. I want to be able to parse multiple prices from one page (https://www.watchfinder.co.uk/search?q=114060&orderby=AgeNewToOld), convert them to numbers, and average them. The page will change, so it could have 3 prices one day and 20 the next. The part I am struggling with is separating the prices so that I can use them. So far I have: which gives me: Bearing in mind that the number of prices can change, how can I separate these? Or is there a way with BS4 that can get all these without
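A variable number of prices is handled naturally by selecting all matching elements into a list. The class name `prods_price` is a guess at Watchfinder's markup, and the inline HTML stands in for the live page; the cleanup and averaging work the same regardless of how many prices there are.

```python
from statistics import mean
from bs4 import BeautifulSoup

# Sketch: grab every price element, strip the currency symbol and the
# thousands separator, convert to float, and average.
html = '''
<div class="prods_price">£6,550</div>
<div class="prods_price">£7,995</div>
<div class="prods_price">£7,250</div>
'''
soup = BeautifulSoup(html, "html.parser")
prices = [float(tag.text.replace("£", "").replace(",", ""))
          for tag in soup.select(".prods_price")]
print(prices)        # [6550.0, 7995.0, 7250.0]
print(mean(prices))  # 7265.0
```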
Hello, I've created two functions that work well when called alone, but when I try to use them in a for loop I get a problem with my parameter. The first function searches and gets a link to pass to the second one; the second function scrapes a link. Both functions worked when I tested them on a link. Now I have a CSV file with company names; I use searchsport() to search the website, and the returned link is passed to single_text() to scrape. Error: When I run this I get a df. My expected results should be
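A common cause of this kind of parameter error is passing a whole `csv.reader` row (a list) instead of the cell value. The loop can be sketched with stub versions of `searchsport()` and `single_text()` standing in for the real ones, and inline text standing in for the CSV file:

```python
import csv
import io

# Stubs for illustration only; the real functions do the search and scrape.
def searchsport(term: str) -> str:
    return f"https://example.com/search?q={term}"

def single_text(link: str) -> dict:
    return {"url": link}

csv_text = "Nike\nAdidas\n"  # inline stand-in for the companies CSV file
results = []
for row in csv.reader(io.StringIO(csv_text)):
    if not row:
        continue
    link = searchsport(row[0].strip())  # pass the cell value, not the whole row
    results.append(single_text(link))

print(results)
```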