Tag: web-scraping

How to extract deeply nested tags using Beautiful Soup

I have the content below and I am trying to understand how to extract the text of the <p> tags using Beautiful Soup (I am open to other methods). As you can see, the <p> tags are not both nested inside the same <div>. I gave it a shot with the following method, but that only seems to work when both <p>
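Since the asker is open to other methods, here is a minimal sketch using only the standard library's `html.parser`. It collects the text of every <p> element regardless of which <div> it sits in. The markup in `SAMPLE_HTML` and the names `ParagraphCollector` and `extract_paragraphs` are hypothetical, made up for illustration; the original page content is not shown in the excerpt. With Beautiful Soup itself, `soup.find_all("p")` searches all descendants by default, so it should behave similarly.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element, however deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraph strings
        self._depth = 0       # greater than 0 while inside a <p>
        self._chunks = []     # text pieces of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.paragraphs.append("".join(self._chunks).strip())
                self._chunks = []

    def handle_data(self, data):
        # Text inside child tags such as <b> is still part of the paragraph.
        if self._depth:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return the text of all <p> elements found in an HTML string."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical markup: the two <p> tags live under different parents.
SAMPLE_HTML = """
<div class="outer">
  <div class="a"><p>First paragraph</p></div>
  <section><div class="b"><span><p>Second <b>paragraph</b></p></span></div></section>
</div>
"""

print(extract_paragraphs(SAMPLE_HTML))
```

The collector keys on the tag name only, so it does not matter how many wrappers surround each <p>.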

Unable to fetch tabular content from a site using requests

python python-3.x python-requests web-scraping

I’m trying to fetch tabular content from a webpage using the requests module. After navigating to that webpage, when I manually type 0466425389 right next to Company number and hit the search button, the table is produced accordingly. However, when I mimic the same using requests, I get the following response. I’ve tried with: How can I fetch tabular content

How to scrape the login website using selenium

python selenium web-scraping

I am trying to scrape data. I first need to click on Language, select English, and then click Advanced Search, but doing so gives me an error. This is the website: https://www.counselingcalifornia.com/Find-a-Therapist Answer The reason is the iframe: the language drop-down is inside an iframe. In Selenium automation, if the web elements are wrapped inside an iframe, we should always switch to the iframe first; then we can interact

How to parse Google custom search javascript output in python?

python python-3.x python-requests web-scraping

I am trying to fetch some articles from the ACL website based on input keywords. The website uses the Google Custom Search API, and the output of the API is a JavaScript object. How can I parse the returned object in Python and fetch the article name, URL, and abstract of the research paper from the object? The script
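A minimal sketch of one way to approach this, assuming the response is a JSON object wrapped in a JavaScript callback call (JSONP style). The sample payload, the callback name `google.search.cse.api123`, and the field names `titleNoFormatting`, `unescapedUrl`, and `contentNoFormatting` are assumptions for illustration; check them against the response you actually receive.

```python
import json
import re


def parse_jsonp(text):
    """Strip a `callback(...)` wrapper and decode the JSON object inside it."""
    match = re.search(r"\(\s*(\{.*\})\s*\)\s*;?\s*$", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found inside a callback wrapper")
    return json.loads(match.group(1))


def extract_articles(text):
    """Return title, URL, and abstract for each result in the payload."""
    payload = parse_jsonp(text)
    return [
        {
            "title": result.get("titleNoFormatting"),      # assumed field name
            "url": result.get("unescapedUrl"),             # assumed field name
            "abstract": result.get("contentNoFormatting"), # assumed field name
        }
        for result in payload.get("results", [])
    ]


# Hypothetical response body, shaped like a callback-wrapped JSON object.
SAMPLE = """/*O_o*/
google.search.cse.api123({
  "results": [
    {"titleNoFormatting": "A Sample Paper",
     "unescapedUrl": "https://example.org/paper-1",
     "contentNoFormatting": "Short abstract text."}
  ]
});"""

print(extract_articles(SAMPLE))
```

Using `.get()` keeps the loop from failing when a result lacks one of the assumed keys; the missing value simply comes back as `None`.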

How to scrape table data that doesn’t have different classes?

pandas python web-scraping

I’m trying to write some code that will scrape different data from a table on a stock screener website and save the data in Excel. The problem I’m having is that there isn’t a distinct class code for some of the values I want to pull from the table, so I tried this only for the first header I wanted the

How to deal with pandas read_html gracefully when it fails to find a table?

exception pandas python web-scraping wikipedia

pandas read_html is a great and quick way of parsing tables; however, if it fails to find the table with the attributes specified, it will fail and cause the whole code to fail. I am trying to scrape thousands of web pages, and it is very annoying if it causes an error and termination of the whole program just because
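One common way to keep a long scraping run alive is to catch the `ValueError` that `pandas.read_html` raises when no table matches and return an empty list instead. The sketch below is only an illustration: the helper name `read_tables_or_empty` and its `reader` parameter are hypothetical, and the `reader` hook exists solely so the helper can be exercised without pandas or network access.

```python
def read_tables_or_empty(source, reader=None, **kwargs):
    """Return the tables found in `source`, or [] if none match."""
    if reader is None:
        # Imported lazily so the helper can be tried without pandas installed.
        import pandas as pd
        reader = pd.read_html
    try:
        return reader(source, **kwargs)
    except ValueError as exc:
        # pandas.read_html raises ValueError when it finds no matching table.
        print(f"skipping {source!r}: {exc}")
        return []


# Typical use inside a loop over many pages (the URL list is hypothetical):
# for url in urls:
#     for table in read_tables_or_empty(url, attrs={"class": "wikitable"}):
#         ...  # process each DataFrame
```

Only `ValueError` is caught here. Network failures and missing parser libraries raise other exception types, so a real crawler would probably want to handle those separately as well.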

Scraping Crunchbase to extract corporate news

python web-scraping

I’m trying to scrape the news and signals tab from Crunchbase, and having no joy. Having consulted prior threads on Stack Overflow, I have been using this code, which has worked well for all other tabs (taking duolingo as an example): I suspect it’s something to do with how Crunchbase has coded up the news section, which probably requires a tweak to

How to get HTML changes after pressing button with Beautiful Soup and Requests

beautifulsoup python request web-scraping

I want to get the HTML of this site https://www.forebet.com/en/football-predictions after pressing the button More[+] enough times to load all games. Each time the button More[+] at the bottom of the page is pressed, the HTML changes and shows more football games. How do I get the request to the page with all the football games loaded? Answer As stated, requests and BeautifulSoup

Unable to send requests in the right way after replacing redirected url with original one using middleware

middleware python python-3.x scrapy web-scraping

I’ve created a script using scrapy to fetch some fields from a webpage. The url of the landing page and the urls of inner pages get redirected very often, so I created a middleware to handle that redirection. However, when I came across this post, I understood that I need to return a request in process_request() after replacing the redirected

I am very new to scraping please bear with me and this is my 1st project. I am trying to scrape a site using selenium

python selenium web-scraping

The website I’m scraping is https://www.telekom.de/unterwegs/apple/apple-iphone-13-pro/graphit-512gb I was not able to print the radio button’s label according to the checked button. I don’t know what the mistake is or where I made it. Could anyone help with this? It would be helpful for me to learn. Change tariff links are given below. Answer You are trying to find an element within an