I have written a Python script that should automatically play the game 2048 (https://play2048.co/). The problem is that keystrokes seem to be ignored by the browser, or the program runs too fast for the browser to keep up with the game. I have checked the Selenium documentation and I am not sure whether I have to include some explicit waits. Here is …
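A minimal sketch of the usual fix, assuming Selenium 4 with Chrome: send the arrow keys to the page’s body element and pause briefly between moves so the browser can keep up.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Chrome()
driver.get("https://play2048.co/")

# 2048 listens for key events on the document, so target the <body>
# rather than an individual tile element.
body = driver.find_element(By.TAG_NAME, "body")

moves = [Keys.ARROW_UP, Keys.ARROW_RIGHT, Keys.ARROW_DOWN, Keys.ARROW_LEFT]
for _ in range(50):  # 50 rounds of up/right/down/left as a naive strategy
    for key in moves:
        body.send_keys(key)
        time.sleep(0.2)  # give the tile animation time to finish
```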
Tag: web-scraping
Python – Selenium – Scraping through multiple websites
I am trying to build a web scraper with Python/Selenium that scrapes data from multiple websites and stores the data in an Excel sheet. The sites I want to scrape are the following: From all sites I want to scrape the “Omsättning” (turnover), “Volym” (volume) and “VWAP” values and store them in an Excel sheet. This is what I got so far …
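A minimal sketch of the overall shape, using requests, BeautifulSoup and openpyxl; the URLs and the table selectors below are placeholders, since the real pages were not included here:

```python
import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook

# Hypothetical list of instrument pages; the real URLs were omitted above.
urls = [
    "https://example.com/instrument/1",
    "https://example.com/instrument/2",
]

wb = Workbook()
ws = wb.active
ws.append(["URL", "Omsättning", "Volym", "VWAP"])

for url in urls:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    row = [url]
    # Placeholder lookup: find the cell whose text is the label, then
    # take the next cell as its value. Inspect the real pages to adapt this.
    for label in ("Omsättning", "Volym", "VWAP"):
        cell = soup.find("td", string=label)
        row.append(cell.find_next("td").get_text(strip=True) if cell else None)
    ws.append(row)

wb.save("values.xlsx")
```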
I am trying to web-scrape Zomato, however it returns an output of “None” and an AttributeError
Whenever I try to extract the data, it returns an output of “None”, and I am not sure whether it is my code (I followed the rules of using bs4) or whether the website is just different to scrape. My code: Here is the inspected tag of the website from which I try to get the h4 class showing the …
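A sketch of the two usual culprits, with a hypothetical URL and class name: bs4’s find() returns None when the element is missing (often because the site blocks the default requests User-Agent or renders the content with JavaScript), and calling .text on that None raises the AttributeError.

```python
import requests
from bs4 import BeautifulSoup

# A browser-like User-Agent often helps with sites that block the
# default requests header.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
resp = requests.get("https://www.zomato.com/some-restaurant", headers=headers)  # hypothetical URL
soup = BeautifulSoup(resp.text, "html.parser")

h4 = soup.find("h4", class_="some-class")  # placeholder class name
if h4 is None:
    # Calling .text on None raises AttributeError, so guard first.
    print("Element not found; the page may be rendered by JavaScript.")
else:
    print(h4.text)
```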
Python Scrapy -> Use a Scrapy spider as a function
So I have the following Scrapy spider in spiders.py. The key aspect is that I want to call this spider as a function from another file, instead of running scrapy crawl quotes in the console. Where can I read more on this, and is it possible at all? I checked through the Scrapy documentation, but I didn’t find …
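This is possible; the Scrapy documentation covers it under “Run Scrapy from a script”. A minimal sketch with CrawlerProcess, assuming the spider class is named QuotesSpider:

```python
from scrapy.crawler import CrawlerProcess
from spiders import QuotesSpider  # assumed import path and class name

def run_spider():
    # Optional FEEDS setting writes the scraped items to a JSON file.
    process = CrawlerProcess(settings={"FEEDS": {"quotes.json": {"format": "json"}}})
    process.crawl(QuotesSpider)
    process.start()  # blocks until the crawl finishes

if __name__ == "__main__":
    run_spider()
```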
How to convert Web PDF to Text
I want to convert web PDFs such as https://archives.nseindia.com/corporate/ICRA_26012022091856_BSER3026012022.pdf and many more into text without saving them to my PC, because thousands of such announcements come up daily. Any Python code solutions for this? Thanks. Answer There are different methods to do this, but …
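One such method, sketched below, keeps the PDF entirely in memory with io.BytesIO and extracts the text with pdfplumber (the choice of pdfplumber is an assumption; any PDF library that accepts a file-like object works the same way):

```python
import io
import requests
import pdfplumber  # pip install pdfplumber

url = "https://archives.nseindia.com/corporate/ICRA_26012022091856_BSER3026012022.pdf"
resp = requests.get(url, timeout=30)

# Wrap the response bytes in BytesIO so the PDF never touches disk.
with pdfplumber.open(io.BytesIO(resp.content)) as pdf:
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)

print(text[:500])
```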
Downloading images using src in Python produces empty images
My script is kind of working, but the files it saves are empty. Any ideas? Forgive me for all the unused imports at the top! I tried a lot of different things to do this. Here I’m pulling the img elements using Selenium. The src values are then iterated over in a loop and transformed into bytes so that they can be …
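For reference, empty files in this situation usually mean the bytes were never written: each src has to be fetched and saved with resp.content in binary (“wb”) mode. A minimal sketch with a placeholder src list:

```python
import requests

# srcs would come from the Selenium loop in the question; placeholder here.
srcs = ["https://example.com/image1.jpg"]

for i, src in enumerate(srcs):
    resp = requests.get(src, timeout=10)
    # resp.content holds the raw image bytes; writing resp.text (or
    # forgetting binary mode) is what typically produces empty/corrupt files.
    with open(f"image_{i}.jpg", "wb") as f:
        f.write(resp.content)
```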
TypeError: ‘<’ not supported between instances of ‘str’ and ‘int’ after converting string to float
Using: Python in Google Colab. Thanks in advance: I have run this code on other data I have scraped from FBref, so I am unsure why it’s happening now. The only difference is the way I scraped it. The first time I scraped it: url_link = 'https://fbref.com/en/comps/Big5/gca/players/Big-5-European-Leagues-Stats' The second time I scraped it: url = 'https://fbref.com/en/comps/22/stats/Major-League-Soccer-Stats' html_content = requests.get(url).text.replace('<!--', '').replace('-->', '')
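A likely cause, sketched below with “Age” as an example column name: FBref repeats its header row inside the table body, so a numeric-looking column still contains strings, and comparing those against an int raises exactly this TypeError. Coercing with pd.to_numeric avoids it:

```python
import io
import pandas as pd
import requests

url = "https://fbref.com/en/comps/22/stats/Major-League-Soccer-Stats"
# FBref wraps some tables in HTML comments, hence the replace().
html_content = requests.get(url).text.replace("<!--", "").replace("-->", "")

df = pd.read_html(io.StringIO(html_content))[0]  # the table index is a guess
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(-1)  # flatten FBref's two-level header

# Repeated header rows leave strings like "Age" inside the column;
# coerce them to numbers and drop the leftovers instead of casting blindly.
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
df = df.dropna(subset=["Age"])
print(df[df["Age"] < 25].head())  # the comparison now works: float vs int
```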
How do I filter HTML elements in Python
I’ve got a list of strings from scraping a website. I want the code to print the HTML elements from that list if they contain “L” in them. I’ve managed to write code that works just fine on a “normal” list that I manually write into the code (example 1 below), but as soon as I try using that …
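A minimal sketch of the likely issue, with stand-in HTML: a scraped list usually holds bs4 Tag objects rather than plain strings, so the “L” check has to run on each element’s text:

```python
from bs4 import BeautifulSoup

html = "<td>Lund</td><td>Malmö</td><td>Luleå</td>"  # stand-in for the scraped page
soup = BeautifulSoup(html, "html.parser")

# find_all() returns Tag objects, not strings; "L" in el would fail,
# so test each element's text instead.
elements = soup.find_all("td")
matches = [el for el in elements if "L" in el.get_text()]
print(matches)  # [<td>Lund</td>, <td>Luleå</td>]
```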
How to remove unwanted text when retrieving the title of a page using Python
Hi all, I have written a Python program to retrieve the title of a page. It works fine, but with some pages it also receives some unwanted text. How do I avoid that? Here is my program, and here is my output: instead of this, I am supposed to receive only this line. Please help me with some ideas; all other websites are …
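Without the program it is hard to be definitive, but a common cause is site boilerplate appended to the title after a separator; a sketch that keeps only the first segment (the URL and the separator list are assumptions):

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com"  # hypothetical page
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

title = soup.title.get_text(strip=True) if soup.title else ""
# Many sites append a site name after " | " or " - "; keep the first part.
for sep in (" | ", " - ", " – "):
    if sep in title:
        title = title.split(sep)[0]
        break
print(title)
```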
Error ‘Unexpected HTTP code on the target page’, ‘status_code’: 403 when I try to request a JSON URL with a proxy API
I’m trying to scrape the website https://triller.co/ and want to get information from profile pages like https://triller.co/@warnermusicarg. What I do is request the JSON URL that contains the information, in this case https://social.triller.co/v1.5/api/users/by_username/warnermusicarg. When I use requests.get() it works normally and I can retrieve all the information. The problem arises when I try …
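A sketch of the usual workaround, with a placeholder proxy: a 403 through a proxy API often means the upstream site rejected the request’s headers, so forward browser-like headers along with the proxies dict:

```python
import requests

api_url = "https://social.triller.co/v1.5/api/users/by_username/warnermusicarg"

# Browser-like headers; the site may reject the bare defaults a proxy forwards.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "application/json",
}
proxies = {
    "http": "http://user:pass@proxy.example.com:8000",   # placeholder proxy
    "https": "http://user:pass@proxy.example.com:8000",
}

resp = requests.get(api_url, headers=headers, proxies=proxies, timeout=30)
print(resp.status_code)
print(resp.json() if resp.ok else resp.text[:200])
```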