
Tag: web-scraping

Python Scrapy -> Use a scrapy spider as a function

I have the following Scrapy spider in spiders.py. The key point is that I want to call this spider as a function from another file, instead of running scrapy crawl quotes in the console. Where can I read more on this, and is it possible at all? I checked the Scrapy documentation, but I didn’t find

How to convert Web PDF to Text

I want to convert web PDFs such as https://archives.nseindia.com/corporate/ICRA_26012022091856_BSER3026012022.pdf, and many more, into text without saving them to my PC. Thousands of such announcements come up daily, so I want to convert them to text without storing them locally. Are there any Python solutions for this? Thanks. Answer: There are different methods to do this. But

TypeError: '<' not supported between instances of 'str' and 'int' after converting string to float

Using: Python in Google Colab. Thanks in advance. I have run this code on other data I scraped from FBREF, so I am unsure why it’s happening now. The only difference is the way I scraped it. The first time I scraped it: url_link = 'https://fbref.com/en/comps/Big5/gca/players/Big-5-European-Leagues-Stats' The second time I scraped it: url = 'https://fbref.com/en/comps/22/stats/Major-League-Soccer-Stats' html_content = requests.get(url).text.replace('<!--', '').replace('-->', '')
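The excerpt above is cut off before the failing code, so the cause of the error cannot be confirmed from it. As a minimal sketch, assuming the column still contains non-numeric strings (for example a header row repeated inside the table body), the following shows the comment-marker stripping from the question and one way to coerce values before comparing them. The helper names and the sample column are hypothetical and do not come from the question.

```python
def strip_comment_markers(html: str) -> str:
    """Remove HTML comment delimiters so markup hidden inside comments becomes parseable."""
    return html.replace("<!--", "").replace("-->", "")


def to_number(value):
    """Return value as a float, or None when it cannot be parsed as a number."""
    try:
        return float(str(value).replace(",", ""))
    except ValueError:
        return None


# Hypothetical scraped column: "Gls" stands in for a header cell repeated in the table body.
raw_goals = ["12", "7", "Gls", "3"]

# Comparing raw_goals[0] < 5 would raise the TypeError from the title (str vs int),
# so every cell is converted first and unparseable cells are dropped.
goals = [to_number(v) for v in raw_goals]
clean = [g for g in goals if g is not None]
over_five = [g for g in clean if g > 5]
```

If the data lives in a pandas DataFrame, the same idea applies: convert the column to a numeric type and discard rows that fail to convert before running any comparison.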

Error ‘Unexpected HTTP code on the target page’, ‘status_code’: 403 when I try to request a json url with a proxy api

I’m trying to scrape the website https://triller.co/ and get information from profile pages like https://triller.co/@warnermusicarg . To do this, I request the JSON URL that contains the information, in this case https://social.triller.co/v1.5/api/users/by_username/warnermusicarg . When I use requests.get() it works normally and I can retrieve all the information. The problem arises when I try
