Tag: scrapy

How to troubleshoot Scrapy shell response 403 error

cookies python response scrapy web-scraping

A few months ago I followed this Scrapy shell method to scrape a real estate listings webpage and it worked perfectly. I pulled my cookie and user-agent text from Firefox (Developer tools -> Headers) when the target URL is loaded, and I would get a successful response (200) and be able to pull items from r…

When using a Python script to run a Scrapy crawler, data is scraped successfully but the output file is empty (0 KB)

python scrapy

#Scrapy News Crawler #defining function to set headers and setting Link from where to start scraping #Iterating headline links and getting headline details and date/time #Python script (Separate File) Answer Instead of running your spider with cmdline.execute you can run it with CrawlerProcess, read about comm…

Remove unnecessary url from scrapy

python scrapy web-scraping

I want to remove these unnecessary URLs from the links. The website is https://www.ifep.ro/justice/lawyers/lawyerspanel.aspx Answer You can use the endswith method together with the continue keyword to skip the unwanted URLs. Output:
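
The endswith-plus-continue idea from the answer can be sketched roughly as follows. The sample links, the `UNWANTED_SUFFIXES` tuple and the `keep_wanted` helper are all hypothetical, since the excerpt does not show which URLs were unwanted; swap in the suffixes that actually appear in your own output.

```python
# Hypothetical sample links; real ones would come from the scraped page.
links = [
    "https://example.com/lawyers/detail.aspx?id=1",
    "https://example.com/lawyers/lawyerspanel.aspx#",
    "https://example.com/lawyers/detail.aspx?id=2",
    "https://example.com/lawyers/lawyerspanel.aspx#",
]

# Assumption: the unwanted URLs are the ones ending in "#".
UNWANTED_SUFFIXES = ("#",)


def keep_wanted(urls, unwanted_suffixes=UNWANTED_SUFFIXES):
    """Return the URLs that do not end with any of the unwanted suffixes."""
    wanted = []
    for url in urls:
        # str.endswith accepts a tuple, so several suffixes are checked at once.
        if url.endswith(unwanted_suffixes):
            continue  # skip this URL and move on to the next one
        wanted.append(url)
    return wanted


for url in keep_wanted(links):
    print(url)
```

Inside a spider the same check would sit at the top of the loop over extracted links, before any request or item is yielded.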

Scrapy get only text ignoring the commented content

python scrapy xpath

I researched but can’t find an answer to my question: I want to get the main content, ignoring the commented content. How should I do it? My Scrapy spider looks like: But this code gives me only some nt. Please help, thank you. Answer When /text() in XPath or ::text in CSS fails to produce the desired result, …

Scrapy : Crawled 0 pages (at 0 pages/min), scraped 0 items

python response scrapy

I’m new to Python and I’m trying to scrape an HTML page with a Scrapy spider, but the response returns nothing. Wondering what’s wrong here? Thanks for any help in advance. The url: https://directory.lubesngreases.com/LngMain/includes/themes/MuraBootstrap3/remote/api/?fn=searchcompany&name&…

Could this selenium code be recreated using scrapy?

python python-3.x scrapy

I’m interested in getting a better idea of what Scrapy can do. Here is a very simple Selenium script that interacts with a website, fills in some boxes, clicks some elements and downloads a file. Could this code be replicated using Scrapy, so that code written with Scrapy does the exact same t…

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) Scrapy

beautifulsoup python scrapy web-crawler web-scraping

Hi guys, I am trying to scrape/crawl this JSON-based site using Scrapy/BeautifulSoup: https://pk.profdir.com/jobs-for-angular-developer-lahore-punjab-cddb I have written the code below to fetch and read the JSON from the website: But it raises this error again and again: If anyone knows, please help me, it will be …
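
As background for the error in the title: `json.loads` raises exactly this message when the text it receives does not begin with a JSON value, for example an empty body or an HTML page. The sketch below uses made-up strings, not the site's actual response, and `try_parse` is a hypothetical helper name.

```python
import json


def try_parse(text):
    """Return (data, None) on success, or (None, error message) if text is not JSON."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


# Hypothetical response bodies used only for illustration.
samples = {
    "empty body": "",
    "HTML page": "<html><body>jobs</body></html>",
    "valid JSON": '{"jobs": []}',
}

for label, text in samples.items():
    data, error = try_parse(text)
    print(label, "->", data if error is None else error)
```

Printing the first few hundred characters of the response text before parsing is usually enough to tell which of these cases applies.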

How to locate a changing element in playwright? [closed]

playwright python scrapy

Closed. This question needs details or clarity. It is not currently accepting answers. Closed 11 months ago. I am filling an input box with a verification code, but the text which can locate the input-b…

Trying to split text from title

python scrapy web-scraping

I want to remove these from my output: I want only these: Wave Coffee Collection. This is my code: Answer If this is your resulting output: Then you can easily achieve your desired output like this:

Run scrapy splash as a script

python scrapy scrapy-splash

I am trying to run a Scrapy script with Splash, as I want to scrape a JavaScript-based webpage, but with no results. When I execute this script with the python command, I get this error: crochet._eventloop.TimeoutError. In addition, the print statement in the parse method is never printed, so I suspect something is wron…