Tag: web-crawler

scrapy/regex get json_object from html

I’m crawling reviews from a website with Scrapy (Python) and want to extract all the reviews from the following part of the raw HTML as a dictionary. Getting window.cj.listings is no problem, but I can’t seem to extract window.cj.app_data with a regex. The following code works for getting the listings, but I get nothing from window.cj.app_data when I
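
One possible approach, sketched under assumptions: the page assigns the object inline as `window.cj.app_data = {...};` and the value is valid JSON. The regex only locates the assignment, and `json.JSONDecoder.raw_decode` then parses the value, so nested braces do not have to be matched by the pattern. The helper name `extract_js_object` and the sample HTML are hypothetical, not taken from the original post.

```python
import json
import re


def extract_js_object(html, name):
    """Return the JSON value assigned to `name` in an inline script, or None."""
    # Locate "name =" and note where the assigned value starts.
    match = re.search(re.escape(name) + r"\s*=\s*", html)
    if match is None:
        return None
    try:
        # raw_decode parses exactly one JSON value starting at the given index
        # and ignores whatever follows it (such as the closing ";").
        value, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        # The assigned value is not valid JSON (e.g. a JavaScript expression).
        return None
    return value


# Hypothetical sample resembling the structure described in the question.
sample_html = """
<script>
window.cj = window.cj || {};
window.cj.listings = [{"id": 1}];
window.cj.app_data = {"reviews": [{"rating": 5, "text": "Great {service}"}], "page": 1};
</script>
"""

app_data = extract_js_object(sample_html, "window.cj.app_data")
```

In a Scrapy callback, the same helper could be called with `response.text` as the `html` argument. If the assigned value is a JavaScript literal that is not strict JSON (single quotes, trailing commas, unquoted keys), this sketch returns `None` and a different parser would be needed.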

Crawling IMDB for movie trailers?

I want to crawl IMDB and download the trailers of movies (either from YouTube or IMDB) that fit some criteria (e.g., released this year, with a rating above 2). I want to do this in Python – I saw that there are packages for crawling IMDB and downloading YouTube videos. The thing is, my current plan is to crawl IMDB
