
Requests Module not fetching full website in Python

Sorry for the noob question. I have written code that searches Google for an image stored locally on my computer, using the requests module. I want to scrape the results page for information about the image, but requests never fetches the entire page. It fetches only part of it, so I cannot scrape the page for results.


The web page looks like this: [screenshot of the page as shown in the browser]

But when I scrape it for anchor-tag links using Beautiful Soup, I get the following result:


The content fetched by the requests module doesn't contain the full web page, and I don't know why. I want to scrape the information in the img, a, and h3 tags on the page using Beautiful Soup, but it's just not working out.


Answer

The main problem is that the Python requests module doesn't render JavaScript. It only downloads the HTML the server sends back, so you are not getting the page you see in a browser.

You are using the webbrowser module to view your URL, which opens it in a browser where JavaScript is enabled, so you get the page as expected. But when you then use the requests module to fetch the page, no JavaScript runs, and Google doesn't serve the results page; it redirects you to another page (the Google homepage). That page has different HTML, so it contains none of the search results you saw in the browser.

[screenshot showing that the final request gets redirected]

In the screenshot, 1 is the URL you are trying to hit, and 2 is the URL you are redirected to.
Look at the difference: google.com/webhp?tbs=sbi:AMhZZisX… vs. google.com/search?tbs=sbi:AMhZZisX…
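You can check for this kind of redirect yourself from Python. The sketch below is only an illustration: `describe_redirect` and `fetch_and_report` are hypothetical helper names, and the query value `PLACEHOLDER` stands in for the truncated token above. It relies on `response.history` and `response.url`, which requests fills in when it follows redirects.

```python
from urllib.parse import urlsplit


def describe_redirect(requested_url, final_url):
    """Compare the URL you asked for with the URL you ended up on.

    Returns (was_redirected, requested_path, final_path).
    """
    req, fin = urlsplit(requested_url), urlsplit(final_url)
    changed = (req.netloc, req.path) != (fin.netloc, fin.path)
    return changed, req.path, fin.path


def fetch_and_report(url):
    """Fetch url with requests and print every hop of the redirect chain."""
    import requests  # third-party; imported here so the helper above works without it

    response = requests.get(url, timeout=10)
    for hop in response.history:  # one Response object per redirect followed
        print(hop.status_code, hop.url)
    print("final:", response.status_code, response.url)
    return describe_redirect(url, response.url)


if __name__ == "__main__":
    # PLACEHOLDER is an assumed stand-in, not a real search token.
    print(describe_redirect(
        "https://www.google.com/search?tbs=sbi:PLACEHOLDER",
        "https://www.google.com/webhp?tbs=sbi:PLACEHOLDER",
    ))
```

If the first element of the returned tuple is `True`, the HTML you are parsing belongs to a different page than the one you requested.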

The HTML of that page is this:

[screenshot of the final output]

Always inspect the source HTML returned by the requests module, since that shows what you actually received. As you can see, this is not the search results page.
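One simple way to inspect that HTML is to write it to a file and open it in an editor or browser. This is a minimal sketch; the function names and the default file name `requests_output.html` are assumptions made for illustration.

```python
from pathlib import Path


def save_html(html, path="requests_output.html"):
    """Write the HTML string to disk and return the path it was saved to."""
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out


def fetch_and_save(url, path="requests_output.html"):
    """Fetch url with requests and save exactly what the server returned."""
    import requests  # third-party; imported lazily

    response = requests.get(url, timeout=10)
    return save_html(response.text, path)
```

Opening the saved file shows the page as requests received it, which may differ from what the browser displays after JavaScript has run.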

So to reach your goal, try using Selenium, which drives a real browser and therefore runs the page's JavaScript.
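A rough sketch of that approach follows, assuming Selenium 4 with Chrome installed. `render_with_selenium` and `TagCollector` are hypothetical names. To keep the example dependency-light, the tag extraction uses the standard library's `html.parser` instead of Beautiful Soup; you could equally pass `driver.page_source` to Beautiful Soup. Google may still block or alter automated requests, so treat this as a starting point rather than a guaranteed fix.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect <a href>, <img src>, and <h3> text from an HTML string."""

    def __init__(self):
        super().__init__()
        self.links, self.images, self.headings = [], [], []
        self._in_h3 = False
        self._h3_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "h3":
            self._in_h3 = True
            self._h3_text = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_h3:
            self.headings.append("".join(self._h3_text).strip())
            self._in_h3 = False

    def handle_data(self, data):
        if self._in_h3:
            self._h3_text.append(data)


def render_with_selenium(url):
    """Load url in headless Chrome and return the HTML after JavaScript has run."""
    from selenium import webdriver  # third-party; imported lazily
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def scrape(url):
    """Render the page in a browser, then pull out links, images, and headings."""
    collector = TagCollector()
    collector.feed(render_with_selenium(url))
    return collector.links, collector.images, collector.headings
```

Content that loads late may need an explicit wait before reading `driver.page_source`.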

User contributions licensed under: CC BY-SA