
How do I scrape more of the underlying data from a website's map location?

I have successfully used Python to scrape store information from a competitor's website. The website has a map where you can enter a zip code, and it lists all the stores in the area around that location. The website sends a GET request to pull the store data from this link:

https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=37028&radius=50&pagesize=30

My goal is to scrape all store information, not just the results for the imaginary zip code 12345 with pagesize=30. How should I go about getting all the store information? Would it be better to iterate through a dataset of zip codes to pull all the stores, or is there a better way to do this? I've tried increasing the page size past 30, but that appears to be the limit on the request.


Answer

This URL returns JSON containing "currentPage":1, which suggests that it supports some kind of pagination.

I added &page=2 and it seems to work.

Page 1:

https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=37028&radius=250&pagesize=40&page=1

Page 2:

https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=37028&radius=250&pagesize=40&page=2

Page 3:

https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=37028&radius=250&pagesize=40&page=3

For testing I used a bigger radius=250 to get JSON with "recordCount":123.

I found that it also works with pagesize=40.
For bigger values it returns JSON with an error message.
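
The page URLs above can be generated instead of typed by hand. This is only a minimal sketch: it assumes that "recordCount" is the total number of matching stores across all pages and that page numbering starts at 1 (as "currentPage":1 suggests). The helper name build_page_urls is hypothetical.

```python
import math
from urllib.parse import urlencode

BASE_URL = "https://www.homedepot.com/StoreSearchServices/v2/storesearch"


def build_page_urls(address, radius, pagesize, record_count):
    """Return one URL per page needed to cover record_count results."""
    # Assumption: recordCount is the total across all pages.
    pages = math.ceil(record_count / pagesize)
    urls = []
    for page in range(1, pages + 1):
        query = urlencode({
            "address": address,
            "radius": radius,
            "pagesize": pagesize,
            "page": page,
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = build_page_urls("37028", 250, 40, 123)
for url in urls:
    print(url)
```

If recordCount really is the total, 123 records at pagesize=40 would need four pages, one more than the three listed above.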


EDIT:

Minimal working code:

The page blocks requests that lack a User-Agent header.
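
A possible sketch of such a request loop, using only the standard library. It is not the original code: the "stores" key, the User-Agent string, the max_pages cap, and the function names are all assumptions, and the loop simply stops at the first page that comes back empty.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://www.homedepot.com/StoreSearchServices/v2/storesearch"
# Any browser-like value; the exact string is an arbitrary choice.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_json(url):
    """GET a URL with a User-Agent header and decode the JSON body."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def fetch_all_stores(address, radius=250, pagesize=40,
                     fetch=fetch_json, max_pages=100):
    """Collect stores from page 1 onward until a page comes back empty."""
    all_stores = []
    for page in range(1, max_pages + 1):
        query = urlencode({"address": address, "radius": radius,
                           "pagesize": pagesize, "page": page})
        data = fetch(f"{BASE_URL}?{query}")
        # Assumption: the store list lives under a "stores" key.
        stores = data.get("stores") or []
        if not stores:
            break
        all_stores.extend(stores)
    return all_stores
```

Calling fetch_all_stores("37028") would then return one list with the items from every page. The fetch parameter exists only so the loop can be exercised with a fake fetcher instead of the live site.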



If you want to keep the data as a DataFrame, you could first put all items in a list and later convert this list to a DataFrame.
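
A small sketch of that idea with pandas. The pages list is a hypothetical stand-in for the JSON decoded from each request, and the "stores" key and field names are assumptions rather than the real response layout.

```python
import pandas as pd

# Hypothetical stand-ins for the JSON decoded from each page;
# the "stores" key and the field names are assumptions.
pages = [
    {"stores": [{"storeId": "0001", "name": "Store A"},
                {"storeId": "0002", "name": "Store B"}]},
    {"stores": [{"storeId": "0003", "name": "Store C"}]},
]

all_items = []
for data in pages:
    # Extend a plain list instead of growing a DataFrame page by page.
    all_items.extend(data["stores"])

# Build the DataFrame once, after every page has been collected.
df = pd.DataFrame(all_items)
print(df)
```

Building the DataFrame once at the end avoids repeatedly copying a growing DataFrame inside the loop.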


Because the JSON keeps the address as a dictionary {'postCode': ... , ...}, some DataFrame columns may hold dictionaries as well.


See the { } in address, services, storeHours, etc.
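
One way to find such columns programmatically is to check each column for dict values. The rows below are hypothetical sample data; only address and postCode are names taken from the text above.

```python
import pandas as pd

# Hypothetical rows shaped like the store items; field names other
# than address and postCode are assumptions.
df = pd.DataFrame([
    {"name": "Store A", "address": {"postCode": "11111", "city": "X"}},
    {"name": "Store B", "address": {"postCode": "22222", "city": "Y"}},
])

# Columns in which at least one cell holds a dict.
dict_columns = [
    column for column in df.columns
    if df[column].apply(lambda value: isinstance(value, dict)).any()
]
print(dict_columns)
```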

It may also be necessary to split it into separate columns.
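
A sketch of one way to do this: build a second DataFrame from the list of dicts, reusing the original index. The sample rows are hypothetical, and this is not necessarily the approach used in the original code.

```python
import pandas as pd

# Hypothetical sample data; only address and postCode come from the text.
df = pd.DataFrame([
    {"name": "Store A", "address": {"postCode": "11111", "city": "X"}},
    {"name": "Store B", "address": {"postCode": "22222", "city": "Y"}},
])

# Each dict key becomes its own column; the index matches df row for row.
address_df = pd.DataFrame(df["address"].tolist(), index=df.index)
print(address_df)
```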


Then concatenate it with the original df.
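
A possible sketch of the concatenation step with pd.concat and axis=1. Dropping the nested column and prefixing the new column names are optional choices added here, not something the text above prescribes, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only address and postCode come from the text.
df = pd.DataFrame([
    {"name": "Store A", "address": {"postCode": "11111", "city": "X"}},
    {"name": "Store B", "address": {"postCode": "22222", "city": "Y"}},
])

# Expand the dicts into columns, prefixed to avoid name clashes.
address_df = pd.DataFrame(df["address"].tolist(), index=df.index)
address_df = address_df.add_prefix("address_")

# axis=1 places the new columns next to the original ones;
# the nested address column is dropped because it is now redundant.
df_flat = pd.concat([df.drop(columns="address"), address_df], axis=1)
print(df_flat)
```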


You can do the same with other columns.

User contributions licensed under: CC BY-SA