The API I am using returns at most one hundred results per page. I am trying to write a while loop that walks through every page and collects the results from each one, but it does not work properly.
This script goes through the pages:
params = dict(
    order_by='salary_desc',
    text=keyword,
    area=area,
    period=30,  # days
    per_page=100,
    page=0,
    no_magic='false',  # disable magic
    search_field='name'  # available: name, description, company_name
)

pages = []

while True:
    params["page"] += 1
    response = requests.get(BASE_URL + '/vacancies', headers={'User-Agent': generate_user_agent()}, params=params,)
    items = response.json()['items']
    if not items:
        break
    pages.append(items)  # Do it for each page response
After running it:
params
{'area': 1, 'no_magic': 'false', 'order_by': 'salary_desc', 'page': 5, 'per_page': 100, 'period': 30, 'search_field': 'name', 'text': '"python"'}
It sees five pages.
When I look at the variable after execution:
len(pages)
4
It only collects four pages.
If I understand correctly, it never fetches page zero (pages in the API start at zero).
How can I fix this?
The complete script is in Colab: https://colab.research.google.com/drive/14KddVLTyH3LkcE-LmHm7EooTYMM7b0zB?usp=sharing
Answer
You are incrementing the page before making the request, so the first request asks for page 1 and page 0 is never fetched. Move the increment to the end of the loop, like so:
while True:
    response = requests.get(BASE_URL + '/vacancies', headers={'User-Agent': generate_user_agent()}, params=params,)
    items = response.json()['items']
    if not items:
        break
    pages.append(items)  # Do it for each page
    params["page"] += 1
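To check the ordering without hitting the network, here is a minimal sketch of the same loop as a reusable function. The names `fetch_all_pages`, `fake_fetch`, and `FAKE_API` are hypothetical and only stand in for the real API; the assumption is that, as in the question, an empty list of items means there are no more pages.

```python
from typing import Callable, List


def fetch_all_pages(fetch_page: Callable[[int], list], first_page: int = 0) -> List[list]:
    """Collect pages until fetch_page returns an empty list."""
    pages = []
    page = first_page
    while True:
        items = fetch_page(page)  # request the current page first
        if not items:
            break
        pages.append(items)
        page += 1  # advance only after the page has been stored
    return pages


# Hypothetical stand-in for the real API: five pages, numbered 0..4.
FAKE_API = {n: [f"vacancy-{n}-{i}" for i in range(3)] for n in range(5)}


def fake_fetch(page: int) -> list:
    # Pages that do not exist come back empty, which ends the loop.
    return FAKE_API.get(page, [])


all_pages = fetch_all_pages(fake_fetch)
print(len(all_pages))  # 5
```

With the real API, `fetch_page` could wrap the `requests.get` call from the question and pass `params={**params, "page": page}`, so the shared `params` dict is not mutated inside the loop.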