Hello Stack Overflow contributors!
I want to scrape multiple pages of a news website, but an error is raised at this step:

    response = requests.get(page, headers=user_agent)

The error message is:

    AttributeError: 'int' object has no attribute 'get'

The relevant lines of code are:
    user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}

    # controlling the crawl-rate
    start_time = time()
    request = 0

    def scrape(url):
        urls = [url + str(x) for x in range(0, 10)]
        for page in urls:
            response = requests.get(page, headers=user_agent)
            print(page)

    print(scrape('https://nypost.com/search/China+COVID-19/page/'))
More specifically, this page and the pages after it are what I want to scrape:
https://nypost.com/search/China+COVID-19/page/1/?orderby=relevance
Any help would be greatly appreciated!
Answer
This code runs okay for me, although I did have to move request inside your function. Make sure you do not mix up the requests module with your request variable.
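That AttributeError is exactly what happens when the name requests has been rebound to an integer somewhere, for example a counter assignment like requests = 0 instead of request = 0. A minimal reproduction of the shadowing effect, using a stand-in object so nothing is fetched over the network:

```python
from types import SimpleNamespace

# Stand-in for the real requests module, just to show the name-shadowing effect.
requests = SimpleNamespace(get=lambda url, **kwargs: f"GET {url}")
print(requests.get("https://example.com"))  # works: the name points at the "module"

requests = 0  # rebinding the same name to an int shadows the module

try:
    requests.get("https://example.com")
except AttributeError as exc:
    print(exc)  # 'int' object has no attribute 'get'
```

Keeping the counter named request (singular) inside the function avoids clobbering the module name entirely.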
    import requests
    from random import randint
    from time import sleep, time
    from warnings import warn

    from bs4 import BeautifulSoup as bs

    user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}

    # controlling the crawl-rate
    start_time = time()

    def scrape(url):
        request = 0
        urls = [f"{url}{x}" for x in range(0, 10)]
        params = {
            "orderby": "relevance",
        }
        for page in urls:
            response = requests.get(url=page,
                                    headers=user_agent,
                                    params=params)

            # pause the loop
            sleep(randint(8, 15))

            # monitor the requests
            request += 1
            elapsed_time = time() - start_time
            print('Request: {}; Frequency: {} requests/s'.format(request, request / elapsed_time))
            # clear_output(wait=True)

            # throw a warning for non-200 status codes
            if response.status_code != 200:
                warn('Request: {}; Status code: {}'.format(request, response.status_code))

            # break the loop if the number of requests is greater than expected
            if request > 72:
                warn('Number of requests was greater than expected.')
                break

            # parse the content
            soup_page = bs(response.text, 'lxml')

    print(scrape('https://nypost.com/search/China+COVID-19/page/'))
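For reference, the params dict is what appends the ?orderby=relevance query string from your example URL. You can build the same URLs by hand with only the standard library (no request sent), which is handy for checking what requests will actually fetch for each page:

```python
from urllib.parse import urlencode

base = "https://nypost.com/search/China+COVID-19/page/"
params = {"orderby": "relevance"}

# equivalent of what requests.get(url=page, params=params) fetches per page
urls = [f"{base}{x}?{urlencode(params)}" for x in range(0, 10)]
print(urls[1])  # https://nypost.com/search/China+COVID-19/page/1?orderby=relevance
```

Note that range(0, 10) starts at page 0, which may not exist on the site; starting the range at 1 would match the example URL in the question.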