Hello Stack Overflow contributors!
I want to scrape multiple pages of a news website; it shows an error message during this step
Python

response = requests.get(page, headers = user_agent)
The error message is
AttributeError: 'int' object has no attribute 'get'
The lines of code are
Python

user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}

#controlling the crawl-rate
start_time = time()
request = 0

def scrape(url):
    urls = [url + str(x) for x in range(0,10)]
    for page in urls:
        response = requests.get(page, headers = user_agent)
        print(page)

called with:

print(scrape('https://nypost.com/search/China+COVID-19/page/'))
More specifically, this page and pages next to it are what I want to scrape:
https://nypost.com/search/China+COVID-19/page/1/?orderby=relevance
Any help would be greatly appreciated!!
Advertisement
Answer
For me this code runs okay. I did have to put `request` inside your function. Make sure you do not mix up the module `requests` with your variable `request`.
Python
from random import randint
from time import sleep, time
from warnings import warn

import requests
from bs4 import BeautifulSoup as bs
# Browser-like User-Agent header so the site serves its normal HTML
# instead of blocking the crawler outright.
user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}

# controlling the crawl-rate: reference point for request-frequency stats
start_time = time()
def scrape(url):
    """Fetch search-result pages 0-9 under *url*, throttled to roughly one
    request every 8-15 seconds.

    Each response is parsed with BeautifulSoup into ``soup_page``, but the
    snippet stops short of using the parsed result, so the function
    returns None.

    Requires module-level ``user_agent`` and ``start_time``, plus the
    ``requests``, ``warn`` and ``bs`` imports at the top of the file
    (without them every call dies with NameError).
    """
    # Per-call request counter. Deliberately named `request` (singular) —
    # do not confuse it with the `requests` module.
    request = 0
    urls = [f"{url}{x}" for x in range(0,10)]
    params = {
        "orderby": "relevance",
    }
    for page in urls:
        response = requests.get(url=page,
                                headers=user_agent,
                                params=params)

        # pause the loop to keep the crawl rate polite
        sleep(randint(8,15))

        # monitor the requests: count and overall frequency since start_time
        request += 1
        elapsed_time = time() - start_time
        print('Request:{}; Frequency: {} request/s'.format(request, request/elapsed_time))
        # clear_output(wait = True)

        # throw a warning for non-200 status codes
        if response.status_code != 200:
            warn('Request: {}; Status code: {}'.format(request, response.status_code))

        # Break the loop if the number of requests is greater than expected
        # (defensive cap; unreachable with only 10 URLs, but kept for safety)
        if request > 72:
            warn('Number of request was greater than expected.')
            break

        # parse the content (currently only bound locally — see docstring)
        soup_page = bs(response.text, 'lxml')
# Crawl result pages 0-9 for the "China COVID-19" search. scrape() has no
# return statement, so this call also prints "None" once the crawl ends.
print(scrape('https://nypost.com/search/China+COVID-19/page/'))