
web-scraping error message: ‘int’ object has no attribute ‘get’

Hello Stack Overflow contributors!

I want to scrape multiple pages of a news website, but I get an error message at this step:

 response = requests.get(page, headers = user_agent)

The error message is

AttributeError: 'int' object has no attribute 'get'

The lines of code are

user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}

#controlling the crawl-rate
start_time = time() 
request = 0

def scrape(url):
    urls = [url + str(x) for x in range(0,10)]
    for page in urls:
        response = requests.get(page, headers = user_agent)   
    print(page)
       
print(scrape('https://nypost.com/search/China+COVID-19/page/'))

More specifically, this page and the pages following it are what I want to scrape:

https://nypost.com/search/China+COVID-19/page/1/?orderby=relevance

Any help would be greatly appreciated!


Answer

This code runs okay for me. I did have to put request inside your function. Make sure you do not mix up the requests module with your request variable: if the name requests ever gets rebound to an integer (for example requests = 0 somewhere in your script), then requests.get(page, ...) raises exactly the AttributeError you are seeing.
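
To see the failure mode in isolation, here is a minimal sketch (the counter name is hypothetical) of what happens when the name requests is rebound to an integer:

import requests

requests = 0   # a counter accidentally named after the module

# the name requests now points to an int, so this raises:
# AttributeError: 'int' object has no attribute 'get'
response = requests.get('https://example.com')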

import requests
from random import randint
from time import sleep, time
from warnings import warn
from bs4 import BeautifulSoup as bs


user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}

# controlling the crawl-rate
start_time = time() 

def scrape(url):
    request = 0
    urls = [f"{url}{x}" for x in range(0,10)]
    params = {
       "orderby": "relevance",
    }
    for page in urls:
        response = requests.get(url=page,
                                headers=user_agent,
                                params=params)   

        #pause the loop
        sleep(randint(8,15))

        #monitor the requests
        request += 1
        elapsed_time = time() - start_time
        print('Request:{}; Frequency: {} request/s'.format(request, request/elapsed_time))
#         clear_output(wait = True)

        #throw a warning for non-200 status codes
        if response.status_code != 200:
            warn('Request: {}; Status code: {}'.format(request, response.status_code))

        #Break the loop if the number of requests is greater than expected
        if request > 72:
            warn('Number of requests was greater than expected.')
            break

        # parse the content; extract what you need from soup_page here
        soup_page = bs(response.text, 'lxml')
        
print(scrape('https://nypost.com/search/China+COVID-19/page/'))
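
Note that scrape does not return anything, so print(scrape(...)) will just print None. If you want the parsed pages back, a small variation (a sketch that reuses the user_agent dict and imports above, assuming you only need the BeautifulSoup objects) would be:

def scrape(url):
    soups = []
    for x in range(10):
        response = requests.get(f"{url}{x}",
                                headers=user_agent,
                                params={"orderby": "relevance"})
        soups.append(bs(response.text, 'lxml'))
        sleep(randint(8, 15))   # keep the crawl rate polite
    return soups

results = scrape('https://nypost.com/search/China+COVID-19/page/')
print(len(results))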