
An Error while using bs4 and requests in replit

When I use bs4 and requests locally, my code works, but when I put this code

import requests
from bs4 import BeautifulSoup

def scrape_data(username):
    # sending the request to the URL
    r = requests.get(URL.format(username))

    # parsing the returned HTML
    s = BeautifulSoup(r.text, "html.parser")

    # finding the og:description meta tag
    meta = s.find("meta", property="og:description")

    # calling parse method
    return parse_data(meta.attrs['content'])

into Replit, it raises an error.

Please help me!

Can someone explain what the problem with Replit is?


Answer

This would be much easier to debug if you included a sample link [a plausible value of URL.format(username)].

The error seems to be raised at meta.attrs because meta is None: s.find("meta", property="og:description") found no matching tag. This is either because

  • .find is not being used properly [you could try attrs={"property": "og:description"} instead of property="og:description"], or because
  • there’s no such tag to be found in r.text due to
    • requests.get failing for some reason [you can check by adding a line with r.raise_for_status()], or
    • (as one comment mentioned) that tag being loaded by JavaScript [in which case, you need to try something like selenium because just requests will not suffice].
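The first two checks can be sketched as two small helpers. This is a minimal sketch, assuming the page is fetched with requests and parsed with bs4 as in the question; the names fetch_html and find_og_description are hypothetical, not part of either library. Keeping fetching separate from parsing lets you test the parsing step on a saved HTML string, and raise_for_status() turns an HTTP error response into an exception instead of a silent "no tag found". If the tag is added by JavaScript, find_og_description will still return None, because requests never runs the page's scripts.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and fail loudly on HTTP errors (hypothetical helper)."""
    r = requests.get(url, timeout=10)
    # Raises requests.HTTPError for 4xx/5xx responses, so a blocked or failed
    # request is reported instead of being parsed as a page without the tag.
    r.raise_for_status()
    return r.text


def find_og_description(html):
    """Return the og:description content, or None if the tag is absent."""
    s = BeautifulSoup(html, "html.parser")
    # attrs={...} names the attribute explicitly instead of using a keyword argument.
    meta = s.find("meta", attrs={"property": "og:description"})
    if meta is None:
        return None
    return meta.get("content")
```

For example, running print(find_og_description(fetch_html(URL.format(username)))) both locally and on Replit shows whether the raw HTML each environment receives actually contains the tag; if it is None only on Replit, the server sent Replit a different page.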

If this error only happens for some sites and you simply want to avoid raising an exception, you can return None unless the tag is found:

    # calling parse method [but only if meta was found]  
    return parse_data(meta.attrs['content']) if meta is not None else None ## same as:
    # return parse_data(meta.attrs['content']) if meta else None ## same as:
    # return None if meta is None else parse_data(meta.attrs['content']) 

You can also use something like this error-logging function if scrape_data works for some sites but not others and you aren’t sure why.

# import os
# import pandas as pd
# def logError_scrapes... ## PASTE FROM https://pastebin.com/cxGH50Mc

def scrape_data(username):
    r = requests.get(URL.format(username))
    s = BeautifulSoup(r.text, "html.parser")
    meta = s.find("meta", property="og:description")

    if not meta:
        ## you can calculate/set something more meaningful as logId [if you want]
        logError_scrapes(logId='scrape_dx', url=URL.format(username), req=r, rSoup=s)
        return None
    return parse_data(meta.attrs['content'])
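If you would rather not paste the helper from the link, a smaller alternative is to log a few facts about the page that came back. This is a different, minimal sketch using Python's standard logging module, not the logError_scrapes function from the pastebin; describe_failed_scrape and the fields it records are hypothetical choices.

```python
import logging

from bs4 import BeautifulSoup

logger = logging.getLogger("scraper")


def describe_failed_scrape(url, status_code, html):
    """Log and return a short summary of a page missing og:description (hypothetical helper)."""
    s = BeautifulSoup(html, "html.parser")
    # s.title is the first <title> tag, or None if the page has none.
    title = s.title.get_text(strip=True) if s.title else None
    summary = {
        "url": url,
        "status_code": status_code,
        "title": title,
        "html_length": len(html),
        "meta_tags": len(s.find_all("meta")),
    }
    # WARNING-level messages reach stderr even without logging configuration.
    logger.warning("og:description not found: %s", summary)
    return summary
```

Inside the if not meta: branch you could call describe_failed_scrape(URL.format(username), r.status_code, r.text); comparing its output on Replit with a local run shows whether the two environments received the same page.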
User contributions licensed under: CC BY-SA