When I use bs4 and requests locally it works, but when I put my code on Replit it fails.
```python
import requests
from bs4 import BeautifulSoup

def scrape_data(username):

    # getting the request from url
    r = requests.get(URL.format(username))

    # converting the text
    s = BeautifulSoup(r.text, "html.parser")

    # finding meta info
    meta = s.find("meta", property ="og:description")

    # calling parse method
    return parse_data(meta.attrs['content'])
```
In Replit I get (the error): The ERROR

Please help me! Can someone explain what the problem with Replit is?
Answer
This would be much easier to debug if you included a sample link [a plausible value of `URL.format(username)`].

The error seems to be raised at `meta.attrs` due to `meta` having a null value [`None`] because `s.find("meta", property ="og:description")` found no matching tags. It's either because

- `.find` is not being used properly [you could try `attrs={"property": "og:description"}` instead of `property ="og:description"`], or
- there's no such tag to be found in `r.text` due to `requests.get` failing for some reason [you can check by adding a line with `r.raise_for_status()`], or
- (as one comment mentioned) that tag being loaded by JavaScript [in which case, you need to try something like selenium because just `requests` will not suffice].
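To tell the last two cases apart (tag absent from `r.text` vs. tag present but mis-queried), one option is a quick dependency-free check with the standard library's `html.parser`. This is just a sketch; the sample HTML string below is an illustration, and in real use you would feed it `r.text`:

```python
from html.parser import HTMLParser

class OgDescriptionFinder(HTMLParser):
    """Collects the content of any <meta property="og:description"> tags."""
    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:description":
            self.contents.append(attrs.get("content"))

# Stand-in for a fetched page; replace with finder.feed(r.text) in real use.
sample = '<html><head><meta property="og:description" content="A profile page"></head></html>'
finder = OgDescriptionFinder()
finder.feed(sample)
print(finder.contents)  # a non-empty list means the tag really is in the raw HTML
```

If this prints an empty list on the page you fetched, the tag is genuinely not in the raw HTML (a JavaScript-rendering or request problem) rather than a `.find` usage problem.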
If this error is only happening for some sites and you just want to avoid raising any errors, you can just return `None` unless the tag is found:
```python
# calling parse method [but only if meta was found]
return parse_data(meta.attrs['content']) if meta is not None else None ## same as:
# return parse_data(meta.attrs['content']) if meta else None ## same as:
# return None if meta is None else parse_data(meta.attrs['content'])
```
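To see that those three spellings behave the same way, here is a minimal stand-in demo (`FakeMeta` and this `parse_data` are made up for the demo and are not part of the original code):

```python
def parse_data(content):
    # stand-in for the real parse_data
    return content.upper()

class FakeMeta:
    # stand-in for a found tag object; a real bs4 Tag also has an .attrs dict
    attrs = {'content': 'a description'}

for meta in (FakeMeta(), None):
    a = parse_data(meta.attrs['content']) if meta is not None else None
    b = parse_data(meta.attrs['content']) if meta else None
    c = None if meta is None else parse_data(meta.attrs['content'])
    assert a == b == c
    print(a)
```

Note that the `if meta` spelling relies on the found object being truthy; `if meta is not None` is the most explicit of the three.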
You can also use something like this error-logging function if `scrape_data` works for some sites but not others and you aren't sure why.
```python
# import os
# import pandas as pd
# def logError_scrapes... ## PASTE FROM https://pastebin.com/cxGH50Mc

# def scrape_data(username):
#     r = requests.get(URL.format(username))
#     s = BeautifulSoup(r.text, "html.parser")
#     meta = s.find("meta", property ="og:description")

    if not meta:
        ## you can calculate/set something more meaningful as logId [if you want]
        logError_scrapes(logId='scrape_dx', url=URL.format(username), req=r, rSoup=s)
        return None
    # return parse_data(meta.attrs['content'])
```