I’m starting to learn about reading data from a website. But when I try to read data from google.com I encounter this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 279: invalid continuation byte
Below is my code (exactly as in the instruction video, only with a different website):
import urllib.request, urllib.parse, urllib.error
fhand = urllib.request.urlopen('https://www.google.com/')
for line in fhand:
    print(line.decode().strip())
What is wrong? Thanks in advance
Answer
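Specifying the encoding and an error handler should solve the problem. With errors="backslashreplace", any byte that is not valid UTF-8 is printed as an escape sequence (such as \xe0) instead of raising UnicodeDecodeError: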
import urllib.request, urllib.parse, urllib.error
fhand = urllib.request.urlopen('https://www.google.com/')
for line in fhand:
    print(line.decode(encoding="utf-8", errors="backslashreplace").strip())
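A byte like 0xe0 failing to decode may mean the page is simply not served as UTF-8. As a rough sketch of an alternative, you could read the charset the server declares in its Content-Type header and decode with that, keeping backslashreplace only as a safety net. The helper names decode_body and fetch_text below are made up for illustration, and the sketch assumes the declared charset is accurate when one is present:

```python
import urllib.request
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str]) -> str:
    """Decode raw bytes with the declared charset, falling back to UTF-8.

    decode_body is a hypothetical helper, not part of urllib.
    """
    try:
        # Use the declared charset if there is one, otherwise assume UTF-8.
        return raw.decode(charset or "utf-8", errors="backslashreplace")
    except LookupError:
        # The server declared a charset name that Python does not know.
        return raw.decode("utf-8", errors="backslashreplace")


def fetch_text(url: str) -> str:
    """Download a page and decode it using the charset from its headers."""
    with urllib.request.urlopen(url) as fhand:
        # Returns the charset parameter of Content-Type, or None if absent.
        charset = fhand.headers.get_content_charset()
        return decode_body(fhand.read(), charset)
```

Calling fetch_text('https://www.google.com/') and looping over the result's splitlines() would then replace the line-by-line decode loop above. Whether this gives cleaner output depends on what the server actually declares.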
If you are learning web scraping with Python, you might also want to have a look at BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/