UnicodeDecodeError when trying to read data from ‘google.com’ in Python

I’m starting to learn about reading data from a website, but when I try to read data from google.com I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 279: invalid continuation byte

Below is my code (exactly as in the instruction video, only with a different website):

import urllib.request, urllib.parse, urllib.error
fhand=urllib.request.urlopen('https://www.google.com/')
for line in fhand:
    print(line.decode().strip())

What is wrong? Thanks in advance.

Answer

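Specifying the encoding and error handling should solve the problem. Called with no arguments, line.decode() assumes UTF-8 with strict error handling, so it raises as soon as it meets a byte sequence that is not valid UTF-8, such as the 0xe0 byte in your error message. With errors="backslashreplace", such bytes are printed as escape sequences (for example \xe0) instead of raising: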

import urllib.request, urllib.parse, urllib.error
fhand=urllib.request.urlopen('https://www.google.com/')
for line in fhand:
    print(line.decode(encoding="utf-8", errors="backslashreplace").strip())
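
If you would rather decode the page with the encoding the server declares instead of escaping the bytes, here is a minimal sketch. The helper names decode_body and fetch_text are hypothetical, and it assumes the server reports a charset in its Content-Type header; when it does not, or when the declared charset turns out to be wrong, the code falls back to UTF-8 with backslashreplace as above.

```python
import urllib.request


def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode raw bytes with the declared charset (hypothetical helper)."""
    try:
        # Use the declared charset if there is one, otherwise the fallback.
        return raw.decode(charset or fallback)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that do not match it:
        # decode with the fallback and show bad bytes as \xNN escapes.
        return raw.decode(fallback, errors="backslashreplace")


def fetch_text(url):
    """Fetch a URL and decode it using the charset from Content-Type."""
    with urllib.request.urlopen(url) as response:
        # get_content_charset() returns None if no charset is declared.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

Calling print(fetch_text('https://www.google.com/')) would then print the whole page in one go rather than line by line.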

If you are learning web scraping with Python, you might also want to have a look at BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

User contributions licensed under: CC BY-SA