I want to build a web scraper. I’m currently learning Python, so this is very basic!
Python code:
import urllib.request
import re

htmlfile = urllib.request.urlopen("http://basketball.realgm.com/")

htmltext = htmlfile.read()
title = re.findall('<title>(.*)</title>', htmltext)

print (htmltext)
Error:
  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
Answer
You have to decode your data: read() returns bytes, but your pattern is a str. Since the website in question declares
charset=iso-8859-1
use that encoding. UTF-8 won’t work in this case.
htmltext = htmlfile.read().decode('iso-8859-1')
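To try the decode-then-match step without hitting the network, here is a minimal sketch. SAMPLE_HTML is a made-up byte string standing in for what read() returns, and extract_titles and fetch_titles are hypothetical helper names, not part of the original code. fetch_titles assumes the server names its charset in the Content-Type header and otherwise falls back to iso-8859-1; if the charset is only declared in a meta tag, the fallback is what gets used.

```python
import re
import urllib.request

# Hypothetical sample standing in for the bytes returned by urlopen(...).read().
# The byte 0xE9 is "e with acute accent" in ISO-8859-1, but it is not valid UTF-8 here.
SAMPLE_HTML = b"<html><head><title>Sample Page</title></head><body>Caf\xe9</body></html>"


def extract_titles(raw, encoding="iso-8859-1"):
    """Decode raw bytes to str, then apply the str pattern from the question."""
    text = raw.decode(encoding)
    return re.findall(r"<title>(.*)</title>", text)


def fetch_titles(url, fallback="iso-8859-1"):
    """Fetch a page and decode it with the charset named in its Content-Type header."""
    with urllib.request.urlopen(url) as response:
        # get_content_charset() returns None when the header names no charset.
        encoding = response.headers.get_content_charset() or fallback
        return extract_titles(response.read(), encoding)


print(extract_titles(SAMPLE_HTML))

# Decoding the same bytes as UTF-8 raises an error instead of returning text.
try:
    SAMPLE_HTML.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)
```

For the live page you would call fetch_titles("http://basketball.realgm.com/"). An alternative is to skip decoding and match with a bytes pattern such as b'<title>(.*)</title>', but the results are then bytes rather than str.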