I would like to collect information from the results returned by a search engine, but the request only works when the query is plain ASCII text; it fails when the query is a unicode string.
import urllib2
a = "바둑"
a = a.decode("utf-8")
type(a)
#Out[35]: unicode

url = "http://search.naver.com/search.naver?where=nexearch&query=%s" %(a)
url2 = urllib2.urlopen(url)
This gives the following error:
#UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-40: ordinal not in range(128)
Answer
Encode the Unicode data to UTF-8, then URL-encode:
from urllib import urlencode
import urllib2

params = {'where': 'nexearch', 'query': a.encode('utf8')}
params = urlencode(params)

url = "http://search.naver.com/search.naver?" + params
response = urllib2.urlopen(url)
Demo:
>>> from urllib import urlencode
>>> a = u"바둑"
>>> params = {'where': 'nexearch', 'query': a.encode('utf8')}
>>> params = urlencode(params)
>>> params
'query=%EB%B0%94%EB%91%91&where=nexearch'
>>> url = "http://search.naver.com/search.naver?" + params
>>> url
'http://search.naver.com/search.naver?query=%EB%B0%94%EB%91%91&where=nexearch'
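The snippets in this answer are Python 2. As a rough sketch of the same idea on Python 3 (an assumption, since the question uses urllib2), urlencode() lives in urllib.parse and percent-encodes str values as UTF-8 by default, so no explicit .encode('utf8') call should be needed. The helper name build_search_url and the constant BASE_URL are hypothetical, introduced only for illustration:

```python
# Sketch for Python 3 (an assumption; the answer itself targets Python 2).
from urllib.parse import urlencode

# Base URL taken from the question.
BASE_URL = "http://search.naver.com/search.naver"


def build_search_url(query):
    """Hypothetical helper: return the search URL for a text query."""
    # urlencode() percent-encodes str values as UTF-8 by default.
    params = urlencode({"where": "nexearch", "query": query})
    return BASE_URL + "?" + params


print(build_search_url("바둑"))
```

The resulting string contains only ASCII characters, so it can be passed to urllib.request.urlopen() without triggering the encoding error from the question.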
Using urllib.urlencode() to build the parameters is easier, but you can also just escape the query value with urllib.quote_plus():
from urllib import quote_plus
encoded_a = quote_plus(a.encode('utf8'))
url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % encoded_a
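One detail worth illustrating is how quote_plus() treats spaces, which matters for multi-word queries. The sketch below assumes Python 3, where quote() and quote_plus() are in urllib.parse and accept str directly. The sample query string is made up for the example:

```python
# Sketch for Python 3 (an assumption; the answer itself targets Python 2).
from urllib.parse import quote, quote_plus

query = "바둑 go"  # hypothetical query containing a space

# quote_plus() encodes the space as '+', the usual form for query strings.
plus_form = quote_plus(query)
# quote() encodes the space as '%20' instead.
percent_form = quote(query)

url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % plus_form
print(plus_form)
print(percent_form)
print(url)
```

Both forms are pure ASCII. quote_plus() matches what urlencode() produces for values by default.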