I would like to collect information from the results returned by a search engine. However, the request only works when the query part is plain ASCII text; it fails when the query is unicode.
    import urllib2
    a = "바둑"
    a = a.decode("utf-8")
    type(a)
    #Out[35]: unicode
    url = "http://search.naver.com/search.naver?where=nexearch&query=%s" %(a)
    url2 = urllib2.urlopen(url)

This gives the following error:
#UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-40: ordinal not in range(128)
Answer
Encode the Unicode data to UTF-8, then URL-encode:
    from urllib import urlencode
    import urllib2

    params = {'where': 'nexearch', 'query': a.encode('utf8')}
    params = urlencode(params)
    url = "http://search.naver.com/search.naver?" + params
    response = urllib2.urlopen(url)
Demo:
    >>> from urllib import urlencode
    >>> a = u"바둑"
    >>> params = {'where': 'nexearch', 'query': a.encode('utf8')}
    >>> params = urlencode(params)
    >>> params
    'query=%EB%B0%94%EB%91%91&where=nexearch'
    >>> url = "http://search.naver.com/search.naver?" + params
    >>> url
    'http://search.naver.com/search.naver?query=%EB%B0%94%EB%91%91&where=nexearch'
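To see where the `%EB%B0%94%EB%91%91` in the demo comes from, the two steps (UTF-8 encode, then percent-escape) can be spelled out by hand. This is only an illustrative sketch: the variable names are made up here, and it escapes every byte, whereas the library functions leave safe ASCII characters untouched.

```python
# -*- coding: utf-8 -*-
# Illustration only: real code should use urlencode() or quote_plus().
query = u"바둑"

# Step 1: turn the unicode text into UTF-8 bytes (3 bytes per character here).
utf8_bytes = query.encode("utf-8")

# Step 2: write each byte as %XX. bytearray yields integers on both
# Python 2 and Python 3, so the formatting below works on either.
escaped = "".join("%%%02X" % b for b in bytearray(utf8_bytes))

print(escaped)
```

The six bytes produced in step 1 are exactly what ends up, escaped, in the query string.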
Using urllib.urlencode() to build the parameters is easier, but you can also just escape the query value with urllib.quote_plus():
    from urllib import quote_plus

    encoded_a = quote_plus(a.encode('utf8'))
    url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % encoded_a
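If the same code also has to run on Python 3, where these functions live in `urllib.parse`, a sketch along the following lines might work. `build_search_url` and `BASE_URL` are hypothetical names introduced for illustration, and the fallback import is an assumption about how you want to support both versions. No request is sent here; the sketch only builds the URL.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import urlencode  # Python 3 location
except ImportError:
    from urllib import urlencode  # Python 2 location

BASE_URL = "http://search.naver.com/search.naver"


def build_search_url(query):
    """Return the search URL for a unicode query (hypothetical helper)."""
    # Encode to UTF-8 bytes first, as in the answer; urlencode() then
    # percent-escapes those bytes. A list of pairs keeps the order fixed.
    params = [("where", "nexearch"), ("query", query.encode("utf-8"))]
    return BASE_URL + "?" + urlencode(params)


url = build_search_url(u"바둑")
print(url)
```

Passing a list of pairs instead of a dict is a small design choice: it makes the parameter order predictable, which a plain dict does not guarantee on Python 2.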