UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 2: ordinal not in range(128)

Question

I am attempting to work with a very large dataset that has some non-standard characters in it. I need to use unicode, as per the job specs, but I am baffled. (And quite possibly doing it all wrong.) I open the CSV using: Then, I attempt to encode it with: I'm encoding everything except the lat and lng b…

Accepted Answer

Unicode is not the same thing as UTF-8. The latter is just one encoding of the former.

You are doing it the wrong way around. You are reading UTF-8-encoded data, so you have to decode the UTF-8-encoded byte string into a unicode string.

So just replace .encode with .decode, and it should work (if your .csv is UTF-8-encoded).

Nothing to be ashamed of, though. I bet 3 in 5 programmers had trouble understanding this at first, if not more ;)

Update: If your input data is not UTF-8-encoded, then you have to .decode() with the appropriate encoding, of course. If no encoding is given, Python assumes ASCII, which obviously fails on non-ASCII characters.
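To make the decode direction concrete, here is a minimal sketch. The sample bytes are hypothetical (the asker's actual data is not shown); they are chosen so that the Latin-1 version puts byte 0xd1 at position 2, like the error in the title. Which encoding the real CSV uses is an assumption you would have to verify.

```python
# Hypothetical sample data: the word "SEÑOR" as raw bytes in two encodings.
raw_utf8 = b"SE\xc3\x91OR"    # UTF-8: "Ñ" is the two bytes 0xc3 0x91
raw_latin1 = b"SE\xd1OR"      # Latin-1: "Ñ" is the single byte 0xd1

# Decoding turns encoded bytes into a unicode string.
text = raw_utf8.decode("utf-8")

# Decoding as ASCII fails on the first byte outside range(128).
try:
    raw_latin1.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc  # exc.start is the offset of the offending byte

# If the file is not UTF-8, decode with the encoding it really uses.
text_from_latin1 = raw_latin1.decode("latin-1")

# Encoding goes the other way: unicode string back to bytes.
round_trip = text.encode("utf-8")
```

Both decoded values are the same unicode string, even though the bytes on disk differ. In Python 2, calling .encode on a byte string first decodes it implicitly as ASCII, which is why an encode call can raise a decode error.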


Answer