Skip to content
Advertisement

Saving UTF-8 texts with json.dumps as UTF-8, not as a u escape sequence

Sample code (in a REPL):

JavaScript

Output:

JavaScript

The problem: it’s not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I’d rather not use XML).

Is there a way to serialize objects into UTF-8 JSON strings (instead of uXXXX)?

Advertisement

Answer

Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

JavaScript

If you are writing to a file, just use json.dump() and leave it to the file object to encode:

JavaScript

Caveats for Python 2

For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() instead to write to that file:

JavaScript

Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:

JavaScript

In Python 2, when using byte strings (type str), encoded to UTF-8, make sure to also set the encoding keyword:

JavaScript
Advertisement