Skip to content
Advertisement

Python 3.8: Escape non-ascii characters as unicode

I have input and output text files which can contain non-ascii characters. Sometimes I need to escape them and sometimes I need to write the non-ascii characters. Basically if I get “Bürgerhaus” I need to output “Bu00FCrgerhaus”. If I get “Bu00FCrgerhaus” I need to output “Bürgerhaus”.

One direction goes fine:

JavaScript

however in the other direction I do not get the expected result (‘Bu00FCrgerhaus’):

JavaScript

I read that unicode-escape needs latin-1, I tried to encode it to it, but this did not product a result either. What am I doing wrong?

(PS: Thank you Matthias for reminding me that the conversion in the first example was not necessary.)

Advertisement

Answer

You could do something like this:

JavaScript

Out:

JavaScript

Explanation:

We are going to covert each character to it’s corresponding Unicode code point.

JavaScript

Regular ASCII characters decimal values are < 128, bigger values, like Eur-Sign, german Umlauts … got values >= 128 (detailed table here).

Now, we are going to ‘encoded’ all characters >= 128 with their corresponding unicode representation.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement