
How to escape unicode special chars in string and write it to UTF encoded file

What I want to achieve is to take a string like:

JavaScript

convert it to:

JavaScript

and write it in this form to file (which is UTF-8 encoded).
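
As a rough illustration of the goal, here is a minimal Python 3 sketch of the final step only. The sample strings and the helper name `write_utf8` are hypothetical stand-ins, since the question's own strings are not shown above. It assumes the target form is a backslash, the letter u, and four hex digits. Because the escaped text is pure ASCII, writing it to a UTF-8 file stores the escape sequence literally:

```python
import os
import tempfile

# Hypothetical sample data; not the strings from the original question.
original = "f\u00fcr"   # three characters: f, u-umlaut, r
escaped = "f\\u00FCr"   # eight ASCII characters: f \ u 0 0 F C r


def write_utf8(path: str, text: str) -> None:
    # Write text to path as UTF-8, leaving newlines untranslated.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)


# Round trip through a temporary file to inspect the bytes on disk.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    write_utf8(path, escaped)
    with open(path, "rb") as fh:
        data = fh.read()
finally:
    os.remove(path)

print(data)  # b'f\\u00FCr'
```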


Answer

Another solution, not relying on the built-in repr() but rather implementing it from scratch:

JavaScript

Differences:

  • Encodes only using \u, never any other sequence, whereas repr() uses about a third of the alphabet (so for example the BEL character will be encoded as \u0007 rather than \a)
  • Upper-case encoding, as specified (\u00FC rather than \u00fc)
  • Does not handle Unicode characters outside plane 0 (could be extended easily, given a spec for how those should be represented)
  • It does not take care of any pre-existing \u sequences, whereas repr() turns those into \\u; could be extended, perhaps to encode the backslash as \u005C:
    JavaScript
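
The behaviour described in the list above could be sketched roughly as follows. This is not the answer's original code. It is a minimal Python 3 illustration, and the function name `escape_to_u`, the `escape_backslash` flag, the choice to pass printable ASCII (0x20 to 0x7E) through unchanged, and raising `ValueError` for characters outside plane 0 are all assumptions made for the example:

```python
def escape_to_u(text: str, escape_backslash: bool = False) -> str:
    # Rewrite everything outside printable ASCII as a backslash, 'u',
    # and four upper-case hex digits. Hypothetical helper, not a library API.
    pieces = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\" and escape_backslash:
            # Optional extension: protect pre-existing backslashes.
            pieces.append("\\u005C")
        elif 0x20 <= cp <= 0x7E:
            # Printable ASCII passes through unchanged (assumption).
            pieces.append(ch)
        elif cp <= 0xFFFF:
            # Control characters and the rest of plane 0.
            pieces.append("\\u%04X" % cp)
        else:
            # No representation has been specified for other planes.
            raise ValueError("code point U+%X is outside plane 0" % cp)
    return "".join(pieces)


print(escape_to_u("f\u00fcr"))                      # f\u00FCr
print(escape_to_u("\a"))                            # \u0007
print(escape_to_u("a\\b", escape_backslash=True))   # a\u005Cb
```

Note that under these assumptions newlines and tabs are escaped as well (as \u000A and \u0009), which may or may not be wanted for a text file.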
User contributions licensed under: CC BY-SA