Skip to content
Advertisement

Which unicode characters can be used in python3 scripts?

Some unicode characters can be used to name variables, functions, etc. without any problems, e.g. α.

>>> α = "Hello world!"
>>> print(α)
Hello world!

Other unicode characters raise an error, e.g. ∇.

>>> ∇ = "Hello world"
  File "<stdin>", line 1
    ∇
    ^
SyntaxError: invalid character '∇' (U+2207)

Which unicode characters can be used to form valid expressions in python? Which unicode characters will raise a SyntaxError?

And, is there a reasonable means of including unicode characters that raise errors in python scripts? I would like to use ∇ in function names that compute gradients, e.g. ∇f to represent the gradient of f.

Advertisement

Answer

https://docs.python.org/3/reference/lexical_analysis.html#identifiers says:

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.x: the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.

Python 3.0 introduces additional characters from outside the ASCII range (see PEP 3131). For these characters, the classification uses the version of the Unicode Character Database as included in the unicodedata module.

Let’s check your specific characters:

~$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.category('α')
'Ll'
>>> unicodedata.category('∇')
'Sm'
>>> 

α is categorized as “Letter, lower-case”, while ∇ is a “symbol, math”. The former category is allowed in identifiers, but the latter is not. The full list of allowed character categories is available in the docs link at the top.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement