Some Unicode characters can be used to name variables, functions, etc. without any problems, e.g. α.
>>> α = "Hello world!"
>>> print(α)
Hello world!
Other Unicode characters raise an error, e.g. ∇.
>>> ∇ = "Hello world"
  File "<stdin>", line 1
    ∇
    ^
SyntaxError: invalid character '∇' (U+2207)
Which Unicode characters can be used in valid Python identifiers, and which ones will raise a SyntaxError?
Also, is there a reasonable way to include the characters that raise errors in Python scripts? I would like to use ∇ in the names of functions that compute gradients, e.g. ∇f to represent the gradient of f.
Answer
https://docs.python.org/3/reference/lexical_analysis.html#identifiers says:
Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.x: the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.
Python 3.0 introduces additional characters from outside the ASCII range (see PEP 3131). For these characters, the classification uses the version of the Unicode Character Database as included in the unicodedata module.
Let’s check your specific characters:
~$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.category('α')
'Ll'
>>> unicodedata.category('∇')
'Sm'
>>>
α is in category Ll (“Letter, lowercase”), while ∇ is in category Sm (“Symbol, math”). Characters in the former category are allowed in identifiers, but characters in the latter are not, which is why ∇ raises a SyntaxError. The full list of allowed character categories is available in the docs link at the top.
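If you want to test other characters without triggering a SyntaxError, the built-in str.isidentifier() method should give the same verdict as the parser. Here is a minimal sketch; the helper name describe is made up for illustration, and prefixing the character with "a" is just one assumed way to test whether it is allowed after the first position.

```python
import unicodedata


def describe(ch):
    """Return (category, can_start, can_continue) for a single character.

    `describe` is a hypothetical helper, not part of any library.
    """
    category = unicodedata.category(ch)
    # True if the character alone is a valid identifier, i.e. it may come first.
    can_start = ch.isidentifier()
    # True if the character is valid after a leading letter.
    can_continue = ("a" + ch).isidentifier()
    return category, can_start, can_continue


for ch in "α∇_9":
    print(ch, describe(ch))
```

For the two characters from the question, this should report α as usable anywhere in a name and ∇ as usable nowhere. A digit such as 9 shows the in-between case: it cannot start an identifier but can appear later in one.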