namedtuple with Unicode string as name

Thomas

I'm having trouble using Unicode strings as names for a namedtuple. This works:

import collections
a = collections.namedtuple("test", "value")

and this doesn't:

b = collections.namedtuple("βαδιζόντων", "value")

I get the error

Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib64/python3.4/collections/__init__.py", line 370, in namedtuple
        result = namespace[typename]
KeyError: 'βαδιζόντων'

Why is that the case? The documentation says, "Python 3 also supports using Unicode characters in identifiers," and the key is valid Unicode.

bobince

The problem is specifically with the sixth letter of the name, U+1F79 (Greek small letter omicron with oxia). This is a ‘compatibility character’: Unicode would rather you use ό instead (U+03CC Greek small letter omicron with tonos). U+1F79 only exists in Unicode in order to round-trip to old character sets that distinguished between oxia and tonos, a distinction that later turned out to be incorrect.

When you use compatibility characters in an identifier, Python's source code parser automatically normalises them to form NFKC, so your class name ends up with U+03CC in it.
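A minimal sketch of that normalisation, using only the standard unicodedata module and exec. The identifier x plus U+1F79 is made up for illustration; the escapes are used so that the two look-alike characters cannot be confused in transit.

```python
import unicodedata

oxia = "\u1f79"   # GREEK SMALL LETTER OMICRON WITH OXIA
tonos = "\u03cc"  # GREEK SMALL LETTER OMICRON WITH TONOS

# The two characters look the same but are different code points.
normalised = unicodedata.normalize("NFKC", oxia)
print(oxia == tonos)        # False
print(normalised == tonos)  # True

# The parser applies the same normalisation to identifiers in source code:
# the variable is stored under the U+03CC spelling, not the one we typed.
namespace = {}
exec("x" + oxia + " = 1", namespace)
print(("x" + tonos) in namespace)  # True
print(("x" + oxia) in namespace)   # False
```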

Unfortunately collections.namedtuple doesn't know about this. It creates the new class by inserting the given name into a template of Python source code held in a string, executing that string (yuck, right?), and then extracting the class from the resulting namespace dict by name. The lookup uses the original name, not the normalised version Python has actually compiled, so it fails with the KeyError in your traceback.
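The mechanism can be imitated in a few lines. This is a simplified stand-in and not the actual namedtuple source: make_class and its one-line class template are hypothetical, and only the final namespace[typename] lookup mirrors the line shown in the traceback.

```python
def make_class(typename):
    """Hypothetical stand-in for an exec-based class factory."""
    # Paste the requested name into source code and compile it.
    source = "class {0}(tuple):\n    pass\n".format(typename)
    namespace = {}
    exec(source, namespace)
    # The compiled class is stored under the NFKC-normalised name,
    # but the lookup uses the name exactly as the caller passed it.
    return namespace[typename]

# A plain ASCII name round-trips without trouble.
ok = make_class("test")
print(ok.__name__)  # test

# A name containing U+1F79 is compiled under U+03CC, so the lookup misses.
try:
    make_class("name_with_\u1f79")
    failed = False
except KeyError:
    failed = True
print(failed)  # True
```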

This is a bug in collections which may be worth filing, but for now you should use the canonical character U+03CC ό.
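If the name comes from data rather than from a literal you can retype, one possible workaround is to normalise it yourself before calling namedtuple. This is a sketch only: normalised_namedtuple is a hypothetical helper, not part of collections, and it normalises only the type name, not the field names.

```python
import collections
import unicodedata

def normalised_namedtuple(typename, field_names):
    """Hypothetical helper: NFKC-normalise the type name first."""
    return collections.namedtuple(
        unicodedata.normalize("NFKC", typename), field_names
    )

# The name from the question, spelled with escapes; the sixth letter is U+1F79.
raw_name = "\u03b2\u03b1\u03b4\u03b9\u03b6\u1f79\u03bd\u03c4\u03c9\u03bd"

Word = normalised_namedtuple(raw_name, "value")
print("\u1f79" in Word.__name__)  # False
print("\u03cc" in Word.__name__)  # True
print(Word(value=1).value)        # 1
```

Note that the resulting class name is not equal to raw_name as a string, so any code that later compares names should normalise both sides.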
