I am trying to take a file and remove all characters that are not Greek. We found the Unicode values for the alphabet, 880 to 1023 (U+0370 to U+03FF), and were able to print the correct characters with a simple print(unichr(880)) line. The problem comes when running this code:
greek ='ÏÎ'
for c in greek:
    if(unichr(c) >= 880 and unichr(c) <= 1023):
        print(c)
Is there a way to enter any letter or symbol and get back its Unicode value? We have tested with values both inside and outside the Greek range and still get the same error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
You have several problems. Assuming this is Python 2 (there is no unichr in Python 3, so you'd get a different error there), your first problem is that you didn't create a unicode string in the first place.
>>> greek ='ÏÎ'
>>> len(greek)
4
These aren't 2 unicode characters... they are 4 single-byte characters that happen to be the UTF-8 encoding of the 2 unicode characters. Instead do
greek =u'ÏÎ'
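To see the difference between a byte string and decoded text, here is a small sketch that runs on both Python 2.6+ and Python 3. The bytes C3 8F C3 8E are the UTF-8 encoding of 'ÏÎ' (U+00CF U+00CE); 0xC3 is the same byte the ascii codec choked on in the question.

```python
# Byte string vs. Unicode text (works on Python 2.6+ and 3.x).
raw = b'\xc3\x8f\xc3\x8e'        # the UTF-8 bytes of U+00CF U+00CE
text = raw.decode('utf-8')       # decode the bytes into Unicode code points

print(len(raw))                   # one entry per byte
print(len(text))                  # one entry per character
print([ord(ch) for ch in text])  # the code points of the two characters
```

Decoding explicitly with a named codec avoids Python 2 silently falling back to ascii.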
Next, these are not the droids, I mean Greek characters, you think they are.
>>> ord(greek[0])
207
These are Latin-1 characters in the 128-255 range (207 is U+00CF, 'Ï'), so they fall outside the range you are looking for. Did you want these instead?
>>> greek = u'Ϊΐ'
>>> ord(greek[0])
938
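If you are unsure which characters a string really contains, the standard-library unicodedata.name function reports the official Unicode name of each code point. A minimal sketch comparing the string from the question with the intended one (the escapes stand for 'ÏÎ' and 'Ϊΐ'; it runs on Python 2 and 3):

```python
import unicodedata

# First two: 'ÏÎ' as typed in the question; last two: the intended Greek 'Ϊΐ'.
samples = u'\u00cf\u00ce\u03aa\u0390'

# Pair each character's code point (as U+XXXX) with its Unicode name.
report = [(u'U+%04X' % ord(ch), unicodedata.name(ch)) for ch in samples]
for code, name in report:
    print(code + ' ' + name)
```

The first two come back as LATIN CAPITAL LETTER I ... names, the last two as GREEK ... names, which makes the mix-up obvious.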
Finally, unichr goes the wrong way: it converts ordinals to characters, but you want to go from characters to ordinals, which is what ord does. So,
>>> for c in greek:
...     if ord(c) >= 880 and ord(c) <= 1023:
...         print(c)
...
Ϊ
ΐ
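For the original goal of stripping everything but Greek from a file, the same ord check can be applied to text decoded from the file. This is a sketch that assumes the file is UTF-8 encoded; the names keep_greek and filter_file are hypothetical. io.open with an encoding argument exists in Python 2.6+ and 3, so it yields unicode text in both.

```python
import io

# U+0370..U+03FF, the "Greek and Coptic" block (880 to 1023 in decimal).
GREEK_FIRST, GREEK_LAST = 880, 1023

def keep_greek(text):
    """Return only the characters of `text` whose code point is in the block."""
    return u''.join(c for c in text if GREEK_FIRST <= ord(c) <= GREEK_LAST)

def filter_file(src_path, dst_path, encoding='utf-8'):
    """Hypothetical helper: write the Greek-only content of src_path to dst_path."""
    with io.open(src_path, encoding=encoding) as src:
        text = src.read()            # decoded unicode text, not raw bytes
    with io.open(dst_path, 'w', encoding=encoding) as dst:
        dst.write(keep_greek(text))  # keep_greek always returns unicode
```

Note that this also drops spaces, punctuation and newlines, and polytonic letters from the Greek Extended block (U+1F00 to U+1FFF) fall outside this range; widen the check if you need any of those.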