>>> import codecs
>>> string1 = " म नेपाली हुँ"
>>> string1 = string1.split()
>>> string1[0]
'\xe0\xa4\xae'
>>> with codecs.open('nepaliwords.txt', 'r', 'utf-8') as f:
...     for line in f:
...         if string1[0] in line:
...             print "matched string found in file"
...
Traceback (most recent call last):
  File "", line 3, in
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
The text file contains a large number of Nepali Unicode words.
Am I doing something wrong when comparing the two Unicode strings?
How can I print the matched Unicode string?
Your string1 is a byte string, encoded as UTF-8. It is not a Unicode string. But you used codecs.open() to have Python decode the file contents to unicode. Using your byte string in a containment test against that decoded text causes Python to implicitly decode the byte string to unicode so that the types match. This fails because the implicit decoding uses ASCII.
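The failure can be reproduced in isolation. The following is only a minimal sketch, written to run on both Python 2 and Python 3: it builds the same UTF-8 bytes that a plain Python 2 string literal would hold and performs the ASCII decode explicitly, which is what Python 2 attempts behind the scenes. The variable names are illustrative and not taken from the question.

```python
# -*- coding: utf-8 -*-
text = u" म नेपाली हुँ"
raw = text.encode("utf-8")          # the bytes a plain Python 2 literal would contain

first_word_bytes = raw.split()[0]   # still bytes: the UTF-8 encoding of the first word
print(repr(first_word_bytes))

try:
    # Python 2 does this implicitly when a byte string meets a unicode string.
    first_word_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print(exc)

# Decoding with the correct codec recovers the Unicode word.
first_word = first_word_bytes.decode("utf-8")
print(first_word == u"म")
```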
Decode string1 to unicode first:
string1 = " म नेपाली हुँ"
string1 = string1.decode('utf8').split()[0]
or use a Unicode string literal instead:
string1 = u" म नेपाली हुँ"
string1 = string1.split()[0]
Note the u prefix at the start of the literal.
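To get at the matched text itself, keep both sides of the comparison as Unicode and collect the lines that contain the word. This is a rough sketch rather than a definitive version: find_matches is a hypothetical helper, the sample data written to a temporary file stands in for nepaliwords.txt, and io.open is used in place of codecs.open because it behaves the same way on Python 2 and 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io
import os
import tempfile


def find_matches(path, word):
    """Return the lines of a UTF-8 text file that contain `word` (a Unicode string)."""
    matches = []
    with io.open(path, "r", encoding="utf-8") as f:   # lines come back as Unicode
        for line in f:
            if word in line:                          # Unicode in Unicode: no implicit decode
                matches.append(line.rstrip(u"\n"))
    return matches


# Hypothetical sample data standing in for nepaliwords.txt.
sample = u"म नेपाली हुँ\nधन्यवाद\n"
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    word = u" म नेपाली हुँ".split()[0]               # Unicode literal, so split() yields Unicode
    matches = find_matches(path, word)
finally:
    os.remove(path)

print(len(matches))
```

Each entry in matches is a Unicode string, so print(matches[0]) displays the matched line as long as the terminal encoding can represent it (UTF-8, for example).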