Comparing a Unicode input string with Unicode data in a file

Bishal Gautam
>>> import codecs
>>> string1 = " म नेपाली  हुँ"
>>> string1 = string1.split()
>>> string1[0]
'\xe0\xa4\xae'
>>> with codecs.open('nepaliwords.txt', 'r', 'utf-8') as f:
...     for line in f:
...         if string1[0] in line:
...             print "matched string found in file"
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

The text file contains a large amount of Nepali Unicode text.

Am I doing something wrong when comparing the two Unicode strings?

How can I print the matched Unicode string?

Martijn Pieters

Your string1 is a byte string encoded as UTF-8, not a Unicode string. But you used codecs.open(), which makes Python decode the file contents to unicode. When you then test whether your byte string is contained in a unicode line, Python implicitly decodes the byte string to unicode so that the types match. That implicit decoding uses the default ASCII codec, so it fails on the first non-ASCII byte (0xe0 here).
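The implicit step can be replayed by hand to see exactly where it breaks. This is a minimal sketch written for Python 3 (an assumption: the question uses Python 2, where the decode happens behind the scenes). Python 3 never decodes bytes implicitly, so the ASCII decode is called explicitly here.

```python
# Python 3 sketch of what Python 2 does implicitly in `string1[0] in line`:
# the byte string is decoded with the default ASCII codec before comparing.
text = " म नेपाली  हुँ"
word_bytes = text.encode("utf-8").split()[0]  # UTF-8 bytes of the first word
print(word_bytes)  # b'\xe0\xa4\xae', the same bytes as in the question

try:
    word_bytes.decode("ascii")  # the implicit step that fails
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

word = word_bytes.decode("utf-8")  # decoding with the correct codec succeeds
print(ascii(word))  # '\u092e' (DEVANAGARI LETTER MA)
```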

Decode string1 to unicode first:

string1 = " म नेपाली  हुँ"
string1 = string1.decode('utf8').split()[0]

or use a Unicode string literal instead:

string1 = u" म नेपाली  हुँ"
string1 = string1.split()[0]

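Note the u prefix on the literal. Both variants contain non-ASCII characters in the source, so if this code lives in a Python 2 .py file rather than the interactive interpreter, the file also needs an encoding declaration on its first or second line, for example # -*- coding: utf-8 -*- (PEP 263).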
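To also cover printing the matched text, here is a minimal sketch that keeps everything as Unicode: the search word is a Unicode string and the file is decoded as UTF-8 while it is read. It uses io.open instead of codecs.open (a swapped-in choice, not part of the answer above) because io.open behaves the same on Python 2.7 and Python 3. The helper name find_matching_lines and the sample file contents are hypothetical, and printing assumes a console that can display UTF-8 text.

```python
# -*- coding: utf-8 -*-
# Sketch: search a UTF-8 text file for a Unicode word and print matching lines.
# Intended to run on Python 2.7 and Python 3; find_matching_lines is hypothetical.
import io
import os
import tempfile


def find_matching_lines(path, word):
    """Return the lines of a UTF-8 file that contain `word` (a Unicode string)."""
    matches = []
    with io.open(path, "r", encoding="utf-8") as f:  # yields Unicode lines
        for line in f:
            if word in line:  # Unicode in Unicode: no implicit ASCII decode
                matches.append(line.rstrip(u"\n"))
    return matches


word = u" म नेपाली  हुँ".split()[0]  # first word, as a Unicode string

# Hypothetical sample file standing in for nepaliwords.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"म नेपाली हुँ\nधन्यवाद\n")  # only the first line contains the word
    matches = find_matching_lines(path, word)
    for line in matches:
        print(line)  # prints the matched Unicode line itself
finally:
    os.remove(path)
```

Returning the matching lines instead of printing inside the loop lets the caller decide whether to print them, count them, or stop at the first hit.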


Related

String comparison and unicode
Unicode elementwise string comparison in numpy
PHP Japanese string comparison with Unicode
Django template unicode String comparison
Input unicode string with pyautogui
comparing the unicode character from user input to unicode characters in file
Unicode Comparison in Perl and Java
How to print Unicode glyph names for input string?
Unicode string comparison being interpreted as unequal (Python/Django app)
Python: solving unicode hell with unidecode
convert string representation of unicode to unicode
Python: Searching a binary file (.PLM) for unicode string
Form data comes through as unicode instead of string
PHPExcel file "corrupt" when Unicode in data
Extract Unicode data from CSV file
Unicode comparison of Cyrillic 'С' and Latin 'C'
caseless comparison of two unicode strings
When is it better to use value comparison instead of identify comparison when checking if a string is unicode?
input() and literal unicode parsing
Convert user input to unicode
Input to unicode in mysql and angular
Python unicode string to string?
String to unicode string
Python unittest AssertionError: unicode string is not unicode string
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode
Print unicode literal string as Unicode character
How to parse a haskell unicode string into unicode character
How to convert a string with unicode in it to unicode using python
