Different results when sorting character vectors

wujohn1990

I am wondering how R's sorting algorithm works when sorting a character vector:

a = c("aa(150)", "aa(1)S")
sort(a)
# [1] "aa(150)" "aa(1)S" 
a = c("aa(150)", "aa(1)")
sort(a)
# [1] "aa(1)" "aa(150)"

Doesn't R compare the integer values of the characters one by one, from left to right? Why does adding a character change the result?

I thought the order was determined by the first characters that differ, "5" and ")", and that the characters after them were ignored.

For comparison, in Python:

In [1]: a=["aa(150)","aa(1)"]
In [2]: sorted(a)
Out[2]: ['aa(1)', 'aa(150)']
In [3]: a=["aa(150)","aa(1)S"]
In [4]: sorted(a)
Out[4]: ['aa(1)S', 'aa(150)']
Pierre L

Set the collation locale to "C", which turns off locale-specific sorting in most cases:

Sys.setlocale("LC_COLLATE", "C")
a=c("aa(150)","aa(1)S")
sort(a)
#[1] "aa(1)S"  "aa(150)"
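
Under the "C" locale, ASCII strings are compared character by character by their codes, which is also what Python's sorted() does with str values (it compares code points). A small Python sketch of that plain comparison; the helper first_difference is hypothetical and only locates the first position where two strings differ:

```python
def first_difference(x: str, y: str):
    """Return (index, char_x, char_y) at the first position where x and y differ,
    or None if one string is a prefix of the other (or they are equal)."""
    for i, (cx, cy) in enumerate(zip(x, y)):
        if cx != cy:
            return i, cx, cy
    return None

a = ["aa(150)", "aa(1)S"]

# Plain code-point order, comparable to R's sort() with LC_COLLATE = "C" for ASCII text.
print(sorted(a))  # ['aa(1)S', 'aa(150)']

# The strings first differ at index 4, where ")" (41) is smaller than "5" (53),
# so the trailing "S" never takes part in the decision.
i, cx, cy = first_difference("aa(1)S", "aa(150)")
print(i, repr(cx), ord(cx), repr(cy), ord(cy))  # 4 ')' 41 '5' 53
```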

String collation has to be locale-specific because languages order characters differently. From the help page ?sort:

The sort order for character vectors will depend on the collating sequence of the locale in use: see Comparison.

Following that pointer to ?Comparison:

Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: e.g. in Estonian Z comes between S and T, and collation is not necessarily character-by-character – in Danish aa sorts as a single letter, after z. In Welsh ng may or may not be a single sorting unit: if it is it follows g.

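As mentioned, because each language uses letters in different ways, the locale matters for sorting. In a locale such as en_US, punctuation like "(" and ")" is typically ignored on the first comparison pass, so "aa(1)S" is effectively compared as "aa1S" against "aa150". The first difference is then "S" versus "5", and digits sort before letters, so "aa(150)" comes first. Without the trailing "S", "aa1" is a prefix of "aa150" and sorts first. That is why adding a character changes the result.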
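
A deliberately simplified Python model of punctuation-insensitive collation, not R's or glibc's actual algorithm, can reproduce both results from the question. It assumes that on the first pass punctuation is dropped, case is ignored, and digits sort before letters; the function collation_key is hypothetical:

```python
def collation_key(s: str):
    """Simplified, hypothetical collation key.

    Primary part: keep only letters and digits and case-fold them, so
    punctuation such as "(" and ")" is ignored and, for ASCII text, digits
    (code points 48-57) sort before lowercase letters (97-122).
    Secondary part: the original string, a crude tie-breaker standing in
    for the later comparison levels of real locale collation.
    """
    primary = "".join(ch.casefold() for ch in s if ch.isalnum())
    return (primary, s)

print(sorted(["aa(150)", "aa(1)S"], key=collation_key))  # ['aa(150)', 'aa(1)S']
print(sorted(["aa(150)", "aa(1)"], key=collation_key))   # ['aa(1)', 'aa(150)']
print(collation_key("aa(1)S")[0], collation_key("aa(150)")[0])  # aa1s aa150
```

Real collation implementations are multi-level and differ in detail, so results can vary between platforms even for the same locale name.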
