Why is coreutils sort slower than Python?

augurar

I wrote the following script to test the speed of Python's sort functionality:

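from sys import stdin, stdout
# Read every line (each keeps its trailing newline) into memory.
lines = list(stdin)
# Sort in place; in Python 2 these are byte strings, so the
# comparison is a plain byte-by-byte ordering.
lines.sort()
stdout.writelines(lines)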
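The script above relies on Python 2, where lines read from stdin are byte strings. In Python 3, iterating over sys.stdin yields decoded str objects instead. A minimal Python 3 sketch that keeps the comparison strictly byte-wise reads the underlying binary stream; the function name `sort_stream` is hypothetical, not something from the original post.

```python
from typing import BinaryIO


def sort_stream(src: BinaryIO, dst: BinaryIO) -> None:
    """Read all lines from a binary stream, sort them, and write them out.

    The lines are bytes objects, so list.sort() compares them byte by
    byte, the same ordering the Python 2 script gets from its byte strings.
    """
    lines = src.readlines()  # each element keeps its trailing b"\n"
    lines.sort()
    dst.writelines(lines)
```

To use it in place of the script, call `sort_stream(sys.stdin.buffer, sys.stdout.buffer)`; the `.buffer` attributes are the raw binary streams, so no text decoding is involved.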

I then compared this to the coreutils sort command on a file containing 10 million lines:

$ time python sort.py <numbers.txt >s1.txt
real    0m16.707s
user    0m16.288s
sys     0m0.420s

$ time sort <numbers.txt >s2.txt 
real    0m45.141s
user    2m28.304s
sys     0m0.380s

The coreutils command used all four CPUs (Python used only one) but took about three times as long to run! What gives?

I am using Ubuntu 12.04.5 (32-bit), Python 2.7.3, and sort 8.13.
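The ratios in the timings can be checked directly. The small sketch below parses the `time` fields quoted above (the helper `to_seconds` is hypothetical and handles only this `XmY.YYYs` format) and computes the wall-clock slowdown and the average number of busy cores (CPU time divided by wall time).

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` field such as '2m28.304s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


python_real = to_seconds("0m16.707s")
sort_real = to_seconds("0m45.141s")
sort_user = to_seconds("2m28.304s")

# Wall-clock slowdown of sort relative to the Python script.
slowdown = sort_real / python_real
# Average number of cores kept busy by sort (user CPU time / wall time).
cores_busy = sort_user / sort_real

print(f"sort was {slowdown:.1f}x slower and kept {cores_busy:.1f} cores busy on average")
# prints: sort was 2.7x slower and kept 3.3 cores busy on average
```

So sort kept roughly 3.3 of the four cores busy on average, and the "about three times" slowdown is about 2.7x in wall-clock time.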

augurar

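Izkata's comment revealed the answer: locale-specific comparisons. The sort command collates lines according to the locale set in the environment (LC_ALL, LC_COLLATE or LANG), whereas Python 2 compares its byte strings in plain byte order. Locale-aware collation of UTF-8 text is much more expensive than comparing raw bytes, and forcing the C locale with LC_ALL=C makes sort compare bytes as well.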
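The two orderings can be compared side by side in Python 3 with the C library's `strcoll` (exposed as `locale.strcoll`). This is a sketch under assumptions: the word list is made up, the helper `collated` is hypothetical, and the locale name `en_US.UTF-8` may not be installed on every system.

```python
import locale
from functools import cmp_to_key

words = ["apple", "Banana", "cherry", "Apple", "banana"]

# Python compares str objects by code point, so every uppercase
# ASCII letter sorts before every lowercase one.
print(sorted(words))
# prints: ['Apple', 'Banana', 'apple', 'banana', 'cherry']


def collated(items, loc):
    """Sort items with strcoll() under the given LC_COLLATE locale.

    Like sort(1) in a non-C locale, this runs a locale-aware
    comparison for every pair the sorting algorithm compares.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    locale.setlocale(locale.LC_COLLATE, loc)
    try:
        return sorted(items, key=cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore it


# The C locale collates by character value, matching Python's default order.
print(collated(words, "C"))

# Assumed locale name; it may be missing on minimal systems.
try:
    print(collated(words, "en_US.UTF-8"))
except locale.Error:
    print("en_US.UTF-8 locale not available")
```

Wrapping `strcoll` with `cmp_to_key` calls it once per comparison. Using `locale.strxfrm` as the key function instead would transform each string only once, which is usually the cheaper way to get locale order in Python.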

$ time (LC_ALL=C sort <numbers.txt >s2.txt)
real    0m5.485s
user    0m14.028s
sys     0m0.404s

How about that.
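The same fix applies when sort is launched from another program: set LC_ALL=C in the child's environment only. A minimal Python 3 sketch, assuming a `sort` binary on PATH; the helper name `c_locale_sort` is hypothetical.

```python
import os
import subprocess


def c_locale_sort(data: bytes) -> bytes:
    """Run sort(1) on `data` with byte-wise (C locale) collation."""
    # Copy the current environment and override LC_ALL, which takes
    # precedence over LC_COLLATE and LANG.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["sort"], input=data, stdout=subprocess.PIPE, env=env, check=True
    )
    return result.stdout


print(c_locale_sort(b"b\nB\na\nA\n"))
# prints: b'A\nB\na\nb\n'
```

Overriding LC_ALL only for the child process leaves the caller's own locale untouched, which matches the shell form `LC_ALL=C sort` used above.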

Related

From Java

Why is max slower than sort?

From Java

Why is Collections.sort() much slower than Arrays.sort()?

From Dev

Why is my merge sort code slower than insertion sort

From Dev

Why my selection sort is slower than insertion sort

From Dev

Why is nested `if` in Python much slower than parallel `and`?

From Dev

Python: Why is list comprehension slower than for loop

From Dev

why readline() is much slower than readlines() in Python?

From Dev

lambda is slower than function call in python, why

From Dev

Why is this Rust slower than my similar Python?

From Java

Why is heap slower than sort for K Closest Points to Origin?

From Dev

Why is std::shuffle as slow (or even slower than) std::sort?

From Dev

Why is "or" slower than "and" in Java?

From Dev

QuickSort is slower than std::sort

From Dev

Why is Python 3 is considerably slower than Python 2?

From Dev

Parallel sort slower than serial sort

From Dev

Python: Why is threaded function slower than non thread

From Dev

Why is using a Python generator much slower to traverse binary tree than not?

From Dev

Why is this Haskell program so much slower than an equivalent Python one?

From Dev

Python | Why is accessing instance attribute slower than local?

From Dev

Why my bisection search is slower than linear search in python?

From Dev

Why mesh python code slower than decomposed one?

From Dev

Why is this implementation of binary heap slower than that of Python's stdlib?

From Dev

Why python multiprocess pool is slower than one process only?

From Dev

Why is calling float() on a number slower than adding 0.0 in Python?

From Dev

Why is this generator pipeline slower than a traditional loop in Python?

From Dev

Why python broadcasting in the example below is slower than a simple loop?

From Dev

Why is NumPy sometimes slower than NumPy + plain Python loop?