Times-two faster than bit-shift, for Python 3.x integers?

hilberts_drinking_problem

I was looking at the source of sorted_containers and was surprised to see this line:

self._load, self._twice, self._half = load, load * 2, load >> 1

Here load is an integer. Why use bit shift in one place, and multiplication in another? It seems reasonable that bit shifting may be faster than integral division by 2, but why not replace the multiplication by a shift as well? I benchmarked the the following cases:

  1. (times, divide)
  2. (shift, shift)
  3. (times, shift)
  4. (shift, divide)

and found that #3 is consistently faster than other alternatives:

# self._load, self._twice, self._half = load, load * 2, load >> 1

import random
import timeit
import pandas as pd

x = random.randint(10 ** 3, 10 ** 6)

def test_naive():
    a, b, c = x, 2 * x, x // 2

def test_shift():
    a, b, c = x, x << 1, x >> 1    

def test_mixed():
    a, b, c = x, x * 2, x >> 1    

def test_mixed_swapped():
    a, b, c = x, x << 1, x // 2

def observe(k):
    print(k)
    return {
        'naive': timeit.timeit(test_naive),
        'shift': timeit.timeit(test_shift),
        'mixed': timeit.timeit(test_mixed),
        'mixed_swapped': timeit.timeit(test_mixed_swapped),
    }

def get_observations():
    return pd.DataFrame([observe(k) for k in range(100)])

enter image description here enter image description here

The question:

Is my test valid? If so, why is (multiply, shift) faster than (shift, shift)?

I run Python 3.5 on Ubuntu 14.04.

Edit

Above is the original statement of the question. Dan Getz provides an excellent explanation in his answer.

For the sake of completeness, here are sample illustrations for larger x when multiplication optimizations do not apply.

enter image description here enter image description here

Dan Getz

This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.

Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per integer digit. So in the general case, operations involving integers are not single operations, but instead need to handle the case of multiple "digits". In pyport.h, this bit limit is defined as 30 bits on 64-bit platform, or 15 bits otherwise. (I'll just call this 30 from here on to keep the explanation simple. But note that if you were using Python compiled for 32-bit, your benchmark's result would depend on if x were less than 32,768 or not.)

When an operation's inputs and outputs stay within this 30-bit limit, the operation can be handled in an optimized way instead of the general way. The beginning of the integer multiplication implementation is as follows:

static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
    PyLongObject *z;

    CHECK_BINOP(a, b);

    /* fast path for single-digit multiplication */
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
        stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
        return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
        /* if we don't have long long then we're almost certainly
           using 15-bit digits, so v will fit in a long.  In the
           unlikely event that we're using 30-bit digits on a platform
           without long long, a large v will just cause us to fall
           through to the general multiplication code below. */
        if (v >= LONG_MIN && v <= LONG_MAX)
            return PyLong_FromLong((long)v);
#endif
    }

So when multiplying two integers where each fits in a 30-bit digit, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() will notice this in a relatively small number of operations, and create a single-digit integer object to store it.

In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for long_lshift(), in the case of a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: (my comments in /*** ***/)

static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
    /*** ... ***/

    wordshift = shiftby / PyLong_SHIFT;   /*** zero for small w ***/
    remshift  = shiftby - wordshift * PyLong_SHIFT;   /*** w for small w ***/

    oldsize = Py_ABS(Py_SIZE(a));   /*** 1 for small v > 0 ***/
    newsize = oldsize + wordshift;
    if (remshift)
        ++newsize;   /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
    z = _PyLong_New(newsize);

    /*** ... ***/
}

Integer division

You didn't ask about the worse performance of integer floor division compared to right shifts, because that fit your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications, either. Every // computes both the quotient and the remainder using the function long_divrem(). This remainder is computed for a small divisor with a multiplication, and is stored in a newly-allocated integer object, which in this situation is immediately discarded.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Java

Why is 2 * x * x faster than 2 * ( x * x ) in Python 3.x, for integers?

From Dev

Why is 2 * x * x faster than 2 * ( x * x ) in Python 3.x, for integers?

From Java

Why is x**4.0 faster than x**4 in Python 3?

From Dev

Why is x**4.0 faster than x**4 in Python 3?

From Dev

Why simple Scala tailrec loop for fibonacci calculation is faster in 3x times than Java loop?

From Dev

Why is dict definition faster in Python 2.7 than in Python 3.x?

From Dev

python sets with integers is splitting integers greater than 10 into two digits

From Dev

Why x64 bit version is way more faster than x32?

From Dev

Determine whether two sets of integers are the same, faster than N log(N)

From Dev

64 bit code generated by GCC is 3 times slower than 32 bit

From Dev

How to sort an array of integers faster than quicksort?

From Dev

Why is “/dev/rdisk” about 20 times faster than “/dev/disk” in Mac OS X

From Dev

Trial-division code runs 2x faster as 32-bit on Windows than 64-bit on Linux

From Java

Why is 'x' in ('x',) faster than 'x' == 'x'?

From Dev

Why is Udcast is many times faster than Netcat?

From Dev

Why @autoreleasepool is 6 times faster than NSAutoreleasePool?

From Dev

Is appending the html faster than pasting it multiple times?

From Dev

In python, why is s*3 faster than s+s+s?

From Dev

Is deque from the collections module really 100 times faster at prepending than list in Python?

From Dev

Adding two hex numbers javascript with bit shift

From Java

Why is list(x for x in a) faster for a=[0] than for a=[]?

From Dev

Why is `-1 * x` faster than `-x` and why?

From Dev

Faster way to shift a decimal in Python to remove zeros?

From Dev

Faster way to shift a decimal in Python to remove zeros?

From Dev

Oracle 10g Full table scan(parallel access) 100x times faster than index access by rowid

From Dev

Oracle 10g Full table scan(parallel access) 100x times faster than index access by rowid

From Dev

For a 3x3 only symmetric and positive definite linear system, is Cholesky still faster than Householder?

From Dev

Python find duplicates which occur more than 3 times

From Dev

Bitwise xor of two 256-bit integers

Related Related

  1. 1

    Why is 2 * x * x faster than 2 * ( x * x ) in Python 3.x, for integers?

  2. 2

    Why is 2 * x * x faster than 2 * ( x * x ) in Python 3.x, for integers?

  3. 3

    Why is x**4.0 faster than x**4 in Python 3?

  4. 4

    Why is x**4.0 faster than x**4 in Python 3?

  5. 5

    Why simple Scala tailrec loop for fibonacci calculation is faster in 3x times than Java loop?

  6. 6

    Why is dict definition faster in Python 2.7 than in Python 3.x?

  7. 7

    python sets with integers is splitting integers greater than 10 into two digits

  8. 8

    Why x64 bit version is way more faster than x32?

  9. 9

    Determine whether two sets of integers are the same, faster than N log(N)

  10. 10

    64 bit code generated by GCC is 3 times slower than 32 bit

  11. 11

    How to sort an array of integers faster than quicksort?

  12. 12

    Why is “/dev/rdisk” about 20 times faster than “/dev/disk” in Mac OS X

  13. 13

    Trial-division code runs 2x faster as 32-bit on Windows than 64-bit on Linux

  14. 14

    Why is 'x' in ('x',) faster than 'x' == 'x'?

  15. 15

    Why is Udcast is many times faster than Netcat?

  16. 16

    Why @autoreleasepool is 6 times faster than NSAutoreleasePool?

  17. 17

    Is appending the html faster than pasting it multiple times?

  18. 18

    In python, why is s*3 faster than s+s+s?

  19. 19

    Is deque from the collections module really 100 times faster at prepending than list in Python?

  20. 20

    Adding two hex numbers javascript with bit shift

  21. 21

    Why is list(x for x in a) faster for a=[0] than for a=[]?

  22. 22

    Why is `-1 * x` faster than `-x` and why?

  23. 23

    Faster way to shift a decimal in Python to remove zeros?

  24. 24

    Faster way to shift a decimal in Python to remove zeros?

  25. 25

    Oracle 10g Full table scan(parallel access) 100x times faster than index access by rowid

  26. 26

    Oracle 10g Full table scan(parallel access) 100x times faster than index access by rowid

  27. 27

    For a 3x3 only symmetric and positive definite linear system, is Cholesky still faster than Householder?

  28. 28

    Python find duplicates which occur more than 3 times

  29. 29

    Bitwise xor of two 256-bit integers

HotTag

Archive