Why is 'x' in ('x',) faster than 'x' == 'x'?

Markus Meskanen
>>> timeit.timeit("'x' in ('x',)")
0.04869917374131205
>>> timeit.timeit("'x' == 'x'")
0.06144205736110564

This also holds for tuples with multiple elements; both versions seem to grow linearly:

>>> timeit.timeit("'x' in ('x', 'y')")
0.04866674801541748
>>> timeit.timeit("'x' == 'x' or 'x' == 'y'")
0.06565782838087131
>>> timeit.timeit("'x' in ('y', 'x')")
0.08975995576448526
>>> timeit.timeit("'x' == 'y' or 'x' == 'y'")
0.12992391047427532

Based on this, I think I should totally start using in everywhere instead of ==!

Veedrac

As I mentioned to David Wolever, there's more to this than meets the eye: both methods ultimately dispatch to an identity (is) check. You can see this by doing

min(Timer("x == x", setup="x = 'a' * 1000000").repeat(10, 10000))
#>>> 0.00045456900261342525

min(Timer("x == y", setup="x = 'a' * 1000000; y = 'a' * 1000000").repeat(10, 10000))
#>>> 0.5256857610074803

The first can only be so fast because it checks by identity.
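This identity shortcut is visible from pure Python, too: container membership goes through PyObject_RichCompareBool, so identity implies equality even for objects that don't compare equal to themselves, like NaN. A quick demonstration:

```python
# NaN is not equal to itself...
nan = float("nan")
print(nan == nan)   # False

# ...yet membership finds it, because the containment loop checks
# identity before ever calling __eq__.
print(nan in (nan,))  # True
```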

To find out why one would take longer than the other, let's trace through execution.
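You can confirm what bytecode each expression runs by disassembling it. (On the CPython version this answer traces, both compile to COMPARE_OP with different opargs; on Python 3.9+ the membership test compiles to a separate CONTAINS_OP instead, though the underlying C paths are analogous.)

```python
import dis

# Each comparison executes as a single compare-style opcode:
# COMPARE_OP, or CONTAINS_OP for `in` on newer CPython.
dis.dis(compile("'x' in ('x',)", "<demo>", "eval"))
dis.dis(compile("'x' == 'x'", "<demo>", "eval"))
```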

They both start in ceval.c, in COMPARE_OP, since that is the bytecode involved:

TARGET(COMPARE_OP) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *res = cmp_outcome(oparg, left, right);
    Py_DECREF(left);
    Py_DECREF(right);
    SET_TOP(res);
    if (res == NULL)
        goto error;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH();
}

This pops the values from the stack (technically it only pops one)

PyObject *right = POP();
PyObject *left = TOP();

and runs the compare:

PyObject *res = cmp_outcome(oparg, left, right);

cmp_outcome is this:

static PyObject *
cmp_outcome(int op, PyObject *v, PyObject *w)
{
    int res = 0;
    switch (op) {
    case PyCmp_IS: ...
    case PyCmp_IS_NOT: ...
    case PyCmp_IN:
        res = PySequence_Contains(w, v);
        if (res < 0)
            return NULL;
        break;
    case PyCmp_NOT_IN: ...
    case PyCmp_EXC_MATCH: ...
    default:
        return PyObject_RichCompare(v, w, op);
    }
    v = res ? Py_True : Py_False;
    Py_INCREF(v);
    return v;
}

This is where the paths split. The PyCmp_IN branch does

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

Note that a tuple is defined as

static PySequenceMethods tuple_as_sequence = {
    ...
    (objobjproc)tuplecontains,                  /* sq_contains */
};

PyTypeObject PyTuple_Type = {
    ...
    &tuple_as_sequence,                         /* tp_as_sequence */
    ...
};

So the branch

if (sqm != NULL && sqm->sq_contains != NULL)

will be taken, and *sqm->sq_contains, which is the function (objobjproc)tuplecontains, will be called.

This does

static int
tuplecontains(PyTupleObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}
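The loop above can be sketched in pure Python. This is not CPython's actual implementation, just an illustration of the semantics, with the identity check that PyObject_RichCompareBool performs written out explicitly:

```python
def tuple_contains(tup, el):
    """Rough sketch of tuplecontains: walk the tuple and compare
    each item with Py_EQ semantics (identity checked first)."""
    for item in tup:
        # PyObject_RichCompareBool short-circuits on identity,
        # so identity implies equality here.
        if el is item or el == item:
            return True
    return False

print(tuple_contains(('x', 'y'), 'x'))  # True
```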

...Wait, isn't PyObject_RichCompareBool what the other branch used? Nope, that was PyObject_RichCompare.

The in code path was short, so the difference likely just comes down to the relative speed of these two functions. Let's compare.

int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
    PyObject *res;
    int ok;

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }

    ...
}

The code path in PyObject_RichCompareBool terminates pretty much immediately when the objects are identical. PyObject_RichCompare, in contrast, does

PyObject *
PyObject_RichCompare(PyObject *v, PyObject *w, int op)
{
    PyObject *res;

    assert(Py_LT <= op && op <= Py_GE);
    if (v == NULL || w == NULL) { ... }
    if (Py_EnterRecursiveCall(" in comparison"))
        return NULL;
    res = do_richcompare(v, w, op);
    Py_LeaveRecursiveCall();
    return res;
}

The Py_EnterRecursiveCall/Py_LeaveRecursiveCall combo is not needed in the previous path, but it consists of relatively quick macros that short-circuit after incrementing and decrementing some globals.

do_richcompare does:

static PyObject *
do_richcompare(PyObject *v, PyObject *w, int op)
{
    richcmpfunc f;
    PyObject *res;
    int checked_reverse_op = 0;

    if (v->ob_type != w->ob_type && ...) { ... }
    if ((f = v->ob_type->tp_richcompare) != NULL) {
        res = (*f)(v, w, op);
        if (res != Py_NotImplemented)
            return res;
        ...
    }
    ...
}

This does some quick checks and then calls v->ob_type->tp_richcompare, which for strings is

PyTypeObject PyUnicode_Type = {
    ...
    PyUnicode_RichCompare,      /* tp_richcompare */
    ...
};

which does

PyObject *
PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
{
    int result;
    PyObject *v;

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
        Py_RETURN_NOTIMPLEMENTED;

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)
        return NULL;

    if (left == right) {
        switch (op) {
        case Py_EQ:
        case Py_LE:
        case Py_GE:
            /* a string is equal to itself */
            v = Py_True;
            break;
        case Py_NE:
        case Py_LT:
        case Py_GT:
            v = Py_False;
            break;
        default:
            ...
        }
    }
    else if (...) { ... }
    else { ...}
    Py_INCREF(v);
    return v;
}

Namely, this shortcuts on left == right... but only after doing

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)

All in all the paths then look something like this (manually recursively inlining, unrolling and pruning known branches)

POP()                           # Stack stuff
TOP()                           #
                                #
case PyCmp_IN:                  # Dispatch on operation
                                #
sqm != NULL                     # Dispatch to builtin op
sqm->sq_contains != NULL        #
*sqm->sq_contains               #
                                #
cmp == 0                        # Do comparison in loop
i < Py_SIZE(a)                  #
v == w                          #
op == Py_EQ                     #
++i                             # 
cmp == 0                        #
                                #
res < 0                         # Convert to Python-space
res ? Py_True : Py_False        #
Py_INCREF(v)                    #
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

vs

POP()                           # Stack stuff
TOP()                           #
                                #
default:                        # Dispatch on operation
                                #
Py_LT <= op                     # Checking operation
op <= Py_GE                     #
v == NULL                       #
w == NULL                       #
Py_EnterRecursiveCall(...)      # Recursive check
                                #
v->ob_type != w->ob_type        # More operation checks
f = v->ob_type->tp_richcompare  # Dispatch to builtin op
f != NULL                       #
                                #
!PyUnicode_Check(left)          # ...More checks
!PyUnicode_Check(right))        #
PyUnicode_READY(left) == -1     #
PyUnicode_READY(right) == -1    #
left == right                   # Finally, doing comparison
case Py_EQ:                     # Immediately short circuit
Py_INCREF(v);                   #
                                #
res != Py_NotImplemented        #
                                #
Py_LeaveRecursiveCall()         # Recursive check
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

Now, PyUnicode_Check and PyUnicode_READY are pretty cheap since they only check a couple of fields, but it should be obvious that the top one is a smaller code path: it has fewer function calls, only one switch statement, and is just a bit thinner.

TL;DR:

Both ultimately come down to an if (left_pointer == right_pointer) identity check; the difference is just how much work they do to get there. in simply does less.
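It's worth noting that the identity fast path fires for 'x' == 'x' at all only because CPython interns short string literals at compile time, so both operands are the same object. That's an implementation detail, not a language guarantee, but you can observe it (and force it for runtime strings with sys.intern):

```python
import sys

# CPython interns short string literals, so the two 'x' operands in
# 'x' == 'x' are the same object and the identity shortcut applies
# (an implementation detail, not a language guarantee).
a = "x"
b = "x"
print(a is b)

# Equal strings built at runtime need not be identical, but
# sys.intern guarantees a single shared copy:
c = "".join(["x", "y"])
d = "".join(["x", "y"])
print(sys.intern(c) is sys.intern(d))  # True
```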
