Why simple Scala tailrec loop for fibonacci calculation is faster in 3x times than Java loop?

Andriy Plokhotnyuk

Scala

code:

@annotation.tailrec
private def fastLoop(n: Int, a: Long = 0, b: Long = 1): Long = 
  if (n > 1) fastLoop(n - 1, b, a + b) else b

bytecode:

  private long fastLoop(int, long, long);
    Code:
       0: iload_1
       1: iconst_1
       2: if_icmple     21
       5: iload_1
       6: iconst_1
       7: isub
       8: lload         4
      10: lload_2
      11: lload         4
      13: ladd
      14: lstore        4
      16: lstore_2
      17: istore_1
      18: goto          0
      21: lload         4
      23: lreturn

result is 53879289.462 ± 6289454.961 ops/s:

https://travis-ci.org/plokhotnyuk/scala-vs-java/jobs/56117116#L2909

Java

code:

private long fastLoop(int n, long a, long b) {
    while (n > 1) {
        long c = a + b;
        a = b;
        b = c;
        n--;
    }
    return b;
}

bytecode:

  private long fastLoop(int, long, long);
    Code:
       0: iload_1
       1: iconst_1
       2: if_icmple     24
       5: lload_2
       6: lload         4
       8: ladd
       9: lstore        6
      11: lload         4
      13: lstore_2
      14: lload         6
      16: lstore        4
      18: iinc          1, -1
      21: goto          0
      24: lload         4
      26: lreturn

result is 17444340.812 ± 9508030.117 ops/s:

https://travis-ci.org/plokhotnyuk/scala-vs-java/jobs/56117116#L2881

Yes, it depends on environment parameters (JDK version, CPU model & frequency of RAM) and dynamic state. But why mostly the same bytecode on the same environment can produce stable 2x-3x difference for range of function arguments?

Here is list of ops/s numbers for different values of function arguments from my notebook with Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz (max 3.50GHz), RAM 12Gb DDR3-1333, Ubuntu 14.10, Oracle JDK 1.8.0_40-b25 64-bit:

[info] Benchmark            (n)   Mode  Cnt          Score          Error  Units
[info] JavaFibonacci.loop     2  thrpt    5  171776163.027 ±  4620419.353  ops/s
[info] JavaFibonacci.loop     4  thrpt    5  144793748.362 ± 25506649.671  ops/s
[info] JavaFibonacci.loop     8  thrpt    5   67271848.598 ± 15133193.309  ops/s
[info] JavaFibonacci.loop    16  thrpt    5   54552795.336 ± 17398924.190  ops/s
[info] JavaFibonacci.loop    32  thrpt    5   41156886.101 ± 12905023.289  ops/s
[info] JavaFibonacci.loop    64  thrpt    5   24407771.671 ±  4614357.030  ops/s
[info] ScalaFibonacci.loop    2  thrpt    5  148926292.076 ± 23673126.125  ops/s
[info] ScalaFibonacci.loop    4  thrpt    5  139184195.527 ± 30616384.925  ops/s
[info] ScalaFibonacci.loop    8  thrpt    5  109050091.514 ± 23506756.224  ops/s
[info] ScalaFibonacci.loop   16  thrpt    5   81290743.288 ±  5214733.740  ops/s
[info] ScalaFibonacci.loop   32  thrpt    5   38937420.431 ±  8324732.107  ops/s
[info] ScalaFibonacci.loop   64  thrpt    5   22641295.988 ±  5961435.507  ops/s

Additional question is "why values of ops/s are decreasing in non-linear way as above?"

Andriy Plokhotnyuk

Yes, I was wrong, and missed that tested method was not just fastLoop calls:

Scala

  @Benchmark
  def loop(): BigInt =
    if (n > 92) loop(n - 91, 4660046610375530309L, 7540113804746346429L)
    else fastLoop(n)

Java

 @Benchmark
    public BigInteger loop() {
        return n > 92 ?
                loop(n - 91, BigInteger.valueOf(4660046610375530309L), BigInteger.valueOf(7540113804746346429L)) :
                BigInteger.valueOf(fastLoop(n, 0, 1));
    }

As Aleksey noted lot of time was spend in conversions from Long/long to BigInt/BigInteger.

I have wrote separate benchmarks which tests just fastLoop(n, 0, 1) call. Here are results from them:

[info] JavaFibonacci.fastLoop     2  thrpt    5  338071686.910 ± 66146042.535  ops/s
[info] JavaFibonacci.fastLoop     4  thrpt    5  231066635.073 ±  3702419.585  ops/s
[info] JavaFibonacci.fastLoop     8  thrpt    5  174832245.690 ± 36491363.939  ops/s
[info] JavaFibonacci.fastLoop    16  thrpt    5   95162799.968 ± 16151609.596  ops/s
[info] JavaFibonacci.fastLoop    32  thrpt    5   60197918.766 ± 10662747.434  ops/s
[info] JavaFibonacci.fastLoop    64  thrpt    5   29564087.602 ±  3610164.011  ops/s
[info] ScalaFibonacci.fastLoop    2  thrpt    5  336588218.560 ± 56762496.725  ops/s
[info] ScalaFibonacci.fastLoop    4  thrpt    5  224918874.670 ± 35499107.133  ops/s
[info] ScalaFibonacci.fastLoop    8  thrpt    5  121952667.394 ± 17314931.711  ops/s
[info] ScalaFibonacci.fastLoop   16  thrpt    5   96573968.960 ± 12757890.175  ops/s
[info] ScalaFibonacci.fastLoop   32  thrpt    5   59462408.940 ± 14924369.138  ops/s
[info] ScalaFibonacci.fastLoop   64  thrpt    5   28922994.377 ±  7209467.197  ops/s

Lessons that I learned:

  • Scala implicits can eat lot of performance, while are easy to be overlooked;

  • Cashing of BigInt values in Scala can speed up some functions comparing with Java's BigInteger.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Why is this simple loop faster in Go than in C?

From Dev

Why is the calculation of fibonacci faster in javascript than haskell?

From Dev

Javascript is 100 faster than Classical C in simple for loop test, why?

From Dev

Why is PHP7 so much faster than Python3 in executing this simple loop?

From Dev

why is for loop faster than while loop

From Dev

Why is Haskell faster than C++ for a simple fibonacci

From Dev

Why is my foreach faster than my for loop?

From Dev

Why Haskell list comprehension is faster than loop

From Dev

Is enumerateObjectsUsingBlock: faster than a for-in loop? Why?

From Dev

why is using .index() faster than using a for loop?

From Java

Java: manually-unrolled loop is still faster than the original loop. Why?

From Dev

Java: manually-unrolled loop is still faster than the original loop. Why?

From Dev

Why is a 'for' loop faster than a 'while' loop for reading from a file?

From Dev

Why is for loop on native python list faster than for loop on numpy array

From Dev

Java benchmarking - why is the second loop faster?

From Dev

Why is the sum of reciprocals using a for-loop ~400x faster than streams?

From Java

Why are elementwise additions much faster in separate loops than in a combined loop?

From Dev

Why is memcmp so much faster than a for loop check?

From Dev

Why is the second loop over a static array in the BSS faster than the first?

From Dev

why tail recursive gcd is faster than while loop with rubinius

From Dev

Why is while loop much faster than range in this case?

From Dev

Why is a for loop faster than Array.prototype.map()

From Dev

Why is memcmp so much faster than a for loop check?

From Dev

Why is a for loop faster than Array.prototype.map()

From Dev

loop executing 3 times java

From Dev

Why is a big loop within a small loop faster than a small loop within a big one?

From Dev

Why is a big loop within a small loop faster than a small loop within a big one?

From Dev

R for loop faster than sapply

From Dev

A solution works faster than a for loop

Related Related

  1. 1

    Why is this simple loop faster in Go than in C?

  2. 2

    Why is the calculation of fibonacci faster in javascript than haskell?

  3. 3

    Javascript is 100 faster than Classical C in simple for loop test, why?

  4. 4

    Why is PHP7 so much faster than Python3 in executing this simple loop?

  5. 5

    why is for loop faster than while loop

  6. 6

    Why is Haskell faster than C++ for a simple fibonacci

  7. 7

    Why is my foreach faster than my for loop?

  8. 8

    Why Haskell list comprehension is faster than loop

  9. 9

    Is enumerateObjectsUsingBlock: faster than a for-in loop? Why?

  10. 10

    why is using .index() faster than using a for loop?

  11. 11

    Java: manually-unrolled loop is still faster than the original loop. Why?

  12. 12

    Java: manually-unrolled loop is still faster than the original loop. Why?

  13. 13

    Why is a 'for' loop faster than a 'while' loop for reading from a file?

  14. 14

    Why is for loop on native python list faster than for loop on numpy array

  15. 15

    Java benchmarking - why is the second loop faster?

  16. 16

    Why is the sum of reciprocals using a for-loop ~400x faster than streams?

  17. 17

    Why are elementwise additions much faster in separate loops than in a combined loop?

  18. 18

    Why is memcmp so much faster than a for loop check?

  19. 19

    Why is the second loop over a static array in the BSS faster than the first?

  20. 20

    why tail recursive gcd is faster than while loop with rubinius

  21. 21

    Why is while loop much faster than range in this case?

  22. 22

    Why is a for loop faster than Array.prototype.map()

  23. 23

    Why is memcmp so much faster than a for loop check?

  24. 24

    Why is a for loop faster than Array.prototype.map()

  25. 25

    loop executing 3 times java

  26. 26

    Why is a big loop within a small loop faster than a small loop within a big one?

  27. 27

    Why is a big loop within a small loop faster than a small loop within a big one?

  28. 28

    R for loop faster than sapply

  29. 29

    A solution works faster than a for loop

HotTag

Archive