As part of a Monte Carlo simulation, I have to roll a group of dice until certain values show up a certain amount of times. My code that does this calls upon a dice class which generates a random number between 1 and 6, and returns it. Originally the code looked like
public void roll() {
value = (int)(Math.random()*6) + 1;
}
and it wasn't very fast. By exchanging Math.random() for
ThreadLocalRandom.current().nextInt(1, 7);
It ran a section in roughly 60% of the original time, which called this around about 250 million times. As part of the full simulation it will call upon this method billions of times at the very least, so is there any faster way to do this?
Pick a random generator that is as fast and as good as you need it to be, and that isn't slowed down to a tiny fraction of its normal speed by thread safety mechanisms. Then pick a method of generating the [1..6] integer distribution that is a fast and as precise as you need it to be.
The fastest simple generator that is of sufficiently high quality to beat standard tests for PRNGs like TestU01 (instead of failing systematically, like the Mersenne Twister) is Sebastiano Vigna's xorshift64*. I'm showing it as C code but Sebastiano has it in Java as well:
uint64_t xorshift64s (int64_t &x)
{
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
return x * 2685821657736338717ull;
}
Sebastiano Vigna's site has lots of useful info, links and benchmark results. Including papers, for the mathematically inclined.
At that high resolution you can simply use 1 + xorshift64s(state) % 6
and the bias will be immeasurably small. If that is not fast enough, implement the modulo division by multiplication with the inverse. If that is not fast enough - if you cannot afford two MULs per variate - then it gets tricky and you need to come back here. xorshift1024* (Java) plus some bit trickery for the variate would be an option.
Batching - generating an array full of numbers and processing that, then refilling the array and so on - can unlock some speed reserves. Needlessly wrapping things in classes achieves the opposite.
P.S.: if ThreadLocalRandom and xorshift* are not fast enough for your purposes even with batching then you might be going about things in the wrong way, or you might be doing it in the wrong language. Or both.
P.P.S.: in languages like Java (or C#, or Delphi), abstraction is not free, it has a cost. In Java you also have to reckon with things like mandatory gratuitous array bounds checking, unless you have a compiler that can eliminate those checks. Teasing high performance out of a Java program can get very involved... In C++ you get abstraction and performance for free.
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments