I'm trying to reduce as much as I can the execution time of a function that sums the output of a sequence of Bernoulli trials.
This is my working-but-slow approach:
set.seed(28100)
sim <- data.frame(result = rep(NA, 10))
for (i in 1:nrow(sim)) {
sim$result[i] <- sum(rbinom(1200, size = 1, prob = 0.2))
}
sim
# result
# 1 268
# 2 230
# 3 223
# 4 242
# 5 224
# 6 218
# 7 237
# 8 254
# 9 227
# 10 247
How could I obtain the same result without a for-loop?
I tried this...
set.seed(28100)
sim <- data.frame(result = rep(sum(rbinom(1200, size = 1, prob = 0.2)), 10))
sim
# result
# 1 269
# 2 269
# 3 269
# 4 269
# 5 269
# 6 269
# 7 269
# 8 269
# 9 269
# 10 269
But clearly the argument of rep()
is executed only once.
The binomial distribution is defined as the sum of Bernoulli trials.
# this line from your question
sum(rbinom(1200, size = 1, prob = 0.2))
# is equivalent to this
rbinom(1, size = 1200, prob = 0.2)
# and replicating it
replicate(expr = sum(rbinom(1200, size = 1, prob = 0.2)), n = 10)
# is equivalent to setting n higher:
### This is the only line of code you need! ####
rbinom(10, size = 1200, prob = 0.2)
It takes about 0.01 seconds for 100,000 simulations and 0.12 seconds for 1M simulations on my (rather slow) laptop.
Modifying @eipi's nice benchmarking, this is about 700-900 times faster than the other methods (now with bug fixes!)
expr min lq mean median uq max neval cld
binom 1.324 1.377 1.607959 1.413 1.931 2.306 10 a
replicate 716.300 737.200 756.288641 749.900 765.300 812.400 10 b
sapply 706.300 743.300 778.863587 763.800 853.500 860.300 10 b
matrixColSums 838.800 870.000 893.813083 894.800 907.500 978.200 10 c
Benchmark code:
nn = 10000
n_bern = 1200
library(microbenchmark)
print(
microbenchmark::microbenchmark(
replicate =
replicate(nn, sum(rbinom(
n_bern, size = 1, prob = 0.2
)))
,
matrixColSums =
colSums(matrix(
rbinom(n_bern * nn, size = 1, prob = 0.2), ncol = nn
)),
sapply = sapply(
1:nn,
FUN = function(x) {
sum(rbinom(n_bern, size = 1, prob = 0.2))
}
),
binom = rbinom(nn, size = n_bern, prob = 0.2),
times = 10
),
order = "median",
signif = 4
)
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments