Faster alternative to `range(which(..))`

Remi.b

Let be a sequence of TRUE and FALSE in R

v = c(F,F,F,F,F,F,T,F,T,T,F,T,T,T,T,T,F,T,F,T,T,F,F,F,T,F,F,F,F,F)

I would like to get the the positions of the first and the last TRUE. One way to achieve this is

range(which(v)) # 7 25

but this solution is relatively slow as it must check every element of the vector to get the position of each TRUE and then loop over all positions, evaluating two if statements at each position (I think) in order to get the maximum and the minimum values. It would be much more strategic to search for the first TRUE starting one from the beginning and one from the end and just return those positions.

Is there a faster alternative to range(which(..))?

josliber

The simplest approach I can think of that doesn't involve searching the entire vector would be an Rcpp solution:

library(Rcpp)
cppFunction(
"NumericVector rangeWhich(LogicalVector x) {
  NumericVector ret(2, NumericVector::get_na());
  int n = x.size();
  for (int idx=0; idx < n; ++idx) {
    if (x[idx]) {
      ret[0] = idx+1;  // 1-indexed for R
      break;
    }
  }
  if (R_IsNA(ret[0]))  return ret;  // No true values
  for (int idx=n-1; idx >= 0; --idx) {
    if (x[idx]) {
      ret[1] = idx + 1;  // 1-indexed for R
      break;
    }
  }
  return ret;
}")
rangeWhich(v)
# [1]  7 25

We can benchmark on a fairly long vector (length 1 million) with random entries. We would expect to get pretty large efficiency gains from not searching through the whole thing with which:

set.seed(144)
bigv <- sample(c(F, T), 1000000, replace=T)
library(microbenchmark)
# range_find from @PierreLafortune
range_find <- function(v) {
i <- 1
while(!v[i]) {
  i <- i +1
}
j <- length(v)
while(!v[j]) {
  j <- j-1
}
c(i,j)
}
# shortCircuit from @JoshuaUlrich
shortCircuit <- compiler::cmpfun({
  function(x) {
    first <- 1
    while(TRUE) if(x[first]) break else first <- first+1
    last <- length(x)
    while(TRUE) if(x[last]) break else last <- last-1
    c(first, last)
  }
})
microbenchmark(rangeWhich(bigv), range_find(bigv), shortCircuit(bigv), range(which(bigv)))
# Unit: microseconds
#                expr      min        lq        mean     median         uq       max neval
#    rangeWhich(bigv)    1.476    2.4655     9.45051     9.0640    13.7585    46.286   100
#    range_find(bigv)    1.445    2.2930     8.06993     7.2055    11.8980    26.893   100
#  shortCircuit(bigv)    1.114    1.6920     7.30925     7.0440    10.2210    30.758   100
#  range(which(bigv)) 6821.180 9389.1465 13991.84613 10007.9045 16698.2230 58112.490   100

The Rcpp solution is a good deal faster (more than 500x faster) than max(which(v)) because it doesn't need to iterate through the whole vector with which. For this example it has a near-identical runtime (in fact, slightly slower) than range_find from @PierreLafortune and shortCircuit from @JoshuaUlrich.

Using Joshua's excellent example of some worst-case behavior where the true value is in the very middle of the vector (I'm repeating his experiment with all proposed functions so we can see the whole picture), we see a very different situation:

bigv2 <- rep(FALSE, 1e6)
bigv2[5e5-1] <- TRUE
bigv2[5e5+1] <- TRUE
microbenchmark(rangeWhich(bigv2), range_find(bigv2), shortCircuit(bigv2), range(which(bigv2)))
# Unit: microseconds
#                 expr        min          lq        mean      median         uq        max neval
#    rangeWhich(bigv2)    546.206    555.3820    593.1385    575.3790    599.055    979.924   100
#    range_find(bigv2) 400057.083 406449.0075 434515.1142 411881.4145 427487.041 697529.163   100
#  shortCircuit(bigv2)  74942.612  75663.7835  79095.3795  76761.5325  79703.265 125054.360   100
#  range(which(bigv2))    632.086    679.0955    761.9610    700.1365    746.509   3924.941   100

For this vector the looping base R solutions are much slower than the original solution (100-600x slower) and the Rcpp solution is barely faster than range(which(bigv2)) (which makes sense, because they're both looping through the whole vector once).

As usual, this needs to come with a disclaimer -- you need to compile your Rcpp function, which also takes time, so this will only be a benefit if you have very large vectors or are repeating this operation many times. From the comments on your question it sounds like you indeed have a large number of large vectors, so this could be a good option for you.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Java

Is there is a faster alternative for Integer.toString(myInt).getBytes(US_ASCII)?

From Java

What is a faster alternative to Python's http.server (or SimpleHTTPServer)?

From Java

Faster alternative to nested loops?

From Dev

A faster alternative to Pandas `isin` function

From Dev

Java: Faster alternative to String(byte[])

From Dev

Faster alternative to Series.add function in pandas

From Dev

Can matlab's int8 function be replaced with faster alternative

From Dev

alternative (faster) war to 3 nested for loop python

From Dev

faster alternative to InttoStr/StrToInt?

From Dev

ffmpeg live transcoding faster alternative?

From Dev

Faster alternative to grouby/shift

From Dev

Faster alternative to function 'rollapply'

From Dev

Alternative to using loop to make code run faster

From Dev

Python faster alternative to dictionary?

From Dev

Faster alternative to INTERSECT with 'rows' - MATLAB

From Dev

R: faster alternative of period.apply

From Dev

Faster alternative for getPixel and getPixel in Android Bitmap?

From Dev

Faster alternative to for loop in for loop

From Dev

Python: faster alternative for itertools.product()?

From Dev

Faster alternative to decimal.Parse

From Dev

Faster alternative to STL std::map

From Dev

R - faster alternative to hist(XX, plot=FALSE)$count

From Dev

Is there a faster alternative to cp for copying large files (~20 GB)?

From Dev

Faster alternative to iterrows

From Dev

Faster alternative to perform pandas groupby operation

From Dev

Faster alternative to split-apply-combine

From Dev

Faster du/stat alternative for directories

From Dev

Python's faster alternative to lists for prime numbers?

From Dev

A faster alternative to .SpecialCells(xlCellTypeVisible).Copy

Related Related

  1. 1

    Is there is a faster alternative for Integer.toString(myInt).getBytes(US_ASCII)?

  2. 2

    What is a faster alternative to Python's http.server (or SimpleHTTPServer)?

  3. 3

    Faster alternative to nested loops?

  4. 4

    A faster alternative to Pandas `isin` function

  5. 5

    Java: Faster alternative to String(byte[])

  6. 6

    Faster alternative to Series.add function in pandas

  7. 7

    Can matlab's int8 function be replaced with faster alternative

  8. 8

    alternative (faster) war to 3 nested for loop python

  9. 9

    faster alternative to InttoStr/StrToInt?

  10. 10

    ffmpeg live transcoding faster alternative?

  11. 11

    Faster alternative to grouby/shift

  12. 12

    Faster alternative to function 'rollapply'

  13. 13

    Alternative to using loop to make code run faster

  14. 14

    Python faster alternative to dictionary?

  15. 15

    Faster alternative to INTERSECT with 'rows' - MATLAB

  16. 16

    R: faster alternative of period.apply

  17. 17

    Faster alternative for getPixel and getPixel in Android Bitmap?

  18. 18

    Faster alternative to for loop in for loop

  19. 19

    Python: faster alternative for itertools.product()?

  20. 20

    Faster alternative to decimal.Parse

  21. 21

    Faster alternative to STL std::map

  22. 22

    R - faster alternative to hist(XX, plot=FALSE)$count

  23. 23

    Is there a faster alternative to cp for copying large files (~20 GB)?

  24. 24

    Faster alternative to iterrows

  25. 25

    Faster alternative to perform pandas groupby operation

  26. 26

    Faster alternative to split-apply-combine

  27. 27

    Faster du/stat alternative for directories

  28. 28

    Python's faster alternative to lists for prime numbers?

  29. 29

    A faster alternative to .SpecialCells(xlCellTypeVisible).Copy

HotTag

Archive