Most efficient way to calculate pairwise partial correlations in base R?

catastrophic-failure

The question title says it all: what's the most efficient way to calculate the pairwise partial correlation between every column of a matrix, controlling for all the other variables?

Basically, something analogous to the cor function below, but resulting in the partial correlations instead of simple correlations.

#> cor(iris[,-5])
#             Sepal.Length Sepal.Width Petal.Length Petal.Width
#Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
#Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
#Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
#Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

The result should match the one we obtain with the ppcor library:

#> ppcor::pcor(iris[,-5])$estimate
#             Sepal.Length Sepal.Width Petal.Length Petal.Width
#Sepal.Length    1.0000000   0.6285707    0.7190656  -0.3396174
#Sepal.Width     0.6285707   1.0000000   -0.6152919   0.3526260
#Petal.Length    0.7190656  -0.6152919    1.0000000   0.8707698
#Petal.Width    -0.3396174   0.3526260    0.8707698   1.0000000

Solutions on other partial correlation coefficients (i.e. not Pearson) are welcome as well.

catastrophic-failure

We know the pairwise partial correlations controlling for every other variable can be obtained through the inversion of the correlation or covariance matrix (see here) in O(n^3) time. So a possible solution is simply:

pcor.solve = function(x){
  res = solve(cov(x))
  res = -res/sqrt(diag(res) %o% diag(res))
  diag(res) = 1
  return(res)
}

This is basically a stripped-down version of ppcor::pcor. The result is:

pcor.solve(iris[,-5])
#             Sepal.Length Sepal.Width Petal.Length Petal.Width
#Sepal.Length    1.0000000   0.6285707    0.7190656  -0.3396174
#Sepal.Width     0.6285707   1.0000000   -0.6152919   0.3526260
#Petal.Length    0.7190656  -0.6152919    1.0000000   0.8707698
#Petal.Width    -0.3396174   0.3526260    0.8707698   1.0000000

Note, however, that the covariance matrix (or the correlation matrix; the result is the same) must be positive definite for the inversion to work.
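When positive definiteness isn't guaranteed (e.g. collinear columns), it can be checked cheaply from the eigenvalues before inverting. A minimal sketch — the `NULL` fallback and the tolerance are just illustrations, not what `ppcor` does:

```r
# Guarded variant: test positive definiteness via the eigenvalues of the
# covariance matrix, then invert as before. A relative tolerance guards
# against numerically singular input.
pcor.safe = function(x){
  S = cov(x)
  ev = eigen(S, symmetric = TRUE, only.values = TRUE)$values
  if (min(ev) < .Machine$double.eps * max(ev) * nrow(S))
    return(NULL)                 # not (numerically) positive definite
  res = solve(S)
  res = -res/sqrt(diag(res) %o% diag(res))
  diag(res) = 1
  res
}

pcor.safe(iris[,-5])                 # same estimates as above
pcor.safe(cbind(a = 1:5, b = 1:5))   # perfectly collinear columns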


As this mostly boils down to an efficient matrix inversion, I've looked at this thread on stats.SE. qr.solve and chol2inv can be applied to the covariance matrix to the same effect.

pcor.qr = function(x){
  res = qr.solve(cov(x))
  res = -res/sqrt(diag(res) %o% diag(res))
  diag(res) = 1
  dimnames(res)[[1]] = dimnames(res)[[2]] = colnames(x)
  return(res)
}
pcor.qr(iris[,-5])
#             Sepal.Length Sepal.Width Petal.Length Petal.Width
#Sepal.Length    1.0000000   0.6285707    0.7190656  -0.3396174
#Sepal.Width     0.6285707   1.0000000   -0.6152919   0.3526260
#Petal.Length    0.7190656  -0.6152919    1.0000000   0.8707698
#Petal.Width    -0.3396174   0.3526260    0.8707698   1.0000000

pcor.chol = function(x){
  res = chol2inv(chol(cov(x)))
  res = -res/sqrt(diag(res) %o% diag(res))
  diag(res) = 1
  dimnames(res)[[1]] = dimnames(res)[[2]] = colnames(x)
  return(res)
}
pcor.chol(iris[,-5])
#             Sepal.Length Sepal.Width Petal.Length Petal.Width
#Sepal.Length    1.0000000   0.6285707    0.7190656  -0.3396174
#Sepal.Width     0.6285707   1.0000000   -0.6152919   0.3526260
#Petal.Length    0.7190656  -0.6152919    1.0000000   0.8707698
#Petal.Width    -0.3396174   0.3526260    0.8707698   1.0000000

UPDATE:

SVD can also be used. For a symmetric positive definite matrix, the SVD coincides with the eigendecomposition A = UDU^T, so the inverse is simply A^-1 = UD^-1U^T.

pcor.svd = function(x){
  res = svd(cov(x))
  res = res$v %*% diag(1/res$d) %*% t(res$v)
  res = -res/sqrt(diag(res) %o% diag(res))
  diag(res) = 1
  dimnames(res)[[1]] = dimnames(res)[[2]] = colnames(x)
  return(res)
}

pcor.svd(iris[,-5])
#             Sepal.Length Sepal.Width Petal.Length Petal.Width
#Sepal.Length    1.0000000   0.6285707    0.7190656  -0.3396174
#Sepal.Width     0.6285707   1.0000000   -0.6152919   0.3526260
#Petal.Length    0.7190656  -0.6152919    1.0000000   0.8707698
#Petal.Width    -0.3396174   0.3526260    0.8707698   1.0000000

microbenchmark with 10000 repeats:

library(microbenchmark)
#iris
dt1 = iris[,-5]
microbenchmark(
  ppcor = ppcor::pcor(dt1)$estimate,
  solve = pcor.solve(dt1),
  qr = pcor.qr(dt1),
  chol = pcor.chol(dt1),
  svd = pcor.svd(dt1),
  times = 10000L)

#Unit: microseconds
#  expr     min      lq     mean  median      uq        max neval cld
# ppcor 247.728 267.790 314.8356 280.853 296.248 196962.601 10000   c
# solve 176.816 198.743 217.1298 205.274 221.603   2425.964 10000  b 
#    qr 240.264 258.459 282.7005 270.123 285.518   4015.438 10000   c
#  chol 131.562 148.824 163.3567 154.423 167.019   1593.205 10000 a  
#   svd 179.615 199.675 219.2781 208.074 223.469   1920.710 10000  b 

#random data
dt2 = cbind(rnorm(1E4), rnorm(1E4)+2)
microbenchmark(
  ppcor = ppcor::pcor(dt2)$estimate,
  solve = pcor.solve(dt2),
  qr = pcor.qr(dt2),
  chol = pcor.chol(dt2),
  svd = pcor.svd(dt2),
  times = 10000L)

#Unit: microseconds
#  expr     min      lq     mean  median      uq       max neval  cld
# ppcor 243.063 267.323 306.4535 284.585 311.177  1833.936 10000    d
# solve 180.548 190.812 222.6685 198.277 216.004 84776.704 10000 a   
#    qr 229.068 248.662 282.8142 262.658 285.518  1954.301 10000   c 
#  chol 179.148 189.413 212.6551 198.277 216.005  1383.733 10000 a   
#   svd 213.672 230.933 262.5084 243.529 264.058  5261.543 10000  b  

#uncorrelated data
dt3 = cbind(sin(seq(0, 2*pi, length.out = 1000L)), cos(seq(0, 2*pi, length.out = 1000L)))
microbenchmark(
  ppcor = ppcor::pcor(dt3)$estimate,
  solve = pcor.solve(dt3),
  qr = pcor.qr(dt3),
  chol = pcor.chol(dt3),
  svd = pcor.svd(dt3),
  times = 10000L)

#Unit: microseconds
#  expr     min      lq     mean   median      uq      max neval  cld
# ppcor 142.759 162.354 188.7767 172.1500 191.745 2230.021 10000    d
# solve  80.711  89.108 102.8269  92.3740 101.704 1709.372 10000 a   
#    qr 130.629 145.092 168.0627 153.0220 169.351 4914.910 10000   c 
#  chol  79.777  87.709 102.2984  92.3740 101.238 6731.117 10000 a   
#   svd 112.901 127.363 147.1913 134.1285 148.358 1401.928 10000  b  

[UPDATED] Or, in other words, chol < solve < svd < qr < ppcor for now. Some further speed-up could probably be gained by taking advantage of the fact that the covariance matrix is symmetric (the chol solution already uses this), and time could also be saved in the covariance calculation itself.
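On the covariance side, one option (a sketch, not benchmarked here) is to centre the columns and call crossprod(), which dispatches to a symmetric rank-k update; the 1/(n-1) scaling factor cancels in the normalization step, so it can be dropped entirely:

```r
# cov(x) equals crossprod(centred x)/(n-1); the constant factor cancels
# in -res/sqrt(diag %o% diag), so we can skip it altogether.
pcor.crossprod = function(x){
  x = sweep(as.matrix(x), 2L, colMeans(x))   # centre each column
  res = chol2inv(chol(crossprod(x)))         # exploits symmetry
  res = -res/sqrt(diag(res) %o% diag(res))
  diag(res) = 1
  dimnames(res) = list(colnames(x), colnames(x))
  res
}

pcor.crossprod(iris[,-5])   # same estimates as the other variants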


Of course, the ppcor library is much more versatile, handling cases where the covariance matrix is not invertible etc., so it's at a disadvantage in these comparisons. One can argue, though, that the simpler solutions are desirable when partial correlations are going to be exhaustively computed and the covariance matrices are known to be positive definite.
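As for other coefficients (the question's last request): the same inversion trick applies to any invertible correlation matrix, e.g. Spearman. A sketch — this should agree with ppcor's Spearman option when the matrix is invertible, since partial correlations from a precision matrix are invariant to the diagonal rescaling between covariance and correlation:

```r
# Invert the Spearman correlation matrix instead of the covariance matrix;
# with midranks this equals Pearson partial correlations on ranked data.
pcor.spearman = function(x){
  res = solve(cor(x, method = "spearman"))
  res = -res/sqrt(diag(res) %o% diag(res))
  diag(res) = 1
  res
}

pcor.spearman(iris[,-5])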
