How to replace the values of several columns based on another column in R?

Armin

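I am a new R user and I am trying to make my code more efficient.

I have a very large data frame with several columns, and I am trying to replace the values of some of those columns based on the value of another column.

I know how to do this with conditional statements or loops, but because the data is large I would like to optimize it as much as possible.

Here is some test data: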

# data.frame creation function
make_d <- 
  function(n_rows = 5000000){
    d <- 
      data.frame(
        "col_1" = sample(   0:3, n_rows, replace = TRUE), 
        "col_2" = sample(1:1000, n_rows, replace = TRUE), 
        "col_3" = sample(1:1000, n_rows, replace = TRUE), 
        "col_4" = sample(1:1000, n_rows, replace = TRUE), 
        "col_5" = sample(1:1000, n_rows, replace = TRUE), 
        "col_6" = sample(1:1000, n_rows, replace = TRUE), 
        "col_7" = sample(1:1000, n_rows, replace = TRUE), 
        "col_8" = sample(1:1000, n_rows, replace = TRUE), 
        "col_9" = sample(1:1000, n_rows, replace = TRUE)
      )
    # return
    d
  }

# create data.frame
d <- make_d()

# first lines of data.frame
head(d)
##   col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9
## 1     3    94   802   960   460   346   212   387   665
## 2     0   637   443   249     0     0     0     0     0
## 3     2    26   192   438   562   487   623   604   853
## 4     0   421   667   511     0     0     0     0     0
## 5     3   726   994    58   384   700   307   885   832
## 6     1   567   798   185   117   394   894   745   134

I would like the columns to follow these rules:

if col_1 equals 0, col_5 through col_9 are set to 0
if col_1 equals 3, col_2 through col_9 are set to 0
if col_1 equals 2, col_7 and col_9 are set to 0
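
To make the three rules concrete, they can be written as base R logical-index assignments. This is only a minimal sketch on a tiny hypothetical data frame (`small` is not part of the original post, and it was not benchmarked against the 5,000,000-row data):

```r
# Tiny hypothetical data frame with the same column layout as make_d()
small <- data.frame(col_1 = c(0L, 1L, 2L, 3L))
for (j in 2:9) small[[paste0("col_", j)]] <- rep(as.integer(j * 10), 4)

# Rule 1: where col_1 == 0, set col_5 through col_9 to 0
small[small$col_1 == 0L, paste0("col_", 5:9)] <- 0L

# Rule 2: where col_1 == 3, set col_2 through col_9 to 0
small[small$col_1 == 3L, paste0("col_", 2:9)] <- 0L

# Rule 3: where col_1 == 2, set col_7 and col_9 to 0
small[small$col_1 == 2L, c("col_7", "col_9")] <- 0L

print(small)
```

Because the three conditions test different values of col_1, no row matches more than one rule, so the order of the assignments does not change the result.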

What I have tried so far was not very efficient. I could not find a way to process several columns at once, or to avoid if_else().

library(microbenchmark)
library(dplyr)

microbenchmark(
  setup = { d <- make_d() },
  dplyr_mutate = {
      d <- 
        d %>% 
        mutate(
          col_5 = if_else(col_1 == 0, 0L, col_5),
          col_6 = if_else(col_1 == 0, 0L, col_6),
          col_7 = if_else(col_1 == 0, 0L, col_7),
          col_8 = if_else(col_1 == 0, 0L, col_8),
          col_9 = if_else(col_1 == 0, 0L, col_9), 


          col_2 = if_else(col_1 == 3, 0L, col_2),
          col_3 = if_else(col_1 == 3, 0L, col_3),
          col_4 = if_else(col_1 == 3, 0L, col_4),
          col_5 = if_else(col_1 == 3, 0L, col_5),
          col_6 = if_else(col_1 == 3, 0L, col_6),
          col_7 = if_else(col_1 == 3, 0L, col_7),
          col_8 = if_else(col_1 == 3, 0L, col_8),
          col_9 = if_else(col_1 == 3, 0L, col_9),

          col_7 = if_else(col_1 == 2, 0L, col_7), 
          col_9 = if_else(col_1 == 2, 0L, col_9)
        )},
  times = 10
)

## Unit: milliseconds
##          expr      min       lq    mean   median       uq      max neval
##  dplyr_mutate 412.3384 429.2278 531.884 538.8701 562.7804 793.9565    10

Jason Matthews

If I understand correctly, is this what you are looking for?

Speedup: ~1.3x

library(microbenchmark)
library(dplyr)

microbenchmark(
  setup = { d <- make_d() },
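  dplyr_mutate_at = 
  {
    # `~ ifelse(...)` is a formula lambda; `.x` stands for the column being modified.
    # 0L keeps each result integer, matching the column type.
    d %>%
      mutate_at(vars(col_5:col_9), ~ ifelse(col_1 == 0, 0L, .x)) %>%
      mutate_at(vars(col_2:col_9), ~ ifelse(col_1 == 3, 0L, .x)) %>%
      mutate_at(vars(col_7, col_9), ~ ifelse(col_1 == 2, 0L, .x))
  },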

  times = 10
)

##    Unit: milliseconds
##                  expr      min       lq     mean   median       uq      max neval
##          dplyr_mutate 395.5998 423.7178 496.1036 436.8839 551.8601 859.9627    10
##       dplyr_mutate_at 365.0635 378.3087 404.1069 392.1462 400.7426 551.8507    10
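
For readers on a newer dplyr, the same idea can probably be expressed with `across()`, which superseded `mutate_at()`. The sketch below assumes dplyr 1.0.0 or later and uses a small hypothetical data frame (`small`, 20 rows) rather than the benchmark data. It was not timed, so it makes no claim about speed.

```r
library(dplyr)

# Small hypothetical data frame with the same column layout as make_d()
set.seed(1)
small <- data.frame(col_1 = sample(0:3, 20, replace = TRUE))
for (j in 2:9) small[[paste0("col_", j)]] <- sample(1:1000, 20, replace = TRUE)

res <- small %>%
  mutate(
    # col_1 == 0: zero out col_5 through col_9
    across(col_5:col_9, ~ if_else(col_1 == 0L, 0L, .x)),
    # col_1 == 3: zero out col_2 through col_9
    across(col_2:col_9, ~ if_else(col_1 == 3L, 0L, .x)),
    # col_1 == 2: zero out col_7 and col_9
    across(c(col_7, col_9), ~ if_else(col_1 == 2L, 0L, .x))
  )

head(res)
```

`if_else()` is stricter than `ifelse()` about types, which is why the replacement value is written as `0L` to match the integer columns.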
