Selecting chunks of data based on conditions

Posted by Alexander on Dev

Alexander

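I have a problem selecting chunks of data based on conditions I provide. I think this is a multi-step process that should be wrapped in a function, so that it can be applied to other data sets with lapply.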

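I have a data.frame with 19 columns (the example data here has only two). I would like to first check the rows of the first column (time): the values should lie in the range 90 to 54000, and anything outside this range should be dropped. After working out these chunks, count how many chunks have a mag column showing only positive values and how many show a mix of negative and positive values. If a chunk contains negative numbers, it counts as a switched state. Then give the switching rate as (total number of chunks showing a switched state) / (total number of chunks within 90:54000).
For the chunks that satisfy the 90:54000 range, check mag for the first observation where the value is < 0, and report the corresponding time.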

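# Example data: two chunk shapes of 601 rows each, repeated
numbers <- c(seq(1, -1, length.out = 601), seq(1, 0.98, length.out = 601))
time <- c(seq(90, 54144, length.out = 601), seq(90, 49850, length.out = 601))
data <- data.frame(time = rep(time, times = 12), mag = rep(numbers, times = 6))

# My attempt at chunking and pulling the first negative mag value per chunk
# (this is the part that does not give the chunks I want)
n <- 90:54000
dfchunk <- split(data, factor(sort(rank(row.names(data)) %% n)))
ext_fsw <- lapply(dfchunk, function(x) x[which(x$mag < 0)[1], ])
x.n <- data.frame(matrix(unlist(ext_fsw), nrow = n, byrow = TRUE))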

The real data set looks like this:

V1 V2 V3 V4     V5      V6     V7      V8      V9    V10     V11     V12    V13    V14     V15    V16
1  90  0  0  0 0.0023 -0.0064 0.9987  0.0810  0.0375 0.9814  0.0829  0.0379 0.9803 0.0715  0.0270 0.9823
2 180  0  0  0 0.0023 -0.0064 0.9987  0.0887 -0.0281 0.9818  0.0956 -0.0288 0.9778 0.0796 -0.0469 0.9772
3 270  0  0  0 0.0023 -0.0064 0.9987 -0.0132 -0.0265 0.9776  0.0087 -0.0369 0.9797 0.0311 -0.0004 0.9827
4 360  0  0  0 0.0023 -0.0064 0.9987  0.0843  0.0369 0.9752  0.0765  0.0362 0.9749 0.0632  0.0486 0.9735
5 450  0  0  0 0.0023 -0.0064 0.9987  0.1075 -0.0660 0.9737  0.0914 -0.0748 0.9698 0.0586 -0.0361 0.9794
6 540  0  0  0 0.0023 -0.0064 0.9987  0.0006  0.0072 0.9808 -0.0162 -0.0152 0.9797 0.0369  0.0118 0.9763

This is the expected output (just an example).

For part 1:

ss (switched state)  total countable chunks   switching probability
 5                           10                         5/10

For part 2:

time     mag
27207    -0.03
26520    -0.98
32034    -0.67
.
.
.
.
etc

好时光

OK, I think this solves it. I split the work into two functions. You pass each function a data frame and a column name, and it returns the requested data.

library(dplyr)

# Part 1: count the distinct in-range time values at which `col` is negative.
# Expects `data` to have a column named `time`.
thabescity <- function(data, col){
  new_df <- data %>%
    filter(.data[[col]] < 0) %>%               # rows where the column is negative
    filter(90 <= time & time <= 54000) %>%     # keep times within the requested range
    distinct(time)                             # one row per time value

  ss <- nrow(new_df)                           # time values showing a switched state
  total <- length(unique(data$time))           # all distinct time values in the data
  switching_probability <- ss / total
  data.frame(ss, total, switching_probability)
}

print(thabescity(data, "mag"))
   ss total switching_probability
1 298  1201             0.2481266
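Note that `total` in the function above counts every distinct time value, including those outside 90:54000, while the question asks for the total within that range. The following base-R sketch compares the two denominators on the example data. It assumes the first column is named `time`; the variable names `total_all` and `total_in_range` are illustrative only.

```r
# Example data from the question, with the time column named explicitly.
numbers <- c(seq(1, -1, length.out = 601), seq(1, 0.98, length.out = 601))
time <- c(seq(90, 54144, length.out = 601), seq(90, 49850, length.out = 601))
data <- data.frame(time = rep(time, times = 12), mag = rep(numbers, times = 12))

# Rows whose time lies in the requested range.
in_range <- data$time >= 90 & data$time <= 54000

# Distinct in-range time values with a negative mag value.
ss <- length(unique(data$time[in_range & data$mag < 0]))

# Denominator as in thabescity(): every distinct time value.
total_all <- length(unique(data$time))

# Alternative denominator (assumption): only distinct time values in range.
total_in_range <- length(unique(data$time[in_range]))

c(all_times = ss / total_all, in_range_only = ss / total_in_range)
```

Which denominator is right depends on how "total countable chunks" is meant to be read, so treat this as one possible reading rather than a correction.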

You can create a list and run a loop to process every column, storing each result in the list:

data_names <- names(data)[2:length(names(data))]
first_problem <- list()
for(name in data_names){
  first_problem[[name]] <- thabescity(data, name)
}
first_problem[["mag"]]

   ss total switching_probability
1 298  1201             0.2481266
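Since the question mentions lapply, the same loop can also be written that way. The sketch below is self-contained and uses base R only: `switch_summary()` is a hypothetical stand-in that mirrors what `thabescity()` computes, and the data is rebuilt with the first column named `time` (an assumption the functions in this answer already rely on).

```r
# Example data from the question, with the time column named explicitly.
numbers <- c(seq(1, -1, length.out = 601), seq(1, 0.98, length.out = 601))
time <- c(seq(90, 54144, length.out = 601), seq(90, 49850, length.out = 601))
data <- data.frame(time = rep(time, times = 12), mag = rep(numbers, times = 12))

# Hypothetical base-R stand-in for thabescity(): counts distinct in-range
# time values that have at least one negative value in `col`.
switch_summary <- function(data, col) {
  in_range <- data$time >= 90 & data$time <= 54000
  ss <- length(unique(data$time[in_range & data[[col]] < 0]))
  total <- length(unique(data$time))
  data.frame(ss = ss, total = total, switching_probability = ss / total)
}

# Every column except time; setNames() makes lapply() return a named list.
value_cols <- setdiff(names(data), "time")
first_problem <- lapply(setNames(value_cols, value_cols),
                        function(col) switch_summary(data, col))
first_problem[["mag"]]
```

With the real 19-column data, `value_cols` would simply contain more names, and the result stays a list indexed by column name.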

The second problem is a bit easier:

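# Part 2: for each in-range time value, return the first row where `col` is negative.
# Expects `data` to have a column named `time`.
thabescity2 <- function(data, col){
  data <- data[, c("time", col)]               # keep only time and the column of interest
  new_df <- data %>%
    filter(.data[[col]] < 0) %>%               # rows where the column is negative
    filter(90 <= time & time <= 54000) %>%     # keep times within the requested range
    group_by(time) %>%
    filter(row_number() == 1)                  # first matching row per time value

  return(new_df)
}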
print(thabescity2(data, "mag"))

Source: local data frame [298 x 2]
Groups: time

       time          mag
1  27207.09 -0.003333333
2  27297.18 -0.006666667
3  27387.27 -0.010000000
4  27477.36 -0.013333333
5  27567.45 -0.016666667
6  27657.54 -0.020000000
7  27747.63 -0.023333333
8  27837.72 -0.026666667
9  27927.81 -0.030000000
10 28017.90 -0.033333333
..      ...          ...

You can do the same as above to iterate over the whole data frame:

data_names <- names(data)[2:length(names(data))]
second_problem <- list()
for(name in data_names){
  second_problem[[name]] <- thabescity2(data, name)
}
second_problem[["mag"]]

Source: local data frame [298 x 2]
Groups: time

       time          mag
1  27207.09 -0.003333333
2  27297.18 -0.006666667
3  27387.27 -0.010000000
4  27477.36 -0.013333333
5  27567.45 -0.016666667
6  27657.54 -0.020000000
7  27747.63 -0.023333333
8  27837.72 -0.026666667
9  27927.81 -0.030000000
10 28017.90 -0.033333333
..      ...          ...
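The functions above group by distinct time value rather than by chunk. If a chunk is instead taken to mean each run of rows in which time keeps increasing (so a new chunk starts whenever time drops back), both parts of the question could be sketched in base R as follows. This is only one reading of the question, not the approach used above; the chunk definition and the names `chunk_id`, `part1` and `part2` are assumptions.

```r
# Example data from the question, with the time column named explicitly.
numbers <- c(seq(1, -1, length.out = 601), seq(1, 0.98, length.out = 601))
time <- c(seq(90, 54144, length.out = 601), seq(90, 49850, length.out = 601))
data <- data.frame(time = rep(time, times = 12), mag = rep(numbers, times = 12))

# Keep only rows whose time lies in the requested range.
in_range <- data[data$time >= 90 & data$time <= 54000, ]

# Assumption: a new chunk starts wherever time drops instead of increasing.
chunk_id <- cumsum(c(TRUE, diff(in_range$time) < 0))
chunks <- split(in_range, chunk_id)

# Part 1: a chunk counts as switched if any of its mag values is negative.
switched <- vapply(chunks, function(x) any(x$mag < 0), logical(1))
part1 <- data.frame(ss = sum(switched),
                    total = length(chunks),
                    switching_probability = mean(switched))

# Part 2: first row with mag < 0 in each switched chunk.
first_neg <- lapply(chunks[switched], function(x) x[which(x$mag < 0)[1], ])
part2 <- do.call(rbind, first_neg)

part1
part2
```

On the example data this yields 24 chunks, of which 12 are switched, and `part2` holds one row per switched chunk. To cover all columns of the real data, the two parts could be wrapped in a function of the column name and passed to lapply.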