I'm posting this question to improve my current approach. Thanks for your help!
I have two data frames.
Data A holds my measurements:
          site1 site2 site3 site4 site5 site6 site7
1/1/2020     76    44    51     1    18    42    69
1/2/2020     80    55    52    30    17    38    12
1/3/2020     36    60    45    44    23    86     4
1/4/2020      6    73    87    15    96    56    22
1/5/2020    100    71    58    69    42    11    69
1/6/2020      6    92    48    73    31    45    89
1/7/2020     46    52    43    90     2    20     8
1/8/2020     83    32    23    12    80    64    79
1/9/2020     63    25    74    79    17    29    88
1/10/2020    91    53    41    11    29    48    67
1/11/2020    82     3    32    56    56    61    35
1/12/2020    55    66    69    88    75    78    88
1/13/2020    75    52    74    78    30    17    41
1/14/2020    43    72    24    85    10    75    32
Data B holds my range data. The ranges (min, max) are not computed from data A.
          min max
1/1/2020    6  60
1/2/2020   10  70
1/3/2020    5  90
1/4/2020    4 100
1/5/2020   10 100
1/6/2020    3  88
1/7/2020    8  99
1/8/2020    8 101
1/9/2020    7  83
1/10/2020   4  97
1/11/2020   5  89
1/12/2020   9  96
1/13/2020  11  85
1/14/2020   5 103
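I want to truncate data A against data B: any value in A that falls outside the range for its date should be replaced with the corresponding min or max from data B.
This is what I tried.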
for (i in 1:14) {
  for (j in 1:7) {
    if (A[i, j] < B[i, 1]) {
      A[i, j] <- B[i, 1]
    } else if (A[i, j] > B[i, 2]) {
      A[i, j] <- B[i, 2]
    }
  }
}
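14 is the number of rows in A and 7 is the number of columns in A. A and B have the same number of rows.
I have a large amount of data. Can anyone suggest a faster way?
Thanks for your time!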
You can join the two datasets on the date and use pmin and pmax to keep the data within each date's range.
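As a quick illustration of the clamping idea, here is a minimal sketch on the first row of A only (1/1/2020, with min = 6 and max = 60 taken from B). The names x, lo, hi, raised and clamped are illustrative and do not appear in the original.

```r
# Row 1/1/2020 of A, with its bounds from B (min = 6, max = 60)
x  <- c(76, 44, 51, 1, 18, 42, 69)
lo <- 6
hi <- 60

raised  <- pmax(x, lo)       # element-wise: values below lo are lifted to lo
clamped <- pmin(raised, hi)  # element-wise: values above hi are capped at hi

print(clamped)  # values: 60 44 51 6 18 42 60
```

Both functions are vectorised, so lo and hi can also be whole columns of bounds. The dplyr version applies the same composition to every site column after the join: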
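library(dplyr)
library(tibble)  # rownames_to_column() is exported by tibble, not dplyr

A %>%
  rownames_to_column('Date') %>%
  inner_join(B %>% rownames_to_column('Date'), by = 'Date') %>%
  # clamp each site column to the [min, max] range of its row
  mutate(across(site1:site7, ~pmin(pmax(., min), max)))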
#        Date site1 site2 site3 site4 site5 site6 site7 min max
#1   1/1/2020    60    44    51     6    18    42    60   6  60
#2   1/2/2020    70    55    52    30    17    38    12  10  70
#3   1/3/2020    36    60    45    44    23    86     5   5  90
#4   1/4/2020     6    73    87    15    96    56    22   4 100
#5   1/5/2020   100    71    58    69    42    11    69  10 100
#6   1/6/2020     6    88    48    73    31    45    88   3  88
#7   1/7/2020    46    52    43    90     8    20     8   8  99
#8   1/8/2020    83    32    23    12    80    64    79   8 101
#9   1/9/2020    63    25    74    79    17    29    83   7  83
#10 1/10/2020    91    53    41    11    29    48    67   4  97
#11 1/11/2020    82     5    32    56    56    61    35   5  89
#12 1/12/2020    55    66    69    88    75    78    88   9  96
#13 1/13/2020    75    52    74    78    30    17    41  11  85
#14 1/14/2020    43    72    24    85    10    75    32   5 103
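If the rows of A and B are already in the same order, as the question states, the join could probably be skipped altogether. Below is a minimal base R sketch under that assumption, using a three-row, three-column subset of the data so that it runs on its own. A_small, B_small and clamped are illustrative names, not part of the original.

```r
# Subset of the question's data: first three dates, three of the sites
dates <- c("1/1/2020", "1/2/2020", "1/3/2020")
A_small <- data.frame(
  site1 = c(76L, 80L, 36L),
  site4 = c(1L, 30L, 44L),
  site7 = c(69L, 12L, 4L),
  row.names = dates
)
B_small <- data.frame(
  min = c(6L, 10L, 5L),
  max = c(60L, 70L, 90L),
  row.names = dates
)

# Assumption: rows are matched by position, so check the dates line up first
stopifnot(identical(rownames(A_small), rownames(B_small)))

# Clamp every column against the per-row bounds; `[]<-` keeps the data frame shape
clamped <- A_small
clamped[] <- lapply(A_small, function(col) pmin(pmax(col, B_small$min), B_small$max))

print(clamped)
```

This replaces the double loop with one vectorised call per column. It is only safe when both data frames share the same row order, which is why the sketch checks the row names before clamping; the join-based version does not depend on that.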
Data
A <- structure(list(site1 = c(76L, 80L, 36L, 6L, 100L, 6L, 46L, 83L,
63L, 91L, 82L, 55L, 75L, 43L), site2 = c(44L, 55L, 60L, 73L,
71L, 92L, 52L, 32L, 25L, 53L, 3L, 66L, 52L, 72L), site3 = c(51L,
52L, 45L, 87L, 58L, 48L, 43L, 23L, 74L, 41L, 32L, 69L, 74L, 24L
), site4 = c(1L, 30L, 44L, 15L, 69L, 73L, 90L, 12L, 79L, 11L,
56L, 88L, 78L, 85L), site5 = c(18L, 17L, 23L, 96L, 42L, 31L,
2L, 80L, 17L, 29L, 56L, 75L, 30L, 10L), site6 = c(42L, 38L, 86L,
56L, 11L, 45L, 20L, 64L, 29L, 48L, 61L, 78L, 17L, 75L), site7 = c(69L,
12L, 4L, 22L, 69L, 89L, 8L, 79L, 88L, 67L, 35L, 88L, 41L, 32L
)), class = "data.frame", row.names = c("1/1/2020", "1/2/2020",
"1/3/2020", "1/4/2020", "1/5/2020", "1/6/2020", "1/7/2020", "1/8/2020",
"1/9/2020", "1/10/2020", "1/11/2020", "1/12/2020", "1/13/2020",
"1/14/2020"))
B <- structure(list(min = c(6L, 10L, 5L, 4L, 10L, 3L, 8L, 8L, 7L,
4L, 5L, 9L, 11L, 5L), max = c(60L, 70L, 90L, 100L, 100L, 88L,
99L, 101L, 83L, 97L, 89L, 96L, 85L, 103L)), class = "data.frame",
row.names = c("1/1/2020", "1/2/2020", "1/3/2020", "1/4/2020", "1/5/2020",
"1/6/2020", "1/7/2020", "1/8/2020", "1/9/2020", "1/10/2020", "1/11/2020",
"1/12/2020", "1/13/2020", "1/14/2020"))