Suppose we have a data frame like this:
df <- data.frame(x = seq(10, 20), y = seq(8, 18), z = seq(0, 10))
    x  y  z
1  10  8  0
2  11  9  1
3  12 10  2
4  13 11  3
5  14 12  4
6  15 13  5
7  16 14  6
8  17 15  7
9  18 16  8
10 19 17  9
11 20 18 10
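How can we select the cases that are in the top percentile on all of x, y and z at the same time? I need code that looks for cases in the top 1% on every variable and, if none are found, relaxes the condition to 2%, then 3%, and so on, until it finds m cases that are in the top percentiles on all variables. m should be settable as needed.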
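For one fixed level, the test described above means comparing every value with its own column's quantile and keeping the rows where all comparisons hold. A minimal base-R sketch, assuming all columns are numeric; `rows_above` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: indices of rows whose value in every column is
# strictly greater than that column's quantile at probability p.
rows_above <- function(df, p) {
  m <- as.matrix(df)
  quants <- apply(m, 2, quantile, probs = p)  # one threshold per column
  above <- sweep(m, 2, quants, FUN = ">")     # logical matrix, same shape as m
  which(rowSums(above) == ncol(m))            # rows above the threshold everywhere
}

df <- data.frame(x = seq(10, 20), y = seq(8, 18), z = seq(0, 10))
rows_above(df, 0.99)  # top 1% level
rows_above(df, 0.59)  # a much looser level
```

Lowering `p` step by step and calling this until at least m indices come back is the relaxation loop the question asks for.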
I think this should do the trick for you:
df <- data.frame(x = seq(10, 20), y = seq(8, 18), z = seq(0, 10))
# Define the function: df is the input data frame, cases is the 'm' you are looking for.
# startingperc is the quantile level to start with (e.g. .99 = top 1%), and tickrate
# is how much that level is lowered on each pass until at least m cases are found.
myfunc <- function(df, cases, startingperc, tickrate) {
  found <- 0
  while (found < cases) {
    # give a clear error instead of passing a negative probability to quantile()
    if (startingperc < 0) stop("fewer than 'cases' rows lie above every column quantile")
    quants <- apply(df, 2, quantile, probs = startingperc)
    indices <- which(apply(df, 1, function(row) all(row > quants)))
    found <- length(indices)
    if (found < cases) {
      startingperc <- startingperc - tickrate
    }
  }
  # added this to handle a tickrate that is too large (more than m rows found):
  # keep only the m rows with the largest row sums
  if (length(indices) > cases) {
    indices <- rev(indices[order(apply(df[indices, ], 1, sum), decreasing = TRUE)[1:cases]])
  }
  return(df[indices, ])
}
#in use
myfunc(df, 5, .99, .01)
This gives:
> myfunc(df, 5, .99, .01)
    x  y  z
7  16 14  6
8  17 15  7
9  18 16  8
10 19 17  9
11 20 18 10
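A different, loop-free technique (not the method used in the answer above) is to turn each column into percentile ranks, score each row by its worst rank across all variables, and keep the m rows with the best scores. This avoids choosing a tickrate, but because it ranks values instead of comparing them with interpolated quantiles, and does not break ties by row sums, its selection can differ from `myfunc` on data with ties. `top_m_all` is a hypothetical name used only for this sketch:

```r
# Hypothetical helper: the m rows whose lowest percentile rank
# (taken over all columns) is highest.
top_m_all <- function(df, m) {
  stopifnot(m <= nrow(df))
  # percentile rank of each value within its own column, from 1/n up to 1
  ranks <- unname(lapply(df, function(col) rank(col, ties.method = "min") / length(col)))
  # a row is only as "high" as its lowest-ranked variable
  score <- do.call(pmin, ranks)
  keep <- order(score, decreasing = TRUE)[seq_len(m)]
  df[sort(keep), , drop = FALSE]  # return the rows in their original order
}

df <- data.frame(x = seq(10, 20), y = seq(8, 18), z = seq(0, 10))
top_m_all(df, 5)
```

On the example data this picks rows 7 to 11, the same rows as `myfunc(df, 5, .99, .01)`.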