I have the following data frame (79000 rows):
ID P1 P2 P3 P4 P5 P6 P7 P8
1 38005 28002 38005 38005 28002 34002 NA NA
2 28002 28002 28002 38005 28002 NA NA NA
I want to count how many times each number (code) appears in each row of the data frame, so the output would look like this:
38005 appears 3 28002 appears 2 34002 appears 1 NA appears 2
28002 appears 4 38005 appears 1 NA appears 3
So far, I have tried to find the most frequent number (code) in each row:
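# most frequent code per row; df[, -1] drops the ID column so the ID itself is not counted
df$frequency <- apply(df[, -1], 1, function(x) names(which.max(table(x))))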
But I don't know how to count how many times each number (code) appears in each row.
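Building on the apply() attempt above, a minimal base-R sketch that returns the count of every code in a row rather than only the most frequent one. Assumptions: `df` is rebuilt from the two sample rows only, `describe_row()` is a hypothetical helper name, codes are listed in the order they first appear in the row, and NA is counted as a value of its own.

```r
# Example data: only the two sample rows from the question (the real data has 79000 rows)
df <- data.frame(
  ID = 1:2,
  P1 = c(38005, 28002),
  P2 = c(28002, 28002),
  P3 = c(38005, 28002),
  P4 = c(38005, 38005),
  P5 = c(28002, 28002),
  P6 = c(34002, NA),
  P7 = c(NA_real_, NA_real_),
  P8 = c(NA_real_, NA_real_)
)

# Hypothetical helper: for one row, count each distinct value (NA included)
# and build a "code appears n" string, codes in order of first appearance
describe_row <- function(x) {
  codes <- unique(x)
  counts <- vapply(codes, function(code) {
    if (is.na(code)) sum(is.na(x)) else sum(x == code, na.rm = TRUE)
  }, integer(1))
  paste(codes, "appears", counts, collapse = " ")
}

# Apply over rows, skipping the ID column
out <- apply(df[, -1], 1, describe_row)
out
```

`apply()` turns the data frame into a matrix first, which is harmless here because every P column is numeric. The per-row loop is simple but not vectorized, so on 79000 rows the long-format approach in the answer is worth timing against it.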
Using tidyverse and reshape2, it can be done like this:
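library(tidyverse)  # dplyr (%>%, group_by, summarise, n) and tidyr (gather)
library(reshape2)   # dcast

df %>%
  gather(var, val, -ID) %>%              # Transforming the data from wide to long format
  group_by(val, ID) %>%                  # Grouping
  summarise(count = n()) %>%             # Performing the count
  dcast(ID ~ val, value.var = "count")   # Reshaping the data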
  ID 28002 34002 38005 NA
1  1     2     1     3  2
2  2     4    NA     1  3
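`gather()` still works but has been superseded in tidyr 1.0.0 by `pivot_longer()`. A minimal sketch of the same count using only dplyr and tidyr (no reshape2), assuming tidyr >= 1.0; `counts_wide` is just an illustrative name and `df` again holds only the two sample rows.

```r
library(dplyr)
library(tidyr)

# Example data: the two sample rows from the question
df <- data.frame(
  ID = 1:2,
  P1 = c(38005, 28002), P2 = c(28002, 28002), P3 = c(38005, 28002),
  P4 = c(38005, 38005), P5 = c(28002, 28002), P6 = c(34002, NA),
  P7 = c(NA_real_, NA_real_), P8 = c(NA_real_, NA_real_)
)

counts_wide <- df %>%
  pivot_longer(-ID, names_to = "var", values_to = "val") %>%  # wide -> long, one row per cell
  count(ID, val) %>%                                          # one row per (ID, code); NA kept as its own group
  pivot_wider(names_from = val, values_from = n)              # one column per code; absent codes become NA

counts_wide
```

The counts should match the `dcast()` table above, with NA wherever a code does not occur in a row.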
To show, for each ID, only the two non-NA codes with the highest counts:
df %>%
  gather(var, val, -ID) %>%              # Transforming the data from wide to long format
  group_by(val, ID) %>%                  # Grouping
  mutate(temp = n()) %>%                 # Performing the count
  group_by(ID) %>%                       # Grouping
  mutate(temp2 = dense_rank(temp)) %>%   # Creating the rank based on count
  group_by(ID, val) %>%                  # Grouping
  summarise(temp3 = first(temp2),        # Summarising
            temp = first(temp)) %>%
  arrange(ID, desc(temp3)) %>%           # Arranging
  na.omit() %>%                          # Deleting the rows with NA
  group_by(ID) %>%
  mutate(temp4 = ifelse(temp3 == first(temp3) | temp3 == nth(temp3, 2), 1, 0)) %>%  # Identifying the highest and the second highest count
  filter(temp4 == 1) %>%                 # Selecting the highest and the second highest count
  dcast(ID ~ val, value.var = "temp")    # Reshaping the data
  ID 28002 38005
1  1     2     3
2  2     4     1
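A shorter sketch of the same "top two codes per ID" idea, assuming dplyr >= 1.0 (for `slice_max()`) and tidyr >= 1.0; `top2` is an illustrative name. It drops NA before counting and keeps the two largest counts per ID. With `with_ties = TRUE`, every code tied at the cutoff is kept, so when several codes share a count the selection can differ from the `dense_rank()` pipeline above; on the two sample rows both keep 28002 and 38005.

```r
library(dplyr)
library(tidyr)

# Example data: the two sample rows from the question
df <- data.frame(
  ID = 1:2,
  P1 = c(38005, 28002), P2 = c(28002, 28002), P3 = c(38005, 28002),
  P4 = c(38005, 38005), P5 = c(28002, 28002), P6 = c(34002, NA),
  P7 = c(NA_real_, NA_real_), P8 = c(NA_real_, NA_real_)
)

top2 <- df %>%
  pivot_longer(-ID, names_to = "var", values_to = "val") %>%  # wide -> long
  filter(!is.na(val)) %>%                                     # ignore NA cells entirely
  count(ID, val, name = "count") %>%                          # occurrences of each code per ID
  group_by(ID) %>%
  slice_max(count, n = 2, with_ties = TRUE) %>%               # two largest counts per ID (ties kept)
  ungroup() %>%
  pivot_wider(names_from = val, values_from = count)          # one column per surviving code

top2
```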