Count the number of times each value appears in a row of a data frame in R

Posted by debugcn in Dev

user10672282

I have the following data frame (79,000 rows):

ID   P1      P2      P3      P4      P5      P6      P7   P8
1    38005   28002   38005   38005   28002   34002   NA   NA
2    28002   28002   28002   38005   28002   NA      NA   NA

I want to count how many times each number (code) appears in each row of the data frame, so the output would look like this:

38005 appears 3   28002 appears 2    34002 appears 1     NA appears 2 
28002 appears 3   38005 appears 1    28002 appears 1     NA appears 3

So far, I have tried to find the most frequent number (code) in each row:

df$frequency <- apply(df, 1, function(x) names(which.max(table(x))))

But I don't know how to count the number of times each number (code) appears within each row.
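A base-R sketch that stays close to the `table()` attempt above: build one frequency table per row, skipping the `ID` column, and pass `useNA = "ifany"` so that missing values are counted as their own category (by default `table()` drops them). The data frame is rebuilt from the two sample rows in the question; `row_counts` and `descriptions` are illustrative names, not part of the original post.

```r
# Sample data rebuilt from the two rows shown in the question
df <- data.frame(
  ID = 1:2,
  P1 = c(38005, 28002), P2 = c(28002, 28002), P3 = c(38005, 28002),
  P4 = c(38005, 38005), P5 = c(28002, 28002), P6 = c(34002, NA),
  P7 = c(NA_real_, NA_real_), P8 = c(NA_real_, NA_real_)
)

# One frequency table per row, skipping the ID column.
# useNA = "ifany" makes table() report missing values as their own category.
row_counts <- lapply(seq_len(nrow(df)), function(i) {
  table(unlist(df[i, -1]), useNA = "ifany")
})

# Turn each table into a "value appears n" line, as in the desired output
descriptions <- vapply(row_counts, function(tab) {
  paste(names(tab), "appears", as.vector(tab), collapse = "   ")
}, character(1))

print(descriptions)
```

Looping over rows with `df[i, -1]` is simple but likely slow on 79,000 rows; the long-format approach in the answer below avoids the per-row loop.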

tmfmk

Using tidyverse and reshape2, you could do:

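library(tidyverse)  # dplyr, tidyr (gather) and the %>% pipe
library(reshape2)   # dcast

df %>%
 gather(var, val, -ID) %>% # Transform the data from wide to long format
 group_by(val, ID) %>% # Group by value within each ID
 summarise(count = n()) %>% # Count the occurrences
 dcast(ID ~ val, value.var = "count") # Reshape back to wide format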

  ID 28002 34002 38005 NA
1  1     2     1     3  2
2  2     4    NA     1  3
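`gather()` is superseded in current tidyr, and the counting step does not strictly need reshape2. The sketch below assumes tidyr 1.0.0 or later (for `pivot_longer()` and `pivot_wider()`) and uses dplyr's `count()`; it should produce the same counts as the table above, with value/ID combinations that never occur left as `NA`. The `df` construction is rebuilt from the question's sample rows.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the two rows shown in the question
df <- data.frame(
  ID = 1:2,
  P1 = c(38005, 28002), P2 = c(28002, 28002), P3 = c(38005, 28002),
  P4 = c(38005, 38005), P5 = c(28002, 28002), P6 = c(34002, NA),
  P7 = c(NA_real_, NA_real_), P8 = c(NA_real_, NA_real_)
)

counts <- df %>%
  pivot_longer(-ID, names_to = "var", values_to = "val") %>%  # wide -> long
  count(ID, val, name = "count") %>%   # one row per ID/value; NA forms its own group
  pivot_wider(names_from = val, values_from = count)  # long -> wide, one column per value

print(counts)
```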

To show, for each ID, only the two non-NA values with the highest counts:

df %>%
 gather(var, val, -ID) %>% # Transform the data from wide to long format
 group_by(val, ID) %>% # Group by value within each ID
 mutate(temp = n()) %>% # Count the occurrences of each value per ID
 group_by(ID) %>%
 mutate(temp2 = dense_rank(temp)) %>% # Rank the counts within each ID
 group_by(ID, val) %>%
 summarise(temp3 = first(temp2), # Collapse to one row per ID/value
           temp = first(temp)) %>%
 arrange(ID, desc(temp3)) %>% # Highest-ranked counts first
 na.omit() %>% # Drop the rows where the value is NA
 group_by(ID) %>%
 mutate(temp4 = ifelse(temp3 == first(temp3) | temp3 == nth(temp3, 2), 1, 0)) %>% # Flag the highest and second-highest counts
 filter(temp4 == 1) %>% # Keep only the flagged rows
 dcast(ID ~ val, value.var = "temp") # Reshape back to wide format

  ID 28002 38005
1  1     2     3
2  2     4     1
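A shorter variant, assuming dplyr 1.0.0 or later for `slice_max()`: drop the `NA` values first, count, and keep the two largest counts per ID. This is not identical to the `dense_rank()` approach when counts tie: `with_ties = FALSE` keeps exactly two values per ID, whereas the ranking approach can keep more. The `df` construction is rebuilt from the question's sample rows, and `top2` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the two rows shown in the question
df <- data.frame(
  ID = 1:2,
  P1 = c(38005, 28002), P2 = c(28002, 28002), P3 = c(38005, 28002),
  P4 = c(38005, 38005), P5 = c(28002, 28002), P6 = c(34002, NA),
  P7 = c(NA_real_, NA_real_), P8 = c(NA_real_, NA_real_)
)

top2 <- df %>%
  pivot_longer(-ID, names_to = "var", values_to = "val") %>%  # wide -> long
  filter(!is.na(val)) %>%                                      # ignore missing codes
  count(ID, val, name = "count") %>%                           # occurrences per ID/value
  group_by(ID) %>%
  slice_max(count, n = 2, with_ties = FALSE) %>%               # two largest counts per ID
  ungroup() %>%
  pivot_wider(names_from = val, values_from = count)           # long -> wide

print(top2)
```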
