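I want to create a new variable from information elsewhere in the data frame. That sounds simple, but I want the levels of the new variable to be assigned based on proportions.
I have a data frame: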
df <- read.table(text="
group piece answer
group1 A noise
group1 A silence
group1 A silence
group1 B silence
group1 B loud_noise
group1 B noise
group1 B loud_noise
group1 B noise
group2 C silence
group2 C silence", header=TRUE)
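I want to create a new variable, majority_agreement, with two levels: good and bad. Good means that a majority of the answers for a piece agree (the most common answer has a share > 55%). Bad means that no answer reaches that majority for the piece.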
group piece answer majority_agreement
group1 A noise good
group1 A silence good
group1 A silence good
group1 B silence bad
group1 B loud_noise bad
group1 B noise bad
group1 B loud_noise bad
group1 B noise bad
group2 C silence good
group2 C silence good
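To see where these labels come from, the share of the most common answer can be checked per piece. This is only a minimal base R sketch: it assumes the example data is stored as `df`, and `max_share` is a name introduced here for illustration.

```r
# Same example data as above (assumed here to be stored as `df`)
df <- read.table(text = "
group piece answer
group1 A noise
group1 A silence
group1 A silence
group1 B silence
group1 B loud_noise
group1 B noise
group1 B loud_noise
group1 B noise
group2 C silence
group2 C silence", header = TRUE)

# Share of the most frequent answer within each piece
max_share <- sapply(split(df$answer, df$piece),
                    function(a) max(table(a)) / length(a))

round(max_share, 2)
# A: 2/3 = 0.67, B: 2/5 = 0.40, C: 2/2 = 1.00

# Pieces whose top answer exceeds the 55% threshold
max_share > 0.55
```

Piece A clears the threshold because two of its three answers are silence, while piece B's most common answers each cover only 40% of its rows.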
I can do this in a binary way (complete agreement or not):
newdf <- df %>%
  group_by(group) %>%
  # 'good' only if every answer in the group is identical, otherwise 'bad'
  mutate(majority_agreement = ifelse(length(unique(answer)) <= 1,
                                     'good',
                                     'bad')) %>%
  as.data.frame()
How can I do this based on proportions instead?
library(dplyr)
newdf <- df %>%
  count(group, piece, answer) %>%          # How many of each answer for each group & piece
  group_by(group, piece) %>%
  mutate(share = n / sum(n)) %>%           # What share have this answer?
  summarize(max_share = max(share)) %>%    # What's the largest share among them?
  mutate(majority_agreement = if_else(max_share > 0.55, "good", "bad")) %>%
  ungroup() %>%
  right_join(df, by = c("group", "piece")) # Add the conclusion back to the original data
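The same result can probably be reached without the count-and-join round trip by computing the largest share inside a grouped `mutate()`. The following is a sketch of that alternative, not the approach above: it assumes the data is stored as `df` and reuses the 0.55 threshold from the question.

```r
library(dplyr)

# Same example data as in the question (assumed here to be stored as `df`)
df <- read.table(text = "
group piece answer
group1 A noise
group1 A silence
group1 A silence
group1 B silence
group1 B loud_noise
group1 B noise
group1 B loud_noise
group1 B noise
group2 C silence
group2 C silence", header = TRUE)

newdf <- df %>%
  group_by(group, piece) %>%
  mutate(
    # Count of the most frequent answer divided by the rows in this piece
    max_share = max(table(answer)) / n(),
    majority_agreement = if_else(max_share > 0.55, "good", "bad")
  ) %>%
  ungroup() %>%
  as.data.frame()

newdf
```

Because nothing is summarized or joined, the rows stay in their original order and the new columns are appended after `answer`.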