I have a data frame, dataset, that looks like this:
dataset <- data.frame(customer_id = c(12, 12, 234, 234, 781, 456),
                      Sales_id = c(20013211129, 20013217122, 20013149844,
                                   20013273151, 20013222724, 20013171637),
                      Rev = c(1000, 1000, 1000, 1000, 1000, 1000),
                      Source = c('App', 'Non-App', 'App', 'Non-App', 'Non-App', 'Non-App'))
customer_id | Sales_id    | Rev  | Source
12          | 20013211129 | 1000 | App
12          | 20013217122 | 1000 | Non-App
234         | 20013149844 | 1000 | App
234         | 20013273151 | 1000 | Non-App
781         | 20013222724 | 1000 | Non-App
456         | 20013171637 | 1000 | Non-App
I want to build a table from this data frame so that the result looks like this:
        | No_of_customers | no_of_orders | total_revenue
App     | 2               | 2            | 2000
Non-App | 4               | 4            | 4000
Total   | 6               | 6            | 6000
App%    | 33%             | 33%          | 33%
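Here, the number of customers is the count of distinct customer_id values, the number of orders is the count of distinct Sales_id values, and the Total row is simply the sum of the two rows above it. I am new to R, so I would appreciate some help with which functions to use for this.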
Here is one solution. It may not be the best one, but it works...
## Load Data
customer_id <- c(12, 12, 234, 234, 781, 456)
Sales_id <- c(20013211129,
              20013217122,
              20013149844,
              20013273151,
              20013222724,
              20013171637)
Rev <- rep(1000, 6)
Source <- c("App", "Non-App", "App", "Non-App", "Non-App", "Non-App")
data <- data.frame(customer_id, Sales_id, Rev, Source, stringsAsFactors = FALSE)
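## Create Overview table
library(dplyr)
## One row per Source: distinct customers, distinct orders, summed revenue
result <- data %>%
  group_by(Source) %>%
  summarise(No_of_customers = length(unique(customer_id)),
            no_of_orders = length(unique(Sales_id)),
            total_revenue = sum(Rev))
## Drop the Source column so only numeric columns remain
temp_res <- as.data.frame(result[, -1])
## Row 3: Total (column sums of the App and Non-App rows)
temp_res <- rbind(temp_res, colSums(temp_res))
## Row 4: App share in percent (App row divided by Total row)
temp_res <- rbind(temp_res, temp_res[1, ] / temp_res[3, ] * 100)
## Add the row labels back as the first column
cbind(Cat = c("App", "Non-App", "Total", "App%"), temp_res)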
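For readers without dplyr, the same overview can be sketched in base R. This is only an illustration built on the question's data: the helper count_distinct and the object names per_source, overview, and distinct_overall are made up for this example. It also shows one thing to keep in mind about the requested Total row: customers 12 and 234 appear under both sources, so adding the per-source counts gives 6, while the data contain only 4 distinct customers.

```r
# Same data as in the question.
dataset <- data.frame(
  customer_id = c(12, 12, 234, 234, 781, 456),
  Sales_id = c(20013211129, 20013217122, 20013149844,
               20013273151, 20013222724, 20013171637),
  Rev = rep(1000, 6),
  Source = c("App", "Non-App", "App", "Non-App", "Non-App", "Non-App"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct values in a vector.
count_distinct <- function(x) length(unique(x))

# One row per Source, one column per measure.
per_source <- cbind(
  No_of_customers = sapply(split(dataset$customer_id, dataset$Source), count_distinct),
  no_of_orders    = sapply(split(dataset$Sales_id, dataset$Source), count_distinct),
  total_revenue   = sapply(split(dataset$Rev, dataset$Source), sum)
)

# Total row is the sum of the per-source rows, as the question asks.
total <- colSums(per_source)

# App share in percent, relative to that Total row.
app_pct <- 100 * per_source["App", ] / total

overview <- rbind(per_source, Total = total, `App%` = app_pct)
print(overview)

# Distinct customers over the whole data set, ignoring Source.
distinct_overall <- count_distinct(dataset$customer_id)
print(distinct_overall)
```

If the Total row should count each customer only once, distinct_overall is the number to use instead of the column sum.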
However, I would not recommend putting the total and the share into the data.frame as extra rows. Instead, I would do something like this...
library(tidyr)
## Reshape to long format: one row per Source and measure (Cat)
result <- result %>%
  pivot_longer(cols = -Source, names_to = "Cat")
## get Total
result %>%
  group_by(Cat) %>%
  summarise(Sum = sum(value))
## get Share (App value as a fraction of the total, per measure)
result %>%
  group_by(Cat) %>%
  summarise(App_share = value[Source == "App"] / sum(value))
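The desired output in the question shows the share as 33%, whereas the share computed above is a fraction. A minimal base R sketch of turning that fraction into a label follows. It assumes a long-format table shaped like the one pivot_longer produces, with the values typed in by hand from the per-source summary, and the names long, totals, and shares are only illustrative.

```r
# Long-format stand-in for the pivoted summary (values entered by hand).
long <- data.frame(
  Source = rep(c("App", "Non-App"), each = 3),
  Cat = rep(c("No_of_customers", "no_of_orders", "total_revenue"), times = 2),
  value = c(2, 2, 2000, 4, 4, 4000),
  stringsAsFactors = FALSE
)

# Total per measure, as a named numeric vector.
totals <- sapply(split(long$value, long$Cat), sum)

# App rows only, then the App value divided by the matching total.
app <- long[long$Source == "App", ]
share <- unname(app$value / totals[app$Cat])

# Format the fraction as a whole-number percentage label.
shares <- data.frame(
  Cat = app$Cat,
  App_share = sprintf("%.0f%%", 100 * share),
  stringsAsFactors = FALSE
)
print(shares)
```

Keeping the numeric share and formatting it only for display means the value stays usable for further calculations.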