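I'm currently automating some basic experimental analysis using R. My script is currently set up as shown below and produces a plot of the series.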
library(reshape2)  # assumed source of melt(); not named in the original post
library(ggplot2)

data <- list()
for (experiment in experiments) {
  path <- paste('../out/', experiment, '/', plot, '.csv', sep = "")
  data[[experiment]] <- read.csv(path, header = FALSE)
}
df <- data.frame(Year = 1:40,
                 'current' = colMeans(data[['current']]),
                 'vip' = colMeans(data[['vip']]),
                 'vipbonus' = colMeans(data[['vipbonus']]))
df <- melt(df, id.vars = 'Year', variable.name = 'Series')
plotted <- ggplot(df, aes(Year, value)) +
  geom_line(aes(colour = Series)) +
  labs(y = ylabel, title = title)
file <- paste(plot, '.png', sep = "")
ggsave(filename = file, plot = plotted)
While this is close to what we want the final product to look like, the series labels need to be updated. Ideally we want them to be something like "VIP, no bonus", "VIP, with bonus" and so forth, but labels like that are not syntactically valid column names in R, and data.frame() automatically replaces the invalid characters with . even when the names are quoted with backticks. Since these experiments are a work in progress, we also know that we are going to need more series labels in the future, so we don't want to lose ggplot's ability to set the colors for us automatically.
How can I set the series labels to be appropriate for humans?
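For reference, the renaming described in the question can be reproduced in a few lines. This is a minimal sketch with made-up values; the names `wide` and `kept` are hypothetical and not part of the original script. It relies on the fact that data.frame() passes column names through make.names() unless check.names = FALSE is given.

```r
# Default behaviour: invalid characters in column names become "."
wide <- data.frame(Year = 1:3, `VIP, no bonus` = c(1, 2, 3))
names(wide)
# [1] "Year"          "VIP..no.bonus"

# With check.names = FALSE the name is kept verbatim
kept <- data.frame(Year = 1:3, `VIP, no bonus` = c(1, 2, 3),
                   check.names = FALSE)
names(kept)
# [1] "Year"          "VIP, no bonus"
```

Keeping such names in the columns is possible this way, but they then need backticks everywhere they are referenced, so putting the display labels into a factor instead is usually less awkward.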
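The OP has explained that he is currently working on automating some basic experimental analysis, part of which is relabeling the series. The OP has also shown some code used to prepare the data to be plotted.
Based on the additional information provided in the comments, I believe the whole processing can be streamlined, which will also solve the series labeling issue.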
# used for creating file paths
experiments <- c("current", "vip", "vipbonus")
# used for labeling the series
exp_labels <- c("Current", "VIP, no bonus", "VIP, with bonus")
plot <- "dataset1" # e.g.
paths <- paste0(file.path("../out", experiments, plot), ".csv")
paths
#[1] "../out/current/dataset1.csv" "../out/vip/dataset1.csv" "../out/vipbonus/dataset1.csv"
library(data.table) #version 1.10.4 used here
# read all files into one large data.table
# add running count in column "Series" to identify the source of each row
DT <- rbindlist(lapply(paths, fread, header = FALSE), idcol = "Series")
# rename file chunks = Series, use predefined labels
DT[, Series := factor(Series, labels = exp_labels)]
# reshape from wide to long
molten <- melt(DT, id.vars = "Series")
# compute means by Series and Year = variable
aggregated <- molten[, .(value = mean(value)), by = .(Series, variable)]
# take factor level number of "variable" as Year
aggregated[, Year := as.integer(variable)]
Note that the aggregation is done in long format (after melt()) to save typing the same command for each column.
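To illustrate the point about aggregating in long format, here is a small sketch on toy data. The table `toy` and its values are made up for illustration; only the melt / aggregate pattern mirrors the code above.

```r
library(data.table)

# Toy data (hypothetical): two series, two runs each, three yearly columns
toy <- data.table(Series = c("a", "a", "b", "b"),
                  V1 = c(1, 3, 10, 20),
                  V2 = c(2, 4, 30, 40),
                  V3 = c(5, 7, 50, 70))

# Long format: one row per (Series, variable) observation
long <- melt(toy, id.vars = "Series")

# A single mean() call covers every yearly column at once
agg <- long[, .(value = mean(value)), by = .(Series, variable)]

# The factor level number of "variable" (V1, V2, V3) serves as the year
agg[, Year := as.integer(variable)]
agg
```

The same means could be computed in wide format, for example with toy[, lapply(.SD, mean), by = Series], but the result would still have to be reshaped before plotting, so melting first avoids a second step.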
library(ggplot2)
ggplot(aggregated, aes(Year, value)) +
  geom_line(aes(colour = Series)) +
  labs(y = "ylabel", title = "title")
file <- paste(plot, '.png', sep = "")
ggsave(filename = file) # by default, the last plot is saved
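If the original data frame workflow from the question is to be kept, a possible alternative (not part of the approach above) is to relabel the Series factor after melting. The following is only a sketch: the data frame is filled with made-up values, and it assumes the melted data has a Series column holding the short keys.

```r
library(ggplot2)

# Hypothetical long-format data, shaped like the melted data frame in the question
df <- data.frame(
  Year   = rep(1:3, times = 3),
  Series = rep(c("current", "vip", "vipbonus"), each = 3),
  value  = c(1, 2, 3, 2, 3, 4, 3, 5, 7)
)

# Map the short keys to display labels; levels and labels must be in the same order
df$Series <- factor(df$Series,
                    levels = c("current", "vip", "vipbonus"),
                    labels = c("Current", "VIP, no bonus", "VIP, with bonus"))

# Colours are still assigned automatically, one per factor level
plotted <- ggplot(df, aes(Year, value)) +
  geom_line(aes(colour = Series))
```

Adding a new experiment then only requires extending the levels and labels vectors; the colour scale needs no manual changes.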