How to set series labels in a multi-line ggplot2 plot?

Posted by debugcn in Dev

Barber

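I'm currently using R to automate some basic empirical analysis. At the moment my script is set up as shown below, and it produces a plot with one line per series.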

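library(reshape2)  # melt(); assumed here, the original snippet does not show its imports
library(ggplot2)

# `experiments`, `plot`, `ylabel` and `title` are assumed to be defined elsewhere in the script
data <- list()
for (experiment in experiments) {
    path <- paste0('../out/', experiment, '/', plot, '.csv')
    data[[experiment]] <- read.csv(path, header = FALSE)
}

# one column of yearly means per experiment
df <- data.frame(Year = 1:40,
                 current = colMeans(data[['current']]),
                 vip = colMeans(data[['vip']]),
                 vipbonus = colMeans(data[['vipbonus']]))

# reshape to long format: one row per (Year, Series) pair
df <- melt(df, id.vars = 'Year', variable.name = 'Series')
plotted <- ggplot(df, aes(Year, value)) +
           geom_line(aes(colour = Series)) +
           labs(y = ylabel, title = title)

file <- paste0(plot, '.png')
ggsave(filename = file, plot = plotted)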

While this is close to what we want the final product to look like, the series labels need to be updated. Ideally we want them to be something like "VIP, no bonus", "VIP, with bonus" and so forth, but labels like that are not valid R names for data frame columns (invalid characters are automatically replaced with . even when the name is wrapped in backticks). Since these experiments are a work in progress, we also know that we are going to need more series labels in the future, so we don't want to lose ggplot's ability to set the colors for us automatically.
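To make the renaming concrete, here is a minimal base R sketch with made-up values. It shows data.frame() replacing the comma and the space with dots, and that check.names = FALSE keeps the name as typed. Whether the later reshaping and plotting steps cope well with such names is not covered here.

```r
# data.frame() passes column names through make.names() by default,
# so the comma and the space are each replaced by a dot.
mangled <- data.frame(Year = 1:2, "VIP, no bonus" = c(0.1, 0.2))
names(mangled)
# "Year" "VIP..no.bonus"

# With check.names = FALSE the name is kept verbatim.
kept <- data.frame(Year = 1:2, "VIP, no bonus" = c(0.1, 0.2),
                   check.names = FALSE)
names(kept)
# "Year" "VIP, no bonus"
```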

How can I set the series labels to be appropriate for humans?

Uwe

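The OP has explained that he is currently working on automating some basic empirical analysis, part of which is relabeling the series. The OP has also shown some of the code used to prepare the data for plotting.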

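Based on the additional information provided in the comments, I believe the whole processing can be streamlined, which also solves the series label issue.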

Some preparations

# used for creating file paths
experiments <- c("current", "vip", "vipbonus")
# used for labeling the series
exp_labels <- c("Current", "VIP, no bonus", "VIP, with bonus")
plot <- "dataset1"   # e.g.
paths <- paste0(file.path("../out", experiments, plot), ".csv") 
paths
#[1] "../out/current/dataset1.csv"  "../out/vip/dataset1.csv"      "../out/vipbonus/dataset1.csv"

Reading the data

library(data.table)   #version 1.10.4 used here
# read all files into one large data.table
# add running count in column "Series" to identify the source of each row
DT <- rbindlist(lapply(paths, fread, header = FALSE), idcol = "Series")
# rename file chunks = Series, use predefined labels
DT[, Series := factor(Series, labels = exp_labels)]
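The relabeling step can be tried in isolation. The following base R sketch uses a made-up id vector in place of the running count that idcol = "Series" produces; only exp_labels comes from the code above.

```r
# Labels as defined in the preparation step.
exp_labels <- c("Current", "VIP, no bonus", "VIP, with bonus")

# Hypothetical stand-in for the id column: 1 = first file, 2 = second, ...
series_id <- c(1L, 1L, 2L, 3L, 3L)

# factor() sorts the distinct ids (1, 2, 3) and pairs them with the
# labels in that order, so label i belongs to file i.
series <- factor(series_id, labels = exp_labels)
levels(series)
as.character(series)
```

Because ggplot2 takes legend entries from the factor levels and assigns colours per level, adding an experiment should only require one more entry in experiments and one in exp_labels, in the same position.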

Reshaping and aggregating by group

# reshape from wide to long
molten <- melt(DT, id.vars = "Series")
# compute means by Series and Year = variable
aggregated <- molten[, .(value = mean(value)), by = .(Series, variable)]
# take factor level number of "variable" as Year
aggregated[, Year := as.integer(variable)]

Note that the aggregation is done in long format (after melt()), which saves typing the same command for each column.
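The same idea can be sketched on a toy example. To keep it runnable without extra packages, this version swaps in base R's stack() and aggregate() for data.table's melt() and grouped mean, so it is an illustration of the approach rather than the code above. The data and the series names "a" and "b" are made up.

```r
# Made-up wide data: one row per run, one column per year (V1, V2),
# mimicking files read with header = FALSE.
wide <- data.frame(Series = c("a", "a", "b"),
                   V1 = c(1, 3, 10),
                   V2 = c(2, 4, 20))

# Wide to long: stack() puts all V1 values first, then all V2 values,
# and records the source column in the factor `ind`.
long <- cbind(Series = rep(wide$Series, times = 2),
              stack(wide[c("V1", "V2")]))

# The levels of `ind` follow the column order, so the level number is the year.
long$Year <- as.integer(long$ind)

# A single call computes the mean for every Series/Year combination.
agg <- aggregate(values ~ Series + Year, data = long, FUN = mean)
agg
```

Taking the factor level number as the year relies on the columns being in chronological order in the input files; it does not parse the digits in "V1", "V2", and so on.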

Creating the chart and saving it to disk

library(ggplot2)
ggplot(aggregated, aes(Year, value)) +
  geom_line(aes(colour = Series)) +
  labs(y = "ylabel", title = "title")

file <- paste0(plot, ".png")
ggsave(filename = file)   # by default, the last plot is saved


Edited on 2021-07-10
