I have a dataset where the values are collapsed, so each row holds multiple entries in a single column.
For example:
Gene Score1
Gene1 NA, NA, NA, 0.03, -0.3
Gene2 NA, 0.2, 0.1
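I'm trying to uncollapse it and then, for the Score1 column, select the maximum absolute value per row, while creating a new column that tracks whether that maximum absolute value was originally negative.
So the output for the example would be: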
Gene Score1 Negatives1
Gene1 0.3 1
Gene2 0.2 0
#Score1 is now the maximum absolute value and if it used to be negative is tracked
I wrote the following code for this:
dat2 <- dat %>%
tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
group_by(Gene) %>%
#Create negative column to track max absolute values that were negative
summarise(Negatives1 = +(min(Score1 == -max(abs(Score1))),
Score1 = max(abs(Score1), na.rm = TRUE))
However, for some reason the code above gives me this error:
Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.
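I thought that using convert = TRUE would make the values numeric, but the error suggests that I'm still getting non-numeric values after running separate_rows()?
Example input data: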
structure(list(Gene = c("Gene1", "Gene2"), Score1 = c("NA, NA, NA, 0.03, -0.3",
"NA, 0.2, 0.1")), row.names = c(NA, -2L), class = c("data.table",
"data.frame"))
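If we look at the separate_rows result, I think the problem becomes clear: the column you separated isn't numeric! I guess convert didn't pick it up. We can coerce it with as.numeric() (and ignore any warning; we want " NA" to become NA).
You also have a few problems in your summarise call: it needs more na.rm = TRUE, the parentheses are mispaired, etc.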
library(dplyr)  # provides %>%, mutate(), group_by(), summarise()
dat %>%
tidyr::separate_rows(Score1, sep = ",", convert = TRUE)
# # A tibble: 8 x 2
# Gene Score1
# <chr> <chr>
# 1 Gene1 NA
# 2 Gene1 " NA"
# 3 Gene1 " NA"
# 4 Gene1 " 0.03"
# 5 Gene1 " -0.3"
# 6 Gene2 NA
# 7 Gene2 " 0.2"
# 8 Gene2 " 0.1"
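The leading spaces in that output come from splitting on "," alone. As a small illustration in base R (not the tidyr internals; `raw`, `pieces_plain`, `pieces_clean` and `scores` are names made up for this sketch), splitting on a comma plus optional whitespace leaves clean tokens that convert to numeric:

```r
# Illustrative only: base R, no tidyr. `raw` mirrors one cell of Score1.
raw <- "NA, NA, NA, 0.03, -0.3"

# Splitting on "," alone keeps the space that followed each comma.
pieces_plain <- strsplit(raw, ",", fixed = TRUE)[[1]]

# Splitting on a comma followed by optional whitespace drops that space.
pieces_clean <- strsplit(raw, ",\\s*")[[1]]

# The string "NA" becomes a numeric NA; the other tokens become doubles.
scores <- as.numeric(pieces_clean)
```

The same pattern could presumably be passed as sep = ",\\s*" to separate_rows so that convert = TRUE has clean tokens to work with, although that is an assumption and not what the answer here does.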
dat %>%
tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
mutate(Score1 = as.numeric(Score1)) %>%
group_by(Gene) %>%
#Create negative column to track max absolute values that were negative
summarise(
Negatives1 = +(min(Score1, na.rm = TRUE) == -max(abs(Score1), na.rm = TRUE)),
Score1 = max(abs(Score1), na.rm = TRUE)
)
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 2 x 3
# Gene Negatives1 Score1
# <chr> <int> <dbl>
# 1 Gene1 1 0.3
# 2 Gene2 0 0.2
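For comparison, here is a minimal base-R sketch of the same idea, assuming the example input from the question. `summarise_scores` is a hypothetical helper written for this illustration, and it flags negatives by looking at the sign of the value with the largest magnitude instead of comparing min() with -max(abs()):

```r
# Sketch only: base R, no dplyr/tidyr. `summarise_scores` is a hypothetical helper.
dat <- data.frame(
  Gene = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1"),
  stringsAsFactors = FALSE
)

summarise_scores <- function(cell) {
  # Split on a comma plus optional whitespace, then convert to numeric.
  x <- as.numeric(strsplit(cell, ",\\s*")[[1]])
  x <- x[!is.na(x)]
  # A cell with no usable values yields NA for both outputs.
  if (length(x) == 0) {
    return(c(Score1 = NA_real_, Negatives1 = NA_real_))
  }
  top <- x[which.max(abs(x))]  # first value with the largest magnitude
  c(Score1 = abs(top), Negatives1 = as.numeric(top < 0))
}

# One row per gene, one column per summary value.
res <- t(vapply(dat$Score1, summarise_scores,
                c(Score1 = 0, Negatives1 = 0), USE.NAMES = FALSE))
out <- data.frame(Gene = dat$Gene, res)
```

One difference worth noting: if a cell contains both 0.3 and -0.3, this sketch flags whichever comes first, whereas the min() comparison above flags the row as negative whenever -0.3 is present.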