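So I have two different data frames: the one I have been working on (df1) and one holding all the new data that I need to put into the first (df2). df1 has several columns of zeros waiting for data to be added. df2 has the data I need, along with some rows and columns I don't care about. Here is a small sample of the kind of data I'm working with.
This is my first time posting data, so I hope I did it right. Let me know if you need a different format.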
df1:
structure(list(season = c(" FA15", " FA15", " FA15", " FA15",
" FA15", " FA15", " FA15", " FA15", " FA15", " FA15"), year = c("2015",
"2015", "2015", "2015", "2015", "2015", "2015", "2015", "2015",
"2015"), territory.name = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), plot = c("0",
"0", "0", "0", "0", "0", "0", "0", "0", "0"), color.band = c("APGBY",
"APGGU", "APGPW", "APGPW", "APGR", "APGUO", "APGUO", "APGUO",
"APGUO", "APGYR")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
df2:
structure(list(bandnum = c(157328052, 160379101, 157328094, 151313455,
170364680, 160379104, 151373458, 157328066, 160379103, 160379105
), project = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("*ISSJ", "ISSJ"), class = "factor"), color.band = c("PAWR",
"WYWAR", "APGP", "APGO", "ABYG", "URYAR", "APBW", "WABG", "OBWAR",
"GBGAR"), sex = structure(c(3L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 1L,
2L), .Label = c("?", "F", "M"), class = "factor"), age = structure(c(2L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L), .Label = c("AHY", "ASY",
"HY", "N", "SY"), class = "factor")), row.names = c(NA, 10L), class = "data.frame")
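I have spent a few days thinking about this, trying different approaches and reading a lot of Stack Overflow answers, but I have not found a clear answer on how to take data from one data frame and copy it into an existing column of another data frame, based on a shared ID in a third column.
Basically, I want R to see that both data frames list band ABCDEF in the color.band column, then take the value from df2$bandnum in the row where ABCDEF appears and copy it into df1$bandnum in the row where ABCDEF appears.
I don't want rows that are in df2 but not in df1 copied into df1. For entries that are in df1 but not in df2, I want the bandnum column marked as NA.
The column names and data formats for color band and band number have been standardized between the two data frames, so everything should line up. The code I have so far is: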
> practicedf <- left_join(x=df1, y=df2, by = "color.band", all.x = TRUE)
%>% mutate(y = ifelse(is.na(df1$color.band), df1$bandnum, df1$color.band)) %>% select(df2$bandnum)
left_join seems right because it keeps all rows from the left (df1) data frame and only the matching rows from the right (df2) data frame. I get this error:
Error in `[[<-.data.frame`(`*tmp*`, col, value = c("APGBY", "APGGU", "APGPW", :
replacement has 1261 rows, data has 2559
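color.band is a character vector and bandnum is numeric; is that a problem? What could be going wrong?
Edit: I was getting an error from having a bandnum column in both data frames, so I renamed df2$bandnum to bandnum.y. My code is now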
df1_test <- left_join(x=df1, y=df2, by = "color.band") %>% mutate(y =
ifelse(is.na(color.band), bandnum, color.band)) %>% select(bandnum.y)
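But when I view df1_test, it only shows the bandnum.y column, and the number of entries differs from the original df1.
Here is a subset of df1_test (not all of it, since it has 2600 entries).
Is there a way to make it show the rest of the data as well?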
structure(list(bandnum.y = c("171324972", "171324972", "171324972",
"178324697", "178324697", "178324697", "178324697", "178324697",
"178324697", "178324697", "170364505", "170364505", "170364505",
"170364505", "170364505", "170364505", NA, "178324692", "178324692",
"178324692")), row.names = c(NA, -20L), class = c("tbl_df", "tbl",
"data.frame"))
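After the join, we can't refer to columns through the original dataset with df1$..., because left_join returns a new data frame. Its row count can differ from df1, which appears to be what the "replacement has 1261 rows, data has 2559" error is reporting. In the tidyverse, we specify unquoted column names. There is also no all.x argument in left_join; that argument belongs to merge.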
library(dplyr)
left_join(x=df1, y=df2, by = "color.band") %>%
mutate(y = ifelse(is.na(color.band), bandnum, color.band))
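If the goal is only to copy bandnum from df2 into df1, a minimal sketch might look like the following. The toy data and the names lookup, df1_base, df1_joined and df1_merged are invented for illustration, and it assumes any placeholder bandnum column in df1 can simply be dropped before the join. Trimming df2 down to the key and the wanted column, one row per color.band, should also keep the join from adding rows to df1.

```r
library(dplyr)

# Toy stand-ins for the poster's data (values are illustrative only)
df1 <- data.frame(
  season     = " FA15",
  year       = "2015",
  bandnum    = 0,  # placeholder column of zeros, as described in the question
  color.band = c("APGBY", "APGP", "APGO", "APGP"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  bandnum    = c(157328094, 151313455, 170364680),
  color.band = c("APGP", "APGO", "ABYG"),
  sex        = c("M", "M", "M"),
  stringsAsFactors = FALSE
)

# Keep only the key and the column to copy, with one row per key,
# so the join cannot multiply rows of df1
lookup <- df2 %>%
  select(color.band, bandnum) %>%
  distinct(color.band, .keep_all = TRUE)

# Drop the placeholder so the join does not produce bandnum.x / bandnum.y
df1_base <- df1 %>% select(-bandnum)

# left_join keeps every row of df1; bands missing from df2 get NA in bandnum,
# and bands that exist only in df2 (here "ABYG") are not added
df1_joined <- left_join(df1_base, lookup, by = "color.band")

# Base R counterpart: all.x = TRUE is an argument of merge(), not left_join()
df1_merged <- merge(df1_base, lookup, by = "color.band", all.x = TRUE)
```

Leaving out the final select(bandnum.y) from the question's pipeline is what keeps the rest of df1's columns visible in the result.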