Say I have two dataframes like the following:
n = c(2, 3, 5, 5, 6, 7)
s = c("aa", "bb", "cc", "dd", "ee", "ff")
b = c(2, 4, 5, 4, 3, 2)
df = data.frame(n, s, b)
# n s b
#1 2 aa 2
#2 3 bb 4
#3 5 cc 5
#4 5 dd 4
#5 6 ee 3
#6 7 ff 2
n2 = c(5, 6, 7, 6)
s2 = c("aa", "bb", "cc", "ll")
b2 = c("hh", "nn", "ff", "dd")
df2 = data.frame(n2, s2, b2)
# n2 s2 b2
#1 5 aa hh
#2 6 bb nn
#3 7 cc ff
#4 6 ll dd
I want to merge them to achieve the following result:
#n s b n2 s2 b2
#2 aa 2 5 aa hh
#3 bb 4 6 bb nn
#5 cc 5 7 cc ff
#5 dd 4 6 ll dd
Basically, what I want to achieve is to merge the two dataframes whenever the values in s of the first data is found in either the s2 or the b2 columns of data2.
I know that merge can work when I specify the two columns from each dataframe but I am not sure how to ADD the OR condition in the merge function. Or how to achieve this goal using other commands from packages such as dpylr.
Also, to clarify, there will be a situation where s2 and b2 have matches with s column in the same row. If this is the case, then just merge them once.
A coupld of problems: 1) you have built a couple of dataframes with factors which has a tendency to screw up matching and indexing, so I used stringsAsFactors =FALSE in hte dataframe calls. 2) you have an ambiguous situation with no stated resolution when both s2 and b2 have matches in the s column (as does occur in your example):
> df2[c("s")] <- list( c( df$s[pmax( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE)]))
> df2
n2 s2 b2 s
1 5 aa hh aa
2 6 bb nn bb
3 7 cc ff ff
4 6 ll dd dd
> df2[c("s")] <- list( c( df$s[pmin( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE)]))
> df2
n2 s2 b2 s
1 5 aa hh aa
2 6 bb nn bb
3 7 cc ff cc
4 6 ll dd dd
Once you resolve the ambiguity to your satiusfaction just use the same method to extract and match the "b"s:
> df2[c("b")] <- list( c( df$b[pmin( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE)]))
> df2
n2 s2 b2 s b
1 5 aa hh aa 2
2 6 bb nn bb 4
3 7 cc ff cc 5
4 6 ll dd dd 4
Modified df's:
> dput(df)
structure(list(n = c(2, 3, 5, 5, 6, 7), s = c("aa", "bb", "cc",
"dd", "ee", "ff"), b = c(2, 4, 5, 4, 3, 2)), .Names = c("n",
"s", "b"), row.names = c(NA, -6L), class = "data.frame")
> dput(df2)
structure(list(n2 = c(5, 6, 7, 6), s2 = c("aa", "bb", "cc", "ll"
), b2 = c("hh", "nn", "ff", "dd"), s = c("aa", "bb", "cc", "dd"
), b = c(2, 4, 5, 4)), row.names = c(NA, -4L), .Names = c("n2",
"s2", "b2", "s", "b"), class = "data.frame")
One step solution:
> df2[c("s", "c")] <- df[pmin( match( df2$s2 , df$s), match(df2$b2, df$s),na.rm=TRUE), c("s", "b")]
> df2
n2 s2 b2 s c
1 5 aa hh aa 2
2 6 bb nn bb 4
3 7 cc ff cc 5
4 6 ll dd dd 4
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments