Merge DataFrames on two columns

Luis

This is a follow-up from this question

I have two pandas DataFrames, as follows:

print( a )

    foo   bar   let letval
9  foo1  bar1  let1      a
8  foo2  bar2  let1      b
7  foo3  bar3  let1      c
6  foo1  bar1  let2      z
5  foo2  bar2  let2      y
4  foo3  bar3  let2      x

print( b )

    foo   bar   num  numval
0  foo1  bar1  num1       1
1  foo2  bar2  num1       2
2  foo3  bar3  num1       3
3  foo1  bar1  num2       4
4  foo2  bar2  num2       5
5  foo3  bar3  num2       6

I want to merge the two of them on the columns [ 'foo', 'bar' ].

If I simply do c = pd.merge( a, b, on=['foo', 'bar'] ), I get:

prnint( c )

     foo   bar   let letval   num  numval
0   foo1  bar1  let1      a  num1       1
1   foo1  bar1  let1      a  num2       4
2   foo1  bar1  let2      z  num1       1
3   foo1  bar1  let2      z  num2       4
4   foo2  bar2  let1      b  num1       2
5   foo2  bar2  let1      b  num2       5
6   foo2  bar2  let2      y  num1       2
7   foo2  bar2  let2      y  num2       5
8   foo3  bar3  let1      c  num1       3
9   foo3  bar3  let1      c  num2       6
10  foo3  bar3  let2      x  num1       3
11  foo3  bar3  let2      x  num2       6

I would like:

print( c )

    foo   bar   let letval   num   numval
0  foo1  bar1  let1      a   num1       1
1  foo2  bar2  let1      b   num1       2
2  foo3  bar3  let1      c   num1       3
3  foo1  bar1  let2      z   num2       4
4  foo2  bar2  let2      y   num2       5
5  foo3  bar3  let2      x   num2       6

The closest I've got is:

c = pd.merge( a, b, left_index=['foo', 'bar'], right_index=['foo', 'bar'] )

What am I missing?

And why do I get c.shape = (12,6) in the first example?


Edit

Thanks to @piRSquared's answer I realized that the underlying problem is that there is not a single combination of columns to do that. Thus the merge problem, as posed before cannot be univocally solved. That said, the question is converted into a simpler one:

How to make a univocal relationship between the tables?

I solved that with a dictionary that maps the desired outputs that need to be aligned:

map_ab = { 'num1':'let1', 'num2':'let2' }
b['let'] = b.apply( lambda x: map_ab[x['num']], axis=1 )
c = pd.merge( a, b, on=['foo', 'bar', 'let'] )
print( c )
piRSquared

The reason you are getting that is because the columns you are merging on do not constitute unique combinations. For example, The first (index 0) row of a has foo1 and bar1, but so does the fourth row (index 3). Ok, that's fine, but b has the same issue. So, when you match up b's foo1 & bar1 for row indexed with 0 it matches twice. Same is true when you match foo1 & bar1 in row indexed with 3, it matches twice. So you end up with four matches for those 2 rows.

So you get

  • a row 0 matches with b row 0
  • a row 0 matches with b row 3
  • a row 3 matches with b row 0
  • a row 3 matches with b row 3

And THEN, your example does this 2 more times. 3 * 4 == 12

The only way to do this and be unambiguous is to decide on a rule on which match to take if there are more than one matches. I decided to groupby one of your other columns then take the first one. It still doesn't match your expected output but I'm proposing that you gave a bad example.

pd.merge( a, b, on=['foo', 'bar']).groupby(['foo', 'bar', 'let'], as_index=False).first()

enter image description here

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Merge DataFrames on two columns

From Dev

Pandas merge two dataframes with different columns

From Dev

Merge two DataFrames with some equal columns

From Dev

Merge two DataFrames with overlapping MultiIndex columns

From Dev

how to merge two dataframes and sum the values of columns

From Dev

Merge two DataFrames with overlapping MultiIndex columns

From Dev

Merge two similar dataframes that have the same columns

From Dev

Problem while trying to merge two columns of two different dataframes?

From Dev

how to merge two unequal dataframes based on multiple columns?

From Dev

How to merge two dataframes using multiple columns as key?

From Java

How to merge two dataframes and keep only different columns (content)?

From Dev

merge and sum two dataframes where columns match python pandas

From Dev

How to merge 2 dataframes based on two columns via string containment

From Dev

Merge three dataframes by two columns, keeping only the largest value in R

From Dev

How to merge pandas dataframes on datetime index and two columns

From Java

Merge two dataframes by index

From Dev

merge two dataframes pandas

From Dev

R Merge two Dataframes

From Dev

merge two dataframes R

From Dev

Complex merge of two DataFrames

From Dev

Merge two dataframes with pandas

From Dev

Merge two dataframes with python

From Dev

Merge two dataframes by index

From Dev

Complex merge of two DataFrames

From Dev

Merge two dataframes with pandas

From Dev

How to merge these two DataFrames

From Dev

Compare and merge two dataframes

From Dev

Merge two dataframes in pandas

From Dev

Merge dataframes with the same number of columns

Related Related

HotTag

Archive