Say I have two matrices, an original and a reference:
import pandas as pa
print "Original Data Frame"
# Create a dataframe
oldcols = {'col1':['a','a','b','b'], 'col2':['c','d','c','d'], 'col3':[1,2,3,4]}
a = pa.DataFrame(oldcols)
print "Original Table:"
print a
print "Reference Table:"
b = pa.DataFrame({'col1':['x','x'], 'col2':['c','d'], 'col3':[10,20]})
print b
Where the tables look like this:
Original Data Frame
Original Table:
col1 col2 col3
0 a c 1
1 a d 2
2 b c 3
3 b d 4
Reference Table:
col1 col2 col3
0 x c 10
1 x d 20
Now I want to subtract from the third column (col3) of the original table (a), the value in the reference table (c) in the row where the second columns of the two tables match. So the first row of table two should have the value 10 added to the third column, because the row of table b where the column is col2 is 'c' has a value of 10 in col3. Make sense? Here's some code that does that:
col3 = []
for ix, row in a.iterrows():
col3 += [row[2] + b[b['col2'] == row[1]]['col3']]
a['col3'] = col3
print "Output Table:"
print a
Yielding the following output:
Output Table:
col1 col2 col3
0 a c [11]
1 a d [22]
2 b c [13]
3 b d [24]
My question is, is there a more elegant way to do this? Also, the results in 'col3' should not be lists. Solutions using numpy are also welcome.
I did not quite understand your description of what you are trying to do, but the output you have shown can be generated by first merging the two data frames and then some simple operations;
>>> df = a.merge(b.filter(['col2', 'col3']), how='left',
left_on='col2', right_on='col2', suffixes=('', '_'))
>>> df
col1 col2 col3 col3_
0 a c 1 10
1 b c 3 10
2 a d 2 20
3 b d 4 20
[4 rows x 4 columns]
>>> df.col3_.fillna(0, inplace=True) # in case there are no matches
>>> df.col3 += df.col3_
>>> df
col1 col2 col3 col3_
0 a c 11 10
1 b c 13 10
2 a d 22 20
3 b d 24 20
[4 rows x 4 columns]
>>> df.drop('col3_', axis=1, inplace=True)
>>> df
col1 col2 col3
0 a c 11
1 b c 13
2 a d 22
3 b d 24
[4 rows x 3 columns]
If values in col2
in b
are not unique, then probably you also need something like:
>>> b.groupby('col2', as_index=False)['col3'].aggregate(sum)
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments