I have the following:
Dataframe 1 (Multi-index dataframe):
| Assay_A |
---------------------------------------------------
Index_A | Index_B | Index_C | mean | std | count |
---------------------------------------------------
128 12345 AAA 123 2 4
Dataframe 2:
Index | Col_A | Col_B | Col_C | mean
-------------------------------------
1 128 12345 AAA 456
where Col_X = Index_X for a,b,c.
I have been spending all morning trying to do the following:
How do I pick the correct mean in dataframe 2 (which has to match up on Col ABC) so I can do mathematical operations on it. For example, I want to take the mean of dataframe 1 and divide it by the correctly chosen mean of dataframe 2.
Ideally, I want to store the results of the operation in a new column. So the final output should look like this:
| Assay_A |
------------------------------------------------------------
Index_A | Index_B | Index_C | mean | std | count | result |
------------------------------------------------------------
128 12345 AAA 123 2 4 0.26
Perhaps there is an easier way to do this I would be open to any such suggestions as well.
what I suggest you do is 1) rename the columns of Dataframe 2 to the respective names of the index columns of Dataframe 1, 2) reset the index on Dataframe 1, and 3) merge the two tables based on the now matching column names. Afterwards you can compute whatever you like. The MultiIndex on the columns of Dataframe 2 adds a bit of additional overhead.
Explicitly:
import pandas as pd
# re-create table1
row_index = pd.MultiIndex.from_tuples([(128, 12345, 'AAA')])
row_index.names=['Index_A', 'Index_B', 'Index_C']
table1 = pd.DataFrame(data={'mean': 123, 'std': 2, 'count': 4}, index=row_index)
table1.columns = pd.MultiIndex.from_tuples(zip(['Assay A'] * 3, table1.columns))
print "*** table 1:"
print table1
print ""
# re-create table2
table2 = pd.DataFrame([{'Col_A': 128, 'Col_B': 12345, 'Col_C': 'AAA', 'mean': 456}], index=[1])
table2.index.name = 'Index'
print "*** table 2:"
print table2
print ""
# re-name columns of table2 to match names of respective index columns in table1
table2 = table2.rename(columns={'Col_A': 'Index_A', 'Col_B': 'Index_B', 'Col_C': 'Index_C'})
# Drop 'Assay A' index level on columns of table1;
# without doing that, the following reset_index() will produce a column multi-index
# for Index_A/B/C, so column names will not match the simple column index of table2_renamed.
# If you need to keep the 'Assay A' level here, you will need to also construct a column
# multi-index for table2_renamed (with empty values for the second level).
table1.columns = table1.columns.levels[1]
# Move index columns of table1 back to regular columns
table1 = table1.reset_index()
# Merge the two tables on the now common column names. 'mean' appears in both tables,
# give the column from table2 a suffix '_2'.
joint = pd.merge(table1.reset_index(), table2, on=['Index_A', 'Index_B', 'Index_C'], suffixes={'', '_2'})
print "*** joint, before re-setting index:"
print joint
print ""
# Restore index of the joint table
joint = joint.set_index(['Index_A', 'Index_B', 'Index_C'])
# Compute the 'result'
joint['result'] = joint['mean'] / joint['mean_2']
# drop unused columns
joint = joint.drop(['index', 'mean_2'], axis=1)
# restore column index level
joint.columns = pd.MultiIndex.from_tuples(zip(['Assay A'] * 4, joint.columns))
print "*** final result:"
print joint
print ""
The script output is:
*** table 1:
Assay A
count mean std
Index_A Index_B Index_C
128 12345 AAA 4 123 2
*** table 2:
Col_A Col_B Col_C mean
Index
1 128 12345 AAA 456
*** joint, before re-setting index:
index Index_A Index_B Index_C count mean std mean_2
0 0 128 12345 AAA 4 123 2 456
*** final result:
Assay A
count mean std result
Index_A Index_B Index_C
128 12345 AAA 4 123 2 0.269737
Hope that helps!
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments