How do I apply a value from a dataframe based on the value of a multi-index of another dataframe?

Thornhale Published at Dev

Thornhale

I have the following:

Dataframe 1 (Multi-index dataframe):

                             |      Assay_A       |
---------------------------------------------------
Index_A | Index_B | Index_C  | mean | std | count |
---------------------------------------------------
   128     12345      AAA      123    2     4

Dataframe 2:

    Index | Col_A | Col_B | Col_C | mean
    -------------------------------------
      1      128    12345   AAA     456

where Col_X = Index_X for a,b,c.

I have been spending all morning trying to do the following:

How do I pick the correct mean in dataframe 2 (which has to match up on Col ABC) so I can do mathematical operations on it. For example, I want to take the mean of dataframe 1 and divide it by the correctly chosen mean of dataframe 2.

Ideally, I want to store the results of the operation in a new column. So the final output should look like this:

                             |            Assay_A          |
------------------------------------------------------------
Index_A | Index_B | Index_C  | mean | std | count | result |
------------------------------------------------------------
   128     12345      AAA      123    2     4       0.26

Perhaps there is an easier way to do this I would be open to any such suggestions as well.

ehhh

what I suggest you do is 1) rename the columns of Dataframe 2 to the respective names of the index columns of Dataframe 1, 2) reset the index on Dataframe 1, and 3) merge the two tables based on the now matching column names. Afterwards you can compute whatever you like. The MultiIndex on the columns of Dataframe 2 adds a bit of additional overhead.

Explicitly:

import pandas as pd

# re-create table1
row_index = pd.MultiIndex.from_tuples([(128, 12345, 'AAA')])
row_index.names=['Index_A', 'Index_B', 'Index_C']
table1 = pd.DataFrame(data={'mean': 123, 'std': 2, 'count': 4}, index=row_index)
table1.columns = pd.MultiIndex.from_tuples(zip(['Assay A'] * 3, table1.columns))
print "*** table 1:"
print table1
print ""

# re-create table2
table2 = pd.DataFrame([{'Col_A': 128, 'Col_B': 12345, 'Col_C': 'AAA', 'mean': 456}], index=[1])
table2.index.name = 'Index'
print "*** table 2:"
print table2
print ""

# re-name columns of table2 to match names of respective index columns in table1
table2 = table2.rename(columns={'Col_A': 'Index_A', 'Col_B': 'Index_B', 'Col_C': 'Index_C'})

# Drop 'Assay A' index level on columns of table1;
# without doing that, the following reset_index() will produce a column multi-index
# for Index_A/B/C, so column names will not match the simple column index of table2_renamed.
# If you need to keep the 'Assay A' level here, you will need to also construct a column
# multi-index for table2_renamed (with empty values for the second level).
table1.columns = table1.columns.levels[1]

# Move index columns of table1 back to regular columns
table1 = table1.reset_index()

# Merge the two tables on the now common column names. 'mean' appears in both tables,
# give the column from table2 a suffix '_2'.
joint = pd.merge(table1.reset_index(), table2, on=['Index_A', 'Index_B', 'Index_C'], suffixes={'', '_2'})
print "*** joint, before re-setting index:"
print joint
print ""

# Restore index of the joint table
joint = joint.set_index(['Index_A', 'Index_B', 'Index_C'])

# Compute the 'result'
joint['result'] = joint['mean'] / joint['mean_2']

# drop unused columns
joint = joint.drop(['index', 'mean_2'], axis=1)

# restore column index level
joint.columns = pd.MultiIndex.from_tuples(zip(['Assay A'] * 4, joint.columns))

print "*** final result:"
print joint
print ""

The script output is:

*** table 1:
                        Assay A         
                          count mean std
Index_A Index_B Index_C                 
128     12345   AAA           4  123   2

*** table 2:
       Col_A  Col_B Col_C  mean
Index                          
1        128  12345   AAA   456

*** joint, before re-setting index:
   index  Index_A  Index_B Index_C  count  mean  std  mean_2
0      0      128    12345     AAA      4   123    2     456

*** final result:
                        Assay A                   
                          count mean std    result
Index_A Index_B Index_C                           
128     12345   AAA           4  123   2  0.269737

Hope that helps!

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2021-02-25

Comments

0 comments

From Dev

Related Related

Article

How do I apply a value from a dataframe based on the value of a multi-index of another dataframe?

How do I apply a value from a dataframe based on the value of a multi-index of another dataframe?

How do I assign value to a row in a dataframe based on another in r?

How do i lookup a row on one dataframe based on the column cell value and append that to a row on another dataframe?

Setting value on one dataframe based on the id and value from another dataframe

How to get value from python dataframe based on column index?

How can I get a value from other dataframe's column based on other index?

How do i change a single index value in pandas dataframe?

How do I filter a pandas DataFrame based on value counts?

How can I retrieve a row by index value from a Pandas DataFrame?

How do I filter DataFrame rows based on a key value pair from a list of dictionaries column field?

How can I add a dataframe converted to an array as an element of another dataframe based on a key value

With PySpark dataframe locate value from one array based on index and copy to another array

Pandas Dataframe apply() method provides a row object, but how do you access the index value

How do I substract a value from a column in a dataframe, with unique rows

how to drop rows where the index succeed another index value in a dataframe?

How do I multiply all elements of a column of a dataframe with value specified for that column in another dataframe?

How to add new columns to dataframe with value taken from another dataframe?

Copy value from matching index in another dataframe after criteria matched

If one dataframe value exists in another dataframe, then get a value from the dataframe

Change the value from another dataframe

How do you update a column's value in a dataframe based off of another row?

Python Filling dataframe values based on Column Index present in another dataframe value

How do I get a table value based on the value of another string?

Replace Pandas Dataframe Value Based on Index Range

Python Pandas: Find index based on value in DataFrame

How do I merge two columns in a dataframe based on a datetime index?

Conditional Sums based on another DataFrame column value

How do I filter rows based on whether a column value is in a Set of Strings in a Spark DataFrame

How can I create an "ID" column in one dataframe based on the index of another dataframe?

updating column in one dataframe with value from another dataframe based on matching values