How to update a pandas dataframe with sets, from another dataframe

Mick

Trying to update each row from df1 to df2 if unique_value is matched, then based on status from df1, update price in price_array in df2; If not, append the row to df2 and assign new ID column.

This is a part 2 question from: Iterate each row by updating values from 1st dataframe to 2nd dataframe based on unique value w/ different index, otherwise append and assign new ID

Note:
active and new: add
suspended and inactive: remove

df1 (NO ID COLUMN):

        unique_value        Status        Price
0       xyz123              active        6.67
1       eff987              suspended     1.75
2       efg125              active        5.77
3       xyz123              new           7.55
4       xyz123              inactive      4.55
5       eff987              new           5.55

df2:

        unique_value        Price_array  ID
0       xyz123              {4.55}       1000
1       xyz985              {1.31}       1001
2       abc987              {4.56}       1002
3       eff987              {1.75}       1003
4       asd541              {8.85}       1004

Desired output for updated df2:

        unique_value        Price_array       ID
0       xyz123              {6.67,7.55}       1000    <- updated (added 6.67, added 7.55, removed 4.55)
1       xyz985              {1.31}            1001    
2       abc987              {4.56}            1002
3       eff987              {5.55}            1003    <- updated (removed 1.75, added 5.55)
4       asd541              {8.85}            1004
5       efg125              {5.77}            1005    <- appended and new ID assigned

Here is the code from part 1: Iterate each row by updating values from 1st dataframe to 2nd dataframe based on unique value w/ different index, otherwise append and assign new ID

# additional state variables
# 1. for the ID to be added
current_max_id = df2["ID"].max()
# 2. for matching unique_values, avoiding searching df2["unique_value"] every time
current_value_set = set(df2["unique_value"].values)

# match unique_value's using the state variable instead of `df2`
mask = df1["unique_value"].isin(current_value_set)

for i in range(len(df1)):
    
    # current unique_value from df1
    uv1 = df1["unique_value"][i]
    
    # 1. update existing
    if mask[i]:
        
        # broadcast df1 into the matched rows in df2 (mind the shape)
        df2.loc[df2["unique_value"] == uv1, ["unique_value", "Status", "Price"]] = df1.iloc[i, :].values.reshape((1, 3))
        
        #UPDATE PRICE with PRICE_ARRAY
        ...see below

    # 2. append new
    else:
        # update state variables
        current_max_id += 1
        current_value_set.add(uv1)
        # append the row (assumes df2.index=[0,1,2,3,...])
        df2.loc[len(df2), :] = [df1.iloc[i, 0], df1.iloc[i, 1], df1.iloc[i, 2], current_max_id]

Is there any way to update the price in df1 to price_array in df2 based on status from df1? I'm thinking something along the line of this ("status" column removed from the broadcast portion of the code):

        curr_price=df1.iloc[i,df1.columns.get_loc('Price')]
        if df1.iloc[i,df1.columns.get_loc('Status')] in ('inactive', 'suspended'):
            df2.loc[df2["unique_value"] == uv1,'Price_array'].discard(curr_price)
        else:
            df2.loc[df2["unique_value"] == uv1,'Price_array'].add(curr_price)  

But got the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-156-6ff78c7a4a9a> in <module>()
     46     if mask[i]:
     47         # Broadcast refresh table into the matched rows in historical
---> 48         df2.loc[df2["unique_value"] == uv1, ["unique_value", "Price"]] = df1.iloc[i, :].values.reshape((1,3))
     49 

/anaconda/envs/pyfull36/lib/python3.6/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    192             key = com._apply_if_callable(key, self.obj)
    193         indexer = self._get_setitem_indexer(key)
--> 194         self._setitem_with_indexer(indexer, value)
    195 
    196     def _has_valid_type(self, k, axis):

/anaconda/envs/pyfull36/lib/python3.6/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    581                     value = np.array(value, dtype=object)
    582                     if len(labels) != value.shape[1]:
--> 583                         raise ValueError('Must have equal len keys and value '
    584                                          'when setting with an ndarray')
    585 

ValueError: Must have equal len keys and value when setting with an ndarray
Trenton McKinney
  • The following code as 3 main steps:
    1. Setup the dataframes, and .join them.
    2. Use np.where and set math, to update 'Price_array'.
    3. Fill any missing ID values, using .update.

Setup DataFrames

  • This is how they should look to begin with
import pandas as pd

# setup dataframes
df1 = pd.DataFrame({'unique_value': ['xyz123', 'eff987', 'efg125', 'xyz123', 'xyz123', 'eff987'], 'Status': ['active', 'suspended', 'active', 'new', 'inactive', 'new'], 'Price': [6.67, 1.75, 5.77, 7.55, 4.55, 5.55]})
df2 = pd.DataFrame({'unique_value': ['xyz123', 'xyz985', 'abc987', 'eff987', 'asd541'], 'Price_array': [{4.55}, {1.31}, {4.56}, {1.75}, {8.85}], 'ID': [1000, 1001, 1002, 1003, 1004]})

# df1
  unique_value     Status  Price
0       xyz123     active   6.67
1       eff987  suspended   1.75
2       efg125     active   5.77
3       xyz123        new   7.55
4       xyz123   inactive   4.55
5       eff987        new   5.55

# df2
  unique_value Price_array    ID
0       xyz123      {4.55}  1000
1       xyz985      {1.31}  1001
2       abc987      {4.56}  1002
3       eff987      {1.75}  1003
4       asd541      {8.85}  1004

setup the dataframes to be joined

# for df2, set unique_value as the index
df2.set_index('unique_value', inplace=True)

# for df1, groupby unique_value and aggregate a set onto Price
df1g = df1.groupby('unique_value').agg({'Price': set})

# join df2 and df1g
dfj = df2.join(df1g, how='outer')

# replace NaN with empty string, '', then replace '', with empty set; NaN can't be directly replace with a set
dfj[['Price_array', 'Price']] = dfj[['Price_array', 'Price']].fillna('').applymap(set)

# dfj
             Price_array      ID               Price
unique_value                                        
abc987            {4.56}  1002.0                  {}
asd541            {8.85}  1004.0                  {}
eff987            {1.75}  1003.0        {1.75, 5.55}
efg125                {}     NaN              {5.77}
xyz123            {4.55}  1000.0  {4.55, 6.67, 7.55}
xyz985            {1.31}  1001.0                  {}

use np.where and set math to update 'Price_array'

  • If 'Price' is an empty set {}
    • Use x.Price - x.Price_array
  • Otherwise
    • Use x.Price_array
  • The order of set math, matters
    • {4.56} - set() is {4.56}
    • set() - {4.56} is set()
# use np.where and set math to update Price_array
dfj['Price_array'] = dfj[['Price_array', 'Price']].apply(lambda x: np.where(len(x.Price) > 0, x.Price - x.Price_array, x.Price_array), axis=1)

# drop the Price column
dfj.drop(columns=['Price'], inplace=True)

# reset the index
dfj.reset_index(inplace=True)

# dfj
  unique_value   Price_array      ID
0       abc987        {4.56}  1002.0
1       asd541        {8.85}  1004.0
2       eff987        {5.55}  1003.0
3       efg125        {5.77}     NaN
4       xyz123  {6.67, 7.55}  1000.0
5       xyz985        {1.31}  1001.0

fill any missing 'ID' values

# extract all rows with missing ID
dfjna = dfj.loc[dfj.ID.isna()].copy()

# get the max ID value from the ID column
idm = int(dfj.ID.max())

# update all the missing ID values from a range beginning at idm+1
dfjna.ID = range(idm+1, idm+len(dfjna)+1)

# update the missing ID values in dfj with dfjna
dfj.update(dfjna)

# set the ID column as int
dfj.ID = dfj.ID.astype(int)

# display(dfj)
  unique_value   Price_array    ID
0       abc987        {4.56}  1002
1       asd541        {8.85}  1004
2       eff987        {5.55}  1003
3       efg125        {5.77}  1005
4       xyz123  {6.67, 7.55}  1000
5       xyz985        {1.31}  1001

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Conditional update of pandas dataframe from another dataframe

From Java

How to update a dataframe, from another dataframe with duplicates

From Java

How to conditionally update a dataframe, from another dataframe

From Dev

Update dataframe from another dataframe

From Dev

Python Pandas update a dataframe value from another dataframe

From Dev

Update particular values in a pandas dataframe from another dataframe

From Dev

How to update a dataframe in Pyspark with random values from another dataframe?

From Dev

Update dataframe header with values from another dataframe

From Dev

Multiple sets of duplicate records from a pandas dataframe

From Java

How to create a pandas DataFrame from sets of items that includes all combinations of the sets?

From Dev

How to rename pandas dataframe column with another dataframe?

From Dev

update a column value in a dataframe from another matching column in different dataframe in Pandas

From Dev

Pandas: Update Multiple Dataframe Columns Using Duplicate Rows From Another Dataframe

From Dev

How to update a dataframe in Pandas Python

From Java

How to update a pandas dataframe, from multiple API calls?

From Dev

how to update multiple rows in ms access from pandas dataframe

From Dev

How do I copy a row from one pandas dataframe to another pandas dataframe?

From Dev

Adding A Specific Column from a Pandas Dataframe to Another Pandas Dataframe

From Dev

Conversion of list to sets in pandas dataframe

From Dev

Replacing Columns from one dataframe with columns from another dataframe in pandas

From Dev

Replacing Columns from one dataframe with columns from another dataframe in pandas

From Dev

In Pandas, how can I patch a dataframe with missing values with values from another dataframe given a similar index?

From Dev

How to slice pandas DataFrame based on values from another Dataframe without using for-loop?

From Dev

Replace data from one pandas dataframe to another

From Java

Pandas - map values from one dataframe to another

From Dev

Pandas: multiply column by column from another dataframe?

From Java

pandas: modifying values in dataframe from another column

From Dev

pandas multiindex assignment from another dataframe

From Dev

Join on multiple columns from another pandas dataframe

Related Related

  1. 1

    Conditional update of pandas dataframe from another dataframe

  2. 2

    How to update a dataframe, from another dataframe with duplicates

  3. 3

    How to conditionally update a dataframe, from another dataframe

  4. 4

    Update dataframe from another dataframe

  5. 5

    Python Pandas update a dataframe value from another dataframe

  6. 6

    Update particular values in a pandas dataframe from another dataframe

  7. 7

    How to update a dataframe in Pyspark with random values from another dataframe?

  8. 8

    Update dataframe header with values from another dataframe

  9. 9

    Multiple sets of duplicate records from a pandas dataframe

  10. 10

    How to create a pandas DataFrame from sets of items that includes all combinations of the sets?

  11. 11

    How to rename pandas dataframe column with another dataframe?

  12. 12

    update a column value in a dataframe from another matching column in different dataframe in Pandas

  13. 13

    Pandas: Update Multiple Dataframe Columns Using Duplicate Rows From Another Dataframe

  14. 14

    How to update a dataframe in Pandas Python

  15. 15

    How to update a pandas dataframe, from multiple API calls?

  16. 16

    how to update multiple rows in ms access from pandas dataframe

  17. 17

    How do I copy a row from one pandas dataframe to another pandas dataframe?

  18. 18

    Adding A Specific Column from a Pandas Dataframe to Another Pandas Dataframe

  19. 19

    Conversion of list to sets in pandas dataframe

  20. 20

    Replacing Columns from one dataframe with columns from another dataframe in pandas

  21. 21

    Replacing Columns from one dataframe with columns from another dataframe in pandas

  22. 22

    In Pandas, how can I patch a dataframe with missing values with values from another dataframe given a similar index?

  23. 23

    How to slice pandas DataFrame based on values from another Dataframe without using for-loop?

  24. 24

    Replace data from one pandas dataframe to another

  25. 25

    Pandas - map values from one dataframe to another

  26. 26

    Pandas: multiply column by column from another dataframe?

  27. 27

    pandas: modifying values in dataframe from another column

  28. 28

    pandas multiindex assignment from another dataframe

  29. 29

    Join on multiple columns from another pandas dataframe

HotTag

Archive