I have a dataframe as follows:
ID   IndentNo    Status  System_text
162  1000025418  Reject  Short Description Error
162  1000025418  Reject  Delivery date Error
162  1000025418  Accept
As we can see, a single ID has two different Status values, viz. Reject and Accept.
Objective: I want to transform this dataframe so that, if the count of Reject is greater than the count of Accept, the final df reads as:
ID   IndentNo    Status  System_text
162  1000025418  Reject  Short Description Error, Delivery date Error
When I try the following:
df_f = df_final.groupby(['IndentId','IndentNo'])['status'].agg(max).reset_index(name='system_message')
I get the following df:
ID   IndentNo    System_text
162  1000025418  Reject
What am I missing here?
You can use value_counts to get the count of each Accept and Reject in the Status series.
If there are more rejects than accepts, join the System_text of all the rejects and return the result as a single-row dataframe. Otherwise, just return the original dataframe.
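import pandas as pd

def transform_rejects(df):
    # Count how often each Status value occurs within the group.
    counts = df['Status'].value_counts().to_dict()
    if counts.get('Reject', 0) > counts.get('Accept', 0):
        # Join the messages of all rejected rows into one string.
        desc = ', '.join(df.loc[df['Status'].eq('Reject'), 'System_text'].tolist())
        return pd.DataFrame({'Status': ['Reject'], 'System_text': [desc]})
    return df

df2 = df.groupby(['ID', 'IndentNo']).apply(transform_rejects)
# Drop the inner row index that apply adds as a third index level.
df2.index = df2.index.droplevel(2)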
>>> df2
                Status                                   System_text
ID  IndentNo
162 1000025418  Reject  Short Description Error, Delivery date Error
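For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample frame from the question and loops over the groups explicitly instead of using apply, so no index level has to be dropped afterwards. The helper name collapse_rejects is made up for illustration, and the sketch assumes the empty System_text on the Accept row is a missing value (None/NaN), which the question does not confirm.

```python
import pandas as pd

# Sample data from the question; the missing System_text is assumed to be None.
df = pd.DataFrame({
    "ID": [162, 162, 162],
    "IndentNo": [1000025418, 1000025418, 1000025418],
    "Status": ["Reject", "Reject", "Accept"],
    "System_text": ["Short Description Error", "Delivery date Error", None],
})

def collapse_rejects(group):
    """Hypothetical helper: collapse a group to one Reject row if rejects outnumber accepts."""
    counts = group["Status"].value_counts()
    if counts.get("Reject", 0) > counts.get("Accept", 0):
        # Keep only the messages of rejected rows, skipping missing values.
        texts = group.loc[group["Status"].eq("Reject"), "System_text"].dropna()
        return pd.DataFrame({"Status": ["Reject"], "System_text": [", ".join(texts)]})
    # Otherwise keep the group's rows unchanged.
    return group[["Status", "System_text"]]

pieces = []
for (id_, indent_no), group in df.groupby(["ID", "IndentNo"]):
    out = collapse_rejects(group).copy()
    # Re-attach the group keys as ordinary columns.
    out.insert(0, "IndentNo", indent_no)
    out.insert(0, "ID", id_)
    pieces.append(out)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

The explicit loop is a little more verbose than groupby(...).apply, but the group keys come back as plain columns, which matches the layout asked for in the question.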