Pandas transform dataframe using groupby when count of a string in a column is maximum

pythondumb

I have a dataframe as follows:

ID        IndentNo      Status     System_text
162       1000025418    Reject     Short Description Error
162       1000025418    Reject     Delivery date Error
162       1000025418    Accept

As we can see, a single ID has two different Status values: Reject and Accept.

Objective: I want to transform this dataframe so that, if the count of Reject is greater than the count of Accept, the final df reads as:

ID        IndentNo      Status     System_text
162       1000025418    Reject     Short Description Error, Delivery date Error

When I try the following:

df_f = df_final.groupby(['IndentId','IndentNo'])['status'].agg(max).reset_index(name='system_message')

I am getting the following df:

ID        IndentNo      System_text
162       1000025418    Reject

What am I missing here?

Alexander

You can use value_counts to get the count of each Accept and Reject in the series.
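As a rough illustration of that first step, the sketch below rebuilds the sample from the question (the `df` construction, including `None` for the empty System_text of the Accept row, is my assumption about how the data is stored) and compares value_counts with max. Note that max on a column of strings compares them alphabetically, so it returns 'Reject' no matter how many rows of each kind there are, which is likely why the attempt in the question only ever shows Reject.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout).
df = pd.DataFrame({
    'ID': [162, 162, 162],
    'IndentNo': [1000025418] * 3,
    'Status': ['Reject', 'Reject', 'Accept'],
    'System_text': ['Short Description Error', 'Delivery date Error', None],
})

# Frequency of each Status value, as a plain dict.
counts = df['Status'].value_counts().to_dict()
print(counts)

# max() on strings is a lexicographic comparison, not a count.
alphabetical_max = df['Status'].max()
print(alphabetical_max)
```

Using `counts.get('Reject', 0)` rather than `counts['Reject']` avoids a KeyError for groups that contain no Reject rows at all.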

If a group has more rejects than accepts, join the System_text of all its rejects and return the result as a single-row dataframe. Otherwise, just return the group's rows unchanged.

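import pandas as pd

def transform_rejects(df):
    # Count how often each Status value occurs within the group.
    counts = df['Status'].value_counts().to_dict()
    if counts.get('Reject', 0) > counts.get('Accept', 0):
        # Join the messages of the Reject rows into one string.
        desc = ', '.join(df.loc[df['Status'].eq('Reject'), 'System_text'].tolist())
        return pd.DataFrame({'Status': ['Reject'], 'System_text': [desc]})
    # Otherwise keep the group's rows unchanged.
    return df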

df2 = df.groupby(['ID', 'IndentNo']).apply(transform_rejects)
df2.index = df2.index.droplevel(2)
>>> df2
                Status                                   System_text
ID  IndentNo                                                        
162 1000025418  Reject  Short Description Error, Delivery date Error
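
If the index handling after groupby(...).apply(...) is inconvenient, the same rule can be sketched with an explicit loop over the groups. This is only an alternative under stated assumptions, not part of the answer above: the helper name `collapse_rejects`, its `keys` parameter, and the second group (ID 163) are hypothetical and exist purely to show a group where Reject is not in the majority.

```python
import pandas as pd

def collapse_rejects(df, keys=('ID', 'IndentNo')):
    """Collapse each group to one Reject row when rejects outnumber accepts."""
    pieces = []
    # sort=False keeps groups in order of first appearance.
    for _, group in df.groupby(list(keys), sort=False):
        rejects = group[group['Status'].eq('Reject')]
        accepts = group[group['Status'].eq('Accept')]
        if len(rejects) > len(accepts):
            row = {k: group[k].iloc[0] for k in keys}
            row['Status'] = 'Reject'
            # Skip missing messages before joining.
            row['System_text'] = ', '.join(rejects['System_text'].dropna().astype(str))
            pieces.append(pd.DataFrame([row]))
        else:
            # Rejects do not outnumber accepts: keep the rows as they are.
            pieces.append(group)
    return pd.concat(pieces, ignore_index=True)

# Group 162 comes from the question; group 163 is made up for illustration.
df = pd.DataFrame({
    'ID': [162, 162, 162, 163, 163, 163],
    'IndentNo': [1000025418] * 3 + [1000025419] * 3,
    'Status': ['Reject', 'Reject', 'Accept', 'Accept', 'Accept', 'Reject'],
    'System_text': ['Short Description Error', 'Delivery date Error', None,
                    None, None, 'Price Error'],
})

result = collapse_rejects(df)
print(result)
```

Here ID and IndentNo stay as ordinary columns, so no droplevel step is needed afterwards.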
