DataFrame to user-defined format

Edwin Baby Published at Dev

Edwin Baby

I have a DataFrame:

name  salary department              position
   a   25000          x       normal employee
   b   50000          y       normal employee
   c   10000          y  experienced employee
   d   20000          x  experienced employee

I would like to get the result like the format below:

dept  total salary  salary_percentage  count_normal_employee  count_experienced_employee
x            45000       45000/105000                      1                           1
y            60000       60000/105000                      1                           1

jezrael

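You can build the result in three steps. First, use pivot_table with fillna to count the employees per department and position (df1). Next, use groupby with sum to get the total salary per department, and divide the new column total salary by the sum of the original column salary to get each department's share (df2). Finally, merge the two frames on department: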

import pandas as pd

#pivot df to count names per department and position, fill NaN with 0
df1 = df.pivot_table(index='department', columns='position', values='name', aggfunc='count').fillna(0).reset_index()
#reset the columns name - for a nicer df
df1.columns.name = None
print(df1)
  department  experienced employee  normal employee
0          x                     1                1
1          y                     1                1

#sum salary per department and rename column salary
df2 = df.groupby('department')['salary'].sum().reset_index().rename(columns={'salary':'total salary'})

#share of each department in the overall salary sum
df2['salary_percentage'] = df2['total salary'] / df['salary'].sum()
print(df2)
  department  total salary  salary_percentage
0          x         45000           0.428571
1          y         60000           0.571429

print(pd.merge(df1, df2, on=['department']))
  department  experienced employee  normal employee  total salary  \
0          x                     1                1         45000   
1          y                     1                1         60000   

   salary_percentage  
0           0.428571  
1           0.571429
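
For reference, the same steps can be put together as one self-contained Python 3 script. This is only a sketch under a few assumptions: the sample frame is rebuilt by hand from the question, the count_ prefix and the dept column name are my guess at the headers the question asks for, and fill_value=0 is used in place of fillna(0) so the counts stay integers.

```python
import pandas as pd

# Sample data copied by hand from the question.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [25000, 50000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
    "position": ["normal employee", "normal employee",
                 "experienced employee", "experienced employee"],
})

# Count employees per department and position; fill_value=0 covers
# department/position pairs that have no employees.
counts = df.pivot_table(index="department", columns="position",
                        values="name", aggfunc="count", fill_value=0)
# Assumed naming scheme: "normal employee" -> "count_normal_employee".
counts.columns = ["count_" + c.replace(" ", "_") for c in counts.columns]
counts = counts.reset_index()

# Total salary per department and its share of the overall salary sum.
totals = (df.groupby("department", as_index=False)["salary"].sum()
            .rename(columns={"salary": "total salary"}))
totals["salary_percentage"] = totals["total salary"] / df["salary"].sum()

# Join both pieces on department; "dept" is the header used in the question.
result = totals.merge(counts, on="department").rename(columns={"department": "dept"})
print(result)
```

If the share should be shown as text such as 45000/105000 rather than as a fraction, the salary_percentage column could be built with string formatting instead of the division.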
