I have the following grouped dataframe based on daily data
Studentid Year Month BookLevel
JSmith 2015 12 1.4
2016 1 1.6
2 1.8
3 1.2
4 2.0
MBrown 2016 1 3.0
2 3.2
3 3.6
I want to calculate the difference from the starting point in BookLevel for each Studentid. The current BookLevel is a .max calculation from the GroupBy to get the highest bookLevel for each month for each student
What I am looking for is something like this:
Studentid Year Month BookLevel Progress Since Start
JSmith 2015 12 1.4 0 (or NAN)
2016 1 1.6 .2
2 1.8 .4
3 1.2 -.2
4 2.0 .6
2016 1 3.0 0 (or NAN)
MBrown 2 3.2 .2
3 3.6 .6
I'm new to Python/Pandas and have tried a number of things and nothing comes close.
OK, this should work, if we groupby
on the first level and subtract BookLevel from the series returned by calling transform
with first
then we can add this as the new desired column:
In [47]:
df['ProgressSinceStart'] = df['BookLevel'] - df.groupby(level='Studentid')['BookLevel'].transform('first')
df
Out[47]:
BookLevel ProgressSinceStart
Studentid Year Month
JSmith 2015 12 1.4 0.0
2016 1 1.6 0.2
2 1.8 0.4
3 1.2 -0.2
4 2.0 0.6
MBrown 2016 1 3.0 0.0
2 3.2 0.2
3 3.6 0.6
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments