I have the pandas DataFrame below. The first column is a date in YYYY-MM-DD format. It holds month-on-month data, but a month's first available date is not necessarily the 1st, and its last available date is not necessarily the 30th or 31st (or the 28th or 29th in the case of February). It varies. For example, Feb 2020 only has data from 2020-02-03 onwards, and its last available date is 2020-02-28 (not the 29th).
Date start_Value end_value
2020-01-01 115 120
2020-01-02 122 125
2020-01-03 125.2 126
...
2020-01-31 132 135
2020-02-03 135.5 137
2020-02-04 137.8 138
...
2020-02-28 144 145
My objective is to create a new column holding the percentage difference between the end value on the previous month's last available date and the end value on the current month's last available date. It should be 0 for every date except the last available date of each month. For Jan 2020, since we don't have the previous month's data, the percentage difference should be calculated using the end value of the first available date of the month.
For Jan 2020, the percentage difference is calculated between the end value on 2020-01-01 and the end value on 2020-01-31. For the remaining months (for example Feb 2020), it is calculated between the end value on 2020-01-31 and the end value on 2020-02-28.
Date start_Value end_value percentage difference
2020-01-01 115 120 0
2020-01-02 122 125 0
2020-01-03 125.2 126 0
...
2020-01-31 132 135 17.4
2020-02-03 135.5 137 0
2020-02-04 137.8 138 0
...
2020-02-28 144 145 7.41
How can I achieve this in Python with pandas?
You can do this with groupby and transform, combined with duplicated to blank out every row except the last one of each month:
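import pandas as pd
s = pd.to_datetime(df['Date']).dt.strftime('%Y-%m')  # month key such as '2020-01'
g = df.groupby(s)
# Month's last end_value over its first start_Value, as a fraction; NaN on every row except each month's last
df['pct'] = (g['end_value'].transform('last') / g['start_Value'].transform('first') - 1).mask(s.duplicated(keep='last'))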
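Note that the snippet above divides by the current month's first start_Value and returns a fraction with NaN on the non-final rows, so it will not reproduce the 7.41 expected for Feb 2020. The following is a minimal sketch of one way to match the expected output more closely, not a definitive implementation. It uses only the rows shown in the question (the elided rows are left out). The column name percentage_difference and the rounding to two decimals are choices made here for illustration. It also assumes that the baseline for the first month is the first row's start_Value, because that is what yields the 17.4 in the expected output (135 / 115), even though the question text speaks of the first date's end value.

```python
import pandas as pd

# Only the rows visible in the question; the elided rows are omitted.
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-31",
             "2020-02-03", "2020-02-04", "2020-02-28"],
    "start_Value": [115, 122, 125.2, 132, 135.5, 137.8, 144],
    "end_value": [120, 125, 126, 135, 137, 138, 145],
})
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date").reset_index(drop=True)

# Month key such as '2020-01'; it sorts chronologically as a string.
s = df["Date"].dt.strftime("%Y-%m")

# Last available end_value of each month, indexed by month key.
month_end = df.groupby(s)["end_value"].last()

# Baseline for each month: the previous month's last end_value.
baseline = month_end.shift(1)
# Assumption: the first month has no predecessor, so fall back to the
# first row's start_Value (this reproduces the 17.4 in the question).
baseline.iloc[0] = df["start_Value"].iloc[0]

# Percentage change per month, rounded for display.
pct = ((month_end / baseline - 1) * 100).round(2)

# Keep the value only on each month's last available row, 0 elsewhere.
is_month_last = ~s.duplicated(keep="last")
df["percentage_difference"] = s.map(pct).where(is_month_last, 0.0)

print(df)
```

If the first month should instead be measured from the first date's end value, as the question text describes, replace `df["start_Value"].iloc[0]` with `df["end_value"].iloc[0]`; Jan 2020 then comes out as 12.5 rather than 17.39.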