I am trying to update the first N rows of each group in a multi-index DataFrame, but I had trouble finding a solution, so I thought I'd post the question here.
The example code is as follows:
# Imports
import numpy as np
import pandas as pd
# Set Up Data Frame
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df['DATE'] = dates
df['CATEGORY'] = ['A','B','A','B','A','B','A','B']
# Set Index
df.set_index(['CATEGORY','DATE'],inplace=True)
df.sort_index(inplace=True)
# Get First Two Rows of Each Category
df.groupby(level=0).apply(lambda x: x.iloc[0:2])
# Set The Value of Column 'C' Equal to Zero
# ???
So I was able to get as far as selecting the rows using "iloc", but after that I'm not sure how to set column "C" equal to zero. Feels like maybe I'm going about this the wrong way though. Any help would be greatly appreciated. Thanks!
How about this - first define a function that takes a dataframe, and replaces the first x records with a specified value.
def replace_first_x(group_df, x, value):
    group_df.iloc[:x, :] = value
    return group_df
Then pass that function to the groupby object with apply.
In [97]: df.groupby(level=0).apply(lambda df: replace_first_x(df, 2, 9999))
Out[97]:
                               A            B            C            D
CATEGORY DATE
A        2000-01-01  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-03  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-05     1.590503     0.948911    -0.268071     0.622280
         2000-01-07    -0.493866     1.222231     0.125037     0.071064
B        2000-01-02  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-04  9999.000000  9999.000000  9999.000000  9999.000000
         2000-01-06     1.663430    -1.170716     2.044815    -2.081035
         2000-01-08     1.593104     0.108531    -1.381218    -0.517312
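Note that replace_first_x overwrites every column in the first x rows, whereas the question asked to change only column 'C'. One possible alternative, offered as a minimal sketch and not as the only way to do it, is to build a boolean mask with GroupBy.cumcount and assign through df.loc. The names n and first_n are illustrative and do not come from the original post; the setup mirrors the question's example, using sort_index in place of the removed DataFrame.sort.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df['DATE'] = dates
df['CATEGORY'] = ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
df = df.set_index(['CATEGORY', 'DATE']).sort_index()

n = 2  # hypothetical name: how many leading rows per category to update

# cumcount() numbers the rows inside each group 0, 1, 2, ... in their
# current order, so "< n" marks the first n rows of every category.
first_n = df.groupby(level=0).cumcount() < n

# Assign to column 'C' only; all other columns are left untouched.
df.loc[first_n, 'C'] = 0

print(df)
```

Because the assignment goes through df.loc on the original frame, df is modified in place and no apply call is needed.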