I have the following function, which uses a for loop:
def add_CQI_iterrows(df):
    previous_row = df['Date'].astype(str)[0]
    CQI_index = 0
    series = []
    for index, row in df.iterrows():
        if row['Date'] == previous_row:
            previous_row = row['Date']
            print(CQI_index)
        else:
            CQI_index += 1
            previous_row = row['Date']
        series.append(CQI_index)
    df['CQI'] = series
    return df
I would like to find a way to convert this for loop into an apply call. Something like this (which does not work):
def add_CQI_apply(df):
    previous_row = df['Date'].astype(str)[0]
    CQI_index = 1
    series = []
    df['CQI'] = df.apply(lambda row: previous_row = row['Date'] if row['Date'] == previous_row else CQI_index += 1 and previous_row = row['Date'], axis=1)
    return df
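I want to make this conversion because I would like to see how fast the apply method runs, and whether it is possible to vectorize this kind of operation on a pandas Series.
Here is my data (data.json):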
[
{
"Date": "9/20/2020 8:50",
"UE": 1
},
{
"Date": "9/20/2020 8:50",
"UE": 2
},
{
"Date": "9/20/2020 8:50",
"UE": 3
},
{
"Date": "9/20/2020 8:57",
"UE": 1
},
{
"Date": "9/20/2020 8:57",
"UE": 8
},
{
"Date": "9/20/2020 8:57",
"UE": 2
},
{
"Date": "9/20/2020 9:12",
"UE": 1
},
{
"Date": "9/20/2020 9:12",
"UE": 5
},
{
"Date": "9/20/2020 9:12",
"UE": 3
},
{
"Date": "9/20/2020 9:20",
"UE": 1
},
{
"Date": "9/20/2020 9:20",
"UE": 4
},
{
"Date": "9/20/2020 9:20",
"UE": 3
}
]
Finally, here is the function that loads the data:
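import numpy as np
import pandas as pd

def upload_data(file):
    df = pd.read_json(file)
    # Dates in data.json look like "9/20/2020 8:50": month/day/year hour:minute
    df['Date'] = pd.to_datetime(df['Date'], format="%m/%d/%Y %H:%M")
    df['CQI'] = np.nan
    return df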
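You do not need a loop or apply for this. Compare each date with the date in the previous row, then take the cumulative sum of the changes:
df['CQI'] = (df['Date'] != df['Date'].shift()).cumsum()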
In [120]: (df['Date'] != df['Date'].shift()).cumsum()
Out[120]:
0     1
1     1
2     1
3     2
4     2
5     2
6     3
7     3
8     3
9     4
10    4
11    4
Name: Date, dtype: int64
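To make the one-liner above easier to follow, here is a minimal sketch that breaks it into its three steps and, for comparison, shows one way the same numbering could be produced with apply. It rebuilds only the Date column from data.json; the variable names and the make_counter helper are hypothetical and not part of the original question or answer. The apply variant assumes that Series.apply calls the function once per element, in order, which is why the vectorized form is usually the safer choice.

```python
import pandas as pd

# Dates copied from data.json above (the UE column is left out for brevity).
dates = (["9/20/2020 8:50"] * 3 + ["9/20/2020 8:57"] * 3
         + ["9/20/2020 9:12"] * 3 + ["9/20/2020 9:20"] * 3)
df = pd.DataFrame({"Date": pd.to_datetime(dates, format="%m/%d/%Y %H:%M")})

# Step 1: shift() moves every value down by one row; row 0 becomes NaT.
previous = df["Date"].shift()

# Step 2: True wherever the date differs from the row above.
# NaT never compares equal to anything, so row 0 is always True.
changed = df["Date"] != previous

# Step 3: cumsum() counts the True values so far, giving a 1-based group number.
df["CQI"] = changed.cumsum()


def make_counter():
    """Hypothetical helper: returns a stateful function for Series.apply."""
    state = {"previous": None, "index": 0}

    def step(date):
        # Bump the counter every time the date changes.
        if date != state["previous"]:
            state["index"] += 1
            state["previous"] = date
        return state["index"]

    return step


# Same numbering via apply; this still runs Python code once per row.
cqi_apply = df["Date"].apply(make_counter())

print(df["CQI"].tolist())
print(cqi_apply.tolist())
```

The apply version is still a Python-level loop under the hood, so it is unlikely to be much faster than iterrows on large frames, whereas shift, the comparison, and cumsum each operate on the whole column at once.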