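I have a DataFrame in which "B" is a category and "Boy" is an event. Boys {1, 2, 3, 4} are assigned to B = 1. Boy 1 uses B for 10 minutes, starting at 12:00 and ending at End = 12:10. The next boy should start at the previous boy's End time (End[0]), and so on. B = 1 has four samples, and B = 2 has four different samples.
Sample input: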
B Boy Start End Out
1 1 12:00 12:10 0:10
1 2 12:01 12:11 0:10
1 3 12:02 12:12 0:10
1 4 12:03 12:13 0:10
2 5 12:00 12:10 0:05
2 6 12:01 12:11 0:05
2 7 12:02 12:12 0:05
2 8 12:03 12:13 0:05
3 9 12:00 12:10 0:03
3 10 12:01 12:11 0:03
3 11 12:02 12:12 0:03
3 12 12:03 12:13 0:03
Code I tried:
data_1['End'] = pd.to_datetime(data_1['Start']) + pd.to_timedelta(data_1['Out'])
for i in range(1, len(data_1)):
    data_1.loc[i, 'Start'] = data_1.loc[i-1, 'End']
Output:
B Boy Start End Out
1 1 12:00 12:10 0:10
1 2 12:10 12:20 0:10
1 3 12:20 12:30 0:10
1 4 12:30 12:40 0:10
2 5 12:40 12:45 0:05
2 6 12:45 12:50 0:05
2 7 12:50 12:55 0:05
2 8 12:55 13:00 0:05
3 9 13:00 13:03 0:03
3 10 13:03 13:06 0:03
3 11 13:06 13:09 0:03
3 12 13:09 13:12 0:03
Code that failed:
new_Start_time = []
for i,item in data_1.groupby('B'):
    temp_list = [item.iloc[0,2]]
    list_all = [item.iloc[0,3]]
    for j in range(len(list_all)):
        temp_list[j+1] = [list_all[j] for i in range(len(list_all) - 1) ]
        temp_list.append(temp_list[j])
    new_Start_time.extend(temp_list)
data_1['new_Start_time'] = new_Start_time
Error: IndexError: list assignment index out of range
Expected result:
B Boy Start End Out
1 1 12:00 12:10 0:10
1 2 12:10 12:20 0:10
1 3 12:20 12:30 0:10
1 4 12:30 12:40 0:10
2 5 12:00 12:05 0:05
2 6 12:05 12:10 0:05
2 7 12:10 12:15 0:05
2 8 12:15 12:20 0:05
3 9 12:00 12:03 0:03
3 10 12:03 12:06 0:03
3 11 12:06 12:09 0:03
3 12 12:09 12:12 0:03
Thanks in advance.
I found a solution. It is not the best option if your table is really large, but it works. First, I convert the columns to datetime and timedelta:
df["Start"] = pd.to_datetime(df["Start"], format='%H:%M')
df["End"] = pd.to_datetime(df["End"], format='%H:%M')
df["Out"] = pd.to_timedelta("0"+df["Out"]+":00")
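To make the string handling in the last conversion line explicit, here is a small sketch on a throwaway Series. The names out, padded, durations and start are mine, not part of the answer, and it assumes "Out" holds strings in H:MM form as in the sample input.

```python
import pandas as pd

# Hypothetical stand-in for the "Out" column of the sample input.
out = pd.Series(["0:10", "0:05", "0:03"])

# Pad "H:MM" to "HH:MM:SS" so pandas reads it as hours:minutes:seconds.
padded = "0" + out + ":00"          # "00:10:00", "00:05:00", "00:03:00"
durations = pd.to_timedelta(padded)

# Parsing a time-only string with format="%H:%M" fills in the default
# date 1900-01-01, which is why that date appears in the final output.
start = pd.to_datetime(pd.Series(["12:00", "12:01", "12:02"]), format="%H:%M")

print(durations)
print(start)
```

If the date matters, it would have to be supplied separately, since the input only carries times of day.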
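Then the code builds the new start and end values:
new_start = []
new_end = []
for i, group in df.groupby("B"):
    temp_start = []
    temp_end = []
    # Column 4 is "Out"; the first row's duration is used for the whole group.
    out = group.iloc[0, 4]
    for j in range(0, group.shape[0]):
        if j == 0:
            # Column 2 is "Start"; the first row keeps its original start time.
            temp_start.append(group.iloc[0, 2])
            temp_end.append(group.iloc[0, 2] + out)
        else:
            # Every later row starts where the previous row ended.
            temp_start.append(temp_end[j-1])
            temp_end.append(temp_start[j] + out)
    new_start.extend(temp_start)
    new_end.extend(temp_end)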
Now, update the old Start and End columns with the new values:
df["Start"]= new_start
df["End"] = new_end
df
Output:
B Boy Start End Out
0 1 1 1900-01-01 12:00:00 1900-01-01 12:10:00 00:10:00
1 1 2 1900-01-01 12:10:00 1900-01-01 12:20:00 00:10:00
2 1 3 1900-01-01 12:20:00 1900-01-01 12:30:00 00:10:00
3 1 4 1900-01-01 12:30:00 1900-01-01 12:40:00 00:10:00
4 2 5 1900-01-01 12:00:00 1900-01-01 12:05:00 00:05:00
5 2 6 1900-01-01 12:05:00 1900-01-01 12:10:00 00:05:00
6 2 7 1900-01-01 12:10:00 1900-01-01 12:15:00 00:05:00
7 2 8 1900-01-01 12:15:00 1900-01-01 12:20:00 00:05:00
8 3 9 1900-01-01 12:00:00 1900-01-01 12:03:00 00:03:00
9 3 10 1900-01-01 12:03:00 1900-01-01 12:06:00 00:03:00
10 3 11 1900-01-01 12:06:00 1900-01-01 12:09:00 00:03:00
11 3 12 1900-01-01 12:09:00 1900-01-01 12:12:00 00:03:00
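For larger tables, the same result can probably be reached without Python-level loops. The following is only a sketch of one alternative, not the method used above: it takes each group's first Start and adds a running total of "Out" within the group. The names seconds, elapsed and first_start are mine. Unlike the loop, which reuses the first row's "Out" for the whole group, this sums each row's own "Out"; for the sample data the two agree because "Out" is constant within each B.

```python
import pandas as pd

# Rebuild the sample input (End is recomputed, so it is left out here).
df = pd.DataFrame({
    "B": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Boy": list(range(1, 13)),
    "Start": ["12:00", "12:01", "12:02", "12:03"] * 3,
    "Out": ["0:10"] * 4 + ["0:05"] * 4 + ["0:03"] * 4,
})
df["Start"] = pd.to_datetime(df["Start"], format="%H:%M")
df["Out"] = pd.to_timedelta("0" + df["Out"] + ":00")

# First Start of each group, repeated on every row of that group.
first_start = df.groupby("B")["Start"].transform("first")

# Running total of the durations within each group, computed on plain
# seconds and converted back to timedelta afterwards.
seconds = df["Out"].dt.total_seconds()
elapsed = pd.to_timedelta(seconds.groupby(df["B"]).cumsum(), unit="s")

# Each row ends at the group's first start plus the time used so far,
# and starts one duration earlier.
df["End"] = first_start + elapsed
df["Start"] = df["End"] - df["Out"]

print(df[["B", "Boy", "Start", "End", "Out"]])
```

This relies on the rows already being in the intended order within each B, as they are in the sample.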