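I'm looking to set up an incrementing counter per group in a DataFrame. I want to increment the counter for every row in a group unless a condition is met. If the condition is met, I want to reuse the previous count. I also want the counter to reset for each group.
Example: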
import pandas as pd
d1 = {'col1': [1, 1, 1, 2, 2, 3], 'col2': ['A', 'A', 'B', 'A', 'A', 'B']}
df1 = pd.DataFrame(data=d1)
df1
Output:
   col1 col2
0     1    A
1     1    A
2     1    B
3     2    A
4     2    A
5     3    B
Expected output:
   col1 col2  count
0     1    A      1
1     1    A      2
2     1    B      2
3     2    A      1
4     2    A      2
5     3    B      0
I tried using numpy cumsum, but I'm not sure how to reuse the last value.
Edit: I want to group by col1.
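I wrote a snippet that does what I understood you to want. If something is not exactly what you expected, you can certainly reuse it and adapt it.
I think the key points here are:
1) Iterate over (previousRow, currentRow) pairs, so that you can easily access the previous row's information.
2) Check whether the specific conditions match what you expect.
3) Update the count inside the if conditions, then set the value.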
import pandas as pd
from itertools import zip_longest

d1 = {'col1': [1, 1, 1, 2, 2, 3], 'col2': ['A', 'A', 'B', 'A', 'A', 'B']}
df1 = pd.DataFrame(data=d1)
df1['count'] = 0

# Two iterators over the same rows; the second one is advanced by one row,
# so zipping them yields (previous row, current row) pairs.
df1_previterrows = df1.iterrows()
df1_curriterrows = df1.iterrows()
next(df1_curriterrows)

# Running count for each col1 group.
groups_counter = {}

# The first row has no previous row, so handle it separately.
# DataFrame.set_value was removed in pandas 1.0; .at is used instead.
df1_firstRow = df1.iloc[0]
if df1_firstRow["col2"] == "A":
    groups_counter[df1_firstRow["col1"]] = 1
    df1.at[0, 'count'] = 1
elif df1_firstRow["col2"] == "B":
    groups_counter[df1_firstRow["col1"]] = 0
    df1.at[0, 'count'] = 0

zip_list = zip_longest(df1_previterrows, df1_curriterrows)
for (prevRow_idx, prevRow), Curr in zip_list:
    # The last pair is (last row, None); there is nothing left to update.
    if Curr is not None:
        (currRow_idx, currRow) = Curr
        group = currRow["col1"]
        same_group = currRow["col1"] == prevRow["col1"]
        if same_group and currRow["col2"] == "A":
            # Same group, 'A' row: increment the running count.
            groups_counter[group] = groups_counter.get(group, 0) + 1
        elif not same_group and currRow["col2"] == "A":
            # New group starting with 'A': the count starts at 1.
            groups_counter[group] = 1
        elif same_group and currRow["col2"] == "B":
            # Same group, 'B' row: reuse the previous count unchanged.
            groups_counter.setdefault(group, 0)
        elif not same_group and currRow["col2"] == "B":
            # New group starting with 'B': the count starts at 0.
            groups_counter[group] = 0
        df1.at[currRow_idx, 'count'] = groups_counter[group]

print(df1)
Output:
   col1 col2  count
0     1    A      1
1     1    A      2
2     1    B      2
3     2    A      1
4     2    A      2
5     3    B      0
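Since the question mentions cumsum, here is a minimal vectorized sketch as a possible alternative to the row-by-row loop. It assumes, based only on the expected output, that the condition for reusing the previous count is col2 == 'B', so the counter goes up on every 'A' row. The variable name increments is made up for illustration.

```python
import pandas as pd

d1 = {'col1': [1, 1, 1, 2, 2, 3], 'col2': ['A', 'A', 'B', 'A', 'A', 'B']}
df1 = pd.DataFrame(data=d1)

# Assumption: the counter increases on 'A' rows and is reused on 'B' rows.
# True marks a row that should increment the counter.
increments = df1['col2'].eq('A')

# Cumulative sum of the flags within each col1 group. The sum restarts for
# every group, and a 'B' row adds 0, so it carries the previous count forward.
df1['count'] = increments.astype(int).groupby(df1['col1']).cumsum()

print(df1)
```

With the sample data this should give the same count column as the expected output. A group that starts with 'B' gets 0 because nothing has been counted in that group yet.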