将“是”标志应用于每种类型的帐户的所有日期的所有第一个实例

debugcn 发表于 Dev

维尔齐诺的礼物

我正在尝试评估每个帐户的日期的第一个实例。例如，我想创建一个新列，该列表示同一帐户的最早日期（即使重复相同的日期）为“是”，而以后的日期均表示为“否”。

test = pd.DataFrame({'date':['2018-08-01','2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
                     'account':['a','a','a','a','b','b','b','c','c','c','d','e']})
test

输出应如下所示：

我看到我正在执行的这些步骤不起作用，因为当返回“否”时，第一行索引应为“是”。

new_test = test.merge(pd.DataFrame(test.groupby('account')['date'].min().reset_index()),left_on='account',right_on='account')
new_test

first_har = []
for item in new_test['date_y'].duplicated():
    if item == False:
        first_har.append('Yes')
    else:
        first_har.append('No')
print(first_har)

pd.concat([new_test,pd.DataFrame(first_har)],axis=1)

任何帮助，不胜感激！

tegancp

您也可以使用来执行此操作merge（与迭代第一个日期的行相比，该方法的性能可能更好）。

从您生成的第一个日期的相同df开始，并first_date附加一列：

>>> first_dates = test.groupby('account')['date'].min().reset_index().assign(first_date='Yes')
>>> first_dates

  account        date first_date
0       a  2018-08-01        Yes
1       b  2019-09-01        Yes
2       c  2020-01-02        Yes
3       d  2020-10-04        Yes
4       e  2020-10-05        Yes

与原始合并：

>>> test.merge(first_dates, how='left')

          date account first_date
0   2018-08-01       a        Yes
1   2018-08-01       a        Yes
2   2018-08-02       a        NaN
3   2018-08-03       a        NaN
4   2019-09-01       b        Yes
5   2019-09-02       b        NaN
6   2019-09-03       b        NaN
7   2020-01-02       c        Yes
8   2020-01-03       c        NaN
9   2020-01-04       c        NaN
10  2020-10-04       d        Yes
11  2020-10-05       e        Yes

请注意，默认情况下，合并是在所有匹配的列上完成的，并how='left'确保结果为中的每一行提供了一行test。要通过“是/否”获得所需的输出，只需添加缺失值填充即可：

>>> test.merge(first_dates, how='left').fillna({'first_date': 'No'})

          date account first_date
0   2018-08-01       a        Yes
1   2018-08-01       a        Yes
2   2018-08-02       a         No
3   2018-08-03       a         No
4   2019-09-01       b        Yes
5   2019-09-02       b         No
6   2019-09-03       b         No
7   2020-01-02       c        Yes
8   2020-01-03       c         No
9   2020-01-04       c         No
10  2020-10-04       d        Yes
11  2020-10-05       e        Yes

本文收集自互联网，转载请注明来源。

如有侵权，请联系[email protected] 删除。