What is the best Pandas apply/loop approach in this case?

Christopher

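I am transforming some applicant transaction data and need to create a new flag column (labeled "DESIRED FLAG" in my example). However, I can't work out the right loop/apply approach, because the logic below can have many different variations.

In a perfect world, consecutive applicant process history would look like this, with every "Event Status" set to "Completed":

  • On-Site Interview Kick Off -> Schedule Interviews -> Decision; or
  • Phone Interview Kick Off -> Schedule Interviews -> Decision

Of course, an applicant can go through many phone interviews and on-site interviews during the application process.

As the example below shows, a "Schedule Interviews" step is sometimes canceled. In those cases I need to remove that step along with the subsequent steps tied to it. These include "Schedule Interviews", "Decision", and "On-Site Interview Kick Off" or "Phone Interview Kick Off". In addition, there can sometimes be other "Event Status" values, as we see with the manually skipped event.

I also have other types of scenarios that I need to create flags for, so I need to keep the original dataframe and add the flag as a new column.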

import pandas as pd

data = {'Employee ID': ["100","100", "100", "100","100","100","100","100","100","100","200", "200", "200","200","200","200","200","300","300", "300", "300","300","300","300"],
        'Completed On Date': ["2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01","2016-01-01","2017-01-01","2018-01-01","2010-01-01","2011-06-05","2012-07-01","2012-08-15","2013-01-01","2014-01-01","2015-01-01","2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01"],
        'Event': ["Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","Job Apply","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision"],
        'Event Status': ["Completed","Completed","CANCELED","Completed","Completed","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Manually Skipped","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Completed","Completed","Completed","Completed"],
        'DESIRED FLAG': ["Keep","Keep","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Keep","Keep"]}
df = pd.DataFrame(data, columns=['Employee ID','Completed On Date','Event','Event Status','DESIRED FLAG'])
df = df.sort_values(by=(['Employee ID','Completed On Date']))

df

Costa Huang

I think the following code solves your problem:

import pandas as pd

data = {'Employee ID': ["100","100", "100", "100","100","100","100","100","100","100","200", "200", "200","200","200","200","200","300","300", "300", "300","300","300","300"],
        'Completed On Date': ["2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01","2016-01-01","2017-01-01","2018-01-01","2010-01-01","2011-06-05","2012-07-01","2012-08-15","2013-01-01","2014-01-01","2015-01-01","2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01"],
        'Event': ["Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","Job Apply","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision"],
        'Event Status': ["Completed","Completed","CANCELED","Completed","Completed","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Manually Skipped","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Completed","Completed","Completed","Completed"],
        'DESIRED FLAG': ["Keep","Keep","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Keep","Keep"]}
df = pd.DataFrame(data, columns=['Employee ID','Completed On Date','Event','Event Status','DESIRED FLAG'])
df = df.sort_values(by=(['Employee ID','Completed On Date']))


index_list_delete = []
start_deleting = False
for i in range(len(df)):
    if not start_deleting:
        # A "CANCELED" status marks the start of a run of rows to delete.
        if df.iloc[i]['Event Status'] == 'CANCELED':
            index_list_delete.append(i)
            start_deleting = True
    else:
        # The next "Schedule Interviews" row ends the run and is kept;
        # every other row seen in the meantime is marked for deletion.
        if df.iloc[i]['Event'] == 'Schedule Interviews':
            start_deleting = False
        else:
            index_list_delete.append(i)

# Drop the collected rows (positions are mapped to index labels).
df = df.drop(df.index[index_list_delete])
# Reset the index.
df = df.reset_index(drop=True)

You will get the following result:

   Employee ID Completed On Date                       Event Event Status DESIRED FLAG
0          100        2009-01-01                    Decision    Completed         Keep
1          100        2010-01-01  On-Site Interview Kick Off    Completed         Keep
2          100        2014-01-01         Schedule Interviews    Completed         Keep
3          100        2015-01-01                    Decision    Completed         Keep
4          100        2016-01-01    Phone Interview Kick Off    Completed         Keep
5          100        2017-01-01         Schedule Interviews    Completed         Keep
6          100        2018-01-01                    Decision    Completed         Keep
7          200        2010-01-01  On-Site Interview Kick Off    Completed         Keep
8          200        2014-01-01         Schedule Interviews    Completed         Keep
9          200        2015-01-01                    Decision    Completed         Keep
10         300        2009-01-01                   Job Apply    Completed         Keep
11         300        2010-01-01    Phone Interview Kick Off    Completed         Keep
12         300        2014-01-01         Schedule Interviews    Completed         Keep
13         300        2015-01-01                    Decision    Completed         Keep
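
The question also asks to keep the original rows and record the outcome in a new column. A minimal sketch of that variant follows, using the same state-machine idea as the loop above. The function name `flag_rows` is hypothetical, and two behaviors are assumptions that go beyond the answer's loop: the removal state is reset whenever the employee ID changes, and a "Schedule Interviews" row that is itself canceled keeps the removal run going. The function works on plain sequences, so it can be tried without a DataFrame.

```python
def flag_rows(employee_ids, events, statuses):
    """Return a 'Keep'/'Remove' label for each row, in input order.

    Assumes the rows are already sorted by employee and then by date.
    """
    flags = []
    removing = False
    previous_id = None
    for emp_id, event, status in zip(employee_ids, events, statuses):
        if emp_id != previous_id:
            # Assumption: a canceled run never spills over into the next employee.
            removing = False
            previous_id = emp_id
        if status == 'CANCELED':
            # A canceled step starts (or continues) a run of rows to remove.
            removing = True
        elif removing and event == 'Schedule Interviews':
            # The next non-canceled "Schedule Interviews" ends the run and is kept.
            removing = False
        flags.append('Remove' if removing else 'Keep')
    return flags


# Employee 200 from the question's sample data.
ids = ['200'] * 7
events = ['On-Site Interview Kick Off', 'Schedule Interviews', 'Decision',
          'Decision', 'Phone Interview Kick Off', 'Schedule Interviews',
          'Decision']
statuses = ['Completed', 'CANCELED', 'Manually Skipped', 'Completed',
            'Completed', 'Completed', 'Completed']
print(flag_rows(ids, events, statuses))
```

With the sorted `df` from the question, `df['FLAG'] = flag_rows(df['Employee ID'], df['Event'], df['Event Status'])` would add the column while leaving every row in place, so it can be compared against `DESIRED FLAG` before any rows are dropped.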
