How can I calculate number of consecutive values in a column within a group in a pandas dataframe?

Joe Published at Dev

Joe

I have a dataframe with all of a fighter's fights, the fight number (i.e. if it is their first, second, etc.), and whether or not they won the fight. I would like to calculate the number of consecutive wins a fighter had gotten before their current fight (i.e. not including if they won the current fight). I am currently working with Python 3.7 in Spyder.

Suppose we have the following dataframe, where win = 1 if the fighter won the fight:

df = pd.DataFrame({'fighter' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'], 
                  'fight_number' :  ['1', '2', '3', '4', '1', '2', '3', '1', '2'],
                  'win' : [0, 0, 1, 1, 1, 1, 0, 1, 1]})

  fighter  fight_number  win
0       A             1     0
1       A             2     0
2       A             3     1
3       A             4     1
4       B             1     1
5       B             2     1
6       B             3     0
7       C             1     1
8       C             2     1

I know that to calculate win streaks across all rows, I can implement the solution proposed here with:

grouper = (df.win != df.win.shift()).cumsum()
df['streak'] = df.groupby(grouper).cumsum()

which produces:

  fighter fight_number  win  streak
0       A            1    0       0
1       A            2    0       0
2       A            3    1       1
3       A            4    1       2
4       B            1    1       3
5       B            2    1       4
6       B            3    0       0
7       C            1    1       1
8       C            2    1       2

But what I need is to apply this approach to subgroups of the dataframe (i.e. to each fighter) and to not include the outcome of the current fight in the count of the streak. So, I am basically trying to have the current win streak of the fighter when they enter the fight.

The target output in this example would therefore be:

  fighter fight_number  win  streak
0       A            1    0       0
1       A            2    0       0
2       A            3    1       0
3       A            4    1       1
4       B            1    1       0
5       B            2    1       1
6       B            3    0       2
7       C            1    1       0
8       C            2    1       1

I appreciate any advice I can get on this, as I am pretty new to Python.

Joe

One solution I came up with was inspired by an earlier answer posted (but deleted) by jezrael:

grouper = (df.win != df.win.shift()).cumsum()
df['streak'] = df.groupby(['fighter', grouper]).cumsum()
df['streak'] = df.groupby('fighter')['streak'].shift(1).fillna(0)

which produces the target output:

  fighter fight_number  win  streak
0       A            1    0     0.0
1       A            2    0     0.0
2       A            3    1     0.0
3       A            4    1     1.0
4       B            1    1     0.0
5       B            2    1     1.0
6       B            3    0     2.0
7       C            1    1     0.0
8       C            2    1     1.0

and it also seems to work on other test examples:

df2 = pd.DataFrame({'fighter' : ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'], 
                  'fight number' :  ["1", "2", "3", "4", "5", "6", "1", "2", "3", "1", "2"],
                  'win' : [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1]}) 

grouper = (df2.win != df2.win.shift()).cumsum()
df2['streak'] = df2.groupby(['fighter', grouper]).cumsum()
df2['streak'] = df2.groupby('fighter')['streak'].shift(1).fillna(0)

   fighter fight number  win  streak
0        A            1    1     0.0
1        A            2    1     1.0
2        A            3    0     2.0
3        A            4    1     0.0
4        A            5    0     1.0
5        A            6    1     0.0
6        B            1    1     0.0
7        B            2    1     1.0
8        B            3    0     2.0
9        C            1    1     0.0
10       C            2    1     1.0

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2021-05-22

Comments

0 comments

From Dev

In a Pandas dataframe, how can I extract the difference between the values on separate rows within the same column, conditional on a second column?

From Dev

How to calculate ratio of values in a pandas dataframe column?

From Dev

How can I group by the count number of column values and sort it?

From Dev

How can I group by elements based on multiple columns in pandas dataframe and save the number of elements of each group in another column?

From Dev

How can I create a legend in ggplot which assigns names and colors to columns and not to values within a column of a dataframe?

Related Related

Article