Pythonic way to calculate streaks in pandas dataframe

hernanavella

Given df

df = pd.DataFrame([[1, 5, 2, 8, 2], [2, 4, 4, 20, 2], [3, 3, 1, 20, 2], [4, 2, 2, 1, 3], [5, 1, 4, -5, -4], [1, 5, 2, 2, -20], 
              [2, 4, 4, 3, -8], [3, 3, 1, -1, -1], [4, 2, 2, 0, 12], [5, 1, 4, 20, -2]],
             columns=['A', 'B', 'C', 'D', 'E'], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Based on this answer, I created a function to calculate streaks (up, down).

def streaks(df, column):
    #Create sign column
    df['sign'] = 0
    df.loc[df[column] > 0, 'sign'] = 1
    df.loc[df[column] < 0, 'sign'] = 0
    # Downstreak
    df['d_streak2'] = (df['sign'] == 0).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['sign'] == 1, 'cumsum'] = df['d_streak2']
    df['cumsum'] = df['cumsum'].fillna(method='ffill')
    df['cumsum'] = df['cumsum'].fillna(0)
    df['d_streak'] = df['d_streak2'] - df['cumsum']
    df.drop(['d_streak2', 'cumsum'], axis=1, inplace=True)
    # Upstreak
    df['u_streak2'] = (df['sign'] == 1).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['sign'] == 0, 'cumsum'] = df['u_streak2']
    df['cumsum'] = df['cumsum'].fillna(method='ffill')
    df['cumsum'] = df['cumsum'].fillna(0)
    df['u_streak'] = df['u_streak2'] - df['cumsum']
    df.drop(['u_streak2', 'cumsum'], axis=1, inplace=True)
    del df['sign']
    return df

The function works well, however is very long. I'm sure there's a much betterway to write this. I tried the other answer in but didn't work well.

This is the desired output

streaks(df, 'E')


    A   B   C    D     E    d_streak    u_streak
1   1   5   2    8     2         0.0    1.0
2   2   4   4   20     2         0.0    2.0
3   3   3   1   20     2         0.0    3.0
4   4   2   2    1     3         0.0    4.0
5   5   1   4   -5    -4         1.0    0.0
6   1   5   2    2   -20         2.0    0.0
7   2   4   4    3    -8         3.0    0.0
8   3   3   1   -1    -1         4.0    0.0
9   4   2   2    0    12         0.0    1.0
10  5   1   4   20    -2         1.0    0.0
Nickil Maveli

You could simplify the function as shown:

def streaks(df, col):
    sign = np.sign(df[col])
    s = sign.groupby((sign!=sign.shift()).cumsum()).cumsum()
    return df.assign(u_streak=s.where(s>0, 0.0), d_streak=s.where(s<0, 0.0).abs())

Using it:

streaks(df, 'E')

enter image description here


Firstly, compute the sign of each cell present in the column under consideration using np.sign. These assign +1 to positive numbers and -1 to the negative.

Next, identify sets of adjacent values (comparing current cell and it's next) using sign!=sign.shift() and take it's cumulative sum which would serve in the grouping process.

Perform groupby letting these as the key/condition and again take the cumulative sum across the sub-group elements.

Finally, assign the positive computed cumsum values to ustreak and the negative ones (absolute value after taking their modulus) to dstreak.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

finding streaks in pandas dataframe

From Dev

pythonic way to parse/split URLs in a pandas dataframe

From Dev

more pythonic way - pandas dataframe manipulation

From Dev

A Pythonic way to reshape Pandas.DataFrame's

From Dev

pandas - Pythonic way to slicing DataFrame with DateTimeIndex

From Python

Efficiently finding consecutive streaks in a pandas DataFrame column?

From Dev

Efficient/Pythonic way to Filter pandas DataFrame based on priority

From Dev

Efficient/Pythonic way to create lists from pandas Dataframe column

From Java

Pythonic way for calculating length of lists in pandas dataframe column

From Dev

Pythonic way to use an 'slicer' and a 'where'-equivalent on a pandas dataframe

From Dev

Pythonic way to convert Pandas dataframe from wide to long

From Dev

Pythonic way of obtaining serial correlation of elements in pandas dataframe

From Dev

Pandas dataframe, each cell into list - more pythonic way?

From Dev

Remove nans from lists in all columsn of a pandas dataframe (pythonic way)

From Dev

Pythonic way to regroup a pandas dataframe using max of a column

From Dev

What is the pythonic way to calculate argmax?

From Dev

Replacing values in dataframe in a Pythonic way

From Python

Pandas: split string in a pythonic way

From Dev

Pythonic way to group by a pandas table

From Dev

Pythonic way to calculate the mean and variance of values in Counters

From Dev

Write way to calculate weighted average with pandas Dataframe with mean or directly multiply

From Dev

Fastest way to calculate the shortest (euclidean) distance between points, in pandas dataframe

From Dev

Is there a pythonic way of shifting pandas dataframe cells to the left, while pushing out or overwriting any nan?

From Dev

Pythonic way of calculating difference between nth and n-1th value in a large dataframe using Pandas?

From Dev

Pythonic way to get rows AND columns in Pandas Dataframe which contains NaN values

From Dev

Pythonic way to apply different aggregate functions to different columns of a pandas dataframe? And to name the columns efficiently?

From Python

Pythonic way to filter a pandas dataframe to only records that were active between two dates

From Dev

What's the Pythonic way to iterate over two consecutive rows in a dataframe of Pandas?

From Dev

Pythonic way to create pandas dataframe based on if-else condition for nd-array and 1d-array

Related Related

  1. 1

    finding streaks in pandas dataframe

  2. 2

    pythonic way to parse/split URLs in a pandas dataframe

  3. 3

    more pythonic way - pandas dataframe manipulation

  4. 4

    A Pythonic way to reshape Pandas.DataFrame's

  5. 5

    pandas - Pythonic way to slicing DataFrame with DateTimeIndex

  6. 6

    Efficiently finding consecutive streaks in a pandas DataFrame column?

  7. 7

    Efficient/Pythonic way to Filter pandas DataFrame based on priority

  8. 8

    Efficient/Pythonic way to create lists from pandas Dataframe column

  9. 9

    Pythonic way for calculating length of lists in pandas dataframe column

  10. 10

    Pythonic way to use an 'slicer' and a 'where'-equivalent on a pandas dataframe

  11. 11

    Pythonic way to convert Pandas dataframe from wide to long

  12. 12

    Pythonic way of obtaining serial correlation of elements in pandas dataframe

  13. 13

    Pandas dataframe, each cell into list - more pythonic way?

  14. 14

    Remove nans from lists in all columsn of a pandas dataframe (pythonic way)

  15. 15

    Pythonic way to regroup a pandas dataframe using max of a column

  16. 16

    What is the pythonic way to calculate argmax?

  17. 17

    Replacing values in dataframe in a Pythonic way

  18. 18

    Pandas: split string in a pythonic way

  19. 19

    Pythonic way to group by a pandas table

  20. 20

    Pythonic way to calculate the mean and variance of values in Counters

  21. 21

    Write way to calculate weighted average with pandas Dataframe with mean or directly multiply

  22. 22

    Fastest way to calculate the shortest (euclidean) distance between points, in pandas dataframe

  23. 23

    Is there a pythonic way of shifting pandas dataframe cells to the left, while pushing out or overwriting any nan?

  24. 24

    Pythonic way of calculating difference between nth and n-1th value in a large dataframe using Pandas?

  25. 25

    Pythonic way to get rows AND columns in Pandas Dataframe which contains NaN values

  26. 26

    Pythonic way to apply different aggregate functions to different columns of a pandas dataframe? And to name the columns efficiently?

  27. 27

    Pythonic way to filter a pandas dataframe to only records that were active between two dates

  28. 28

    What's the Pythonic way to iterate over two consecutive rows in a dataframe of Pandas?

  29. 29

    Pythonic way to create pandas dataframe based on if-else condition for nd-array and 1d-array

HotTag

Archive