pandas groupby apply is really slow

user5406764

When I call df.groupby([...]).apply(lambda x: ...), the performance is horrible. Is there a faster, more direct way to do this simple query?

To demonstrate my point, here is some code to set up the DataFrame:

import pandas as pd

df = pd.DataFrame(data={
    'ticker': ['AAPL', 'AAPL', 'AAPL', 'IBM', 'IBM', 'IBM'],
    'side': ['B', 'B', 'S', 'S', 'S', 'B'],
    'size': [100, 200, 300, 400, 100, 200],
    'price': [10.12, 10.13, 10.14, 20.3, 20.2, 20.1]})


    price   side     size   ticker
0   10.12   B        100    AAPL
1   10.13   B        200    AAPL
2   10.14   S        300    AAPL
3   20.30   S        400    IBM
4   20.20   S        100    IBM
5   20.10   B        200    IBM

Here is the extremely slow part that I need to speed up:

%timeit avgpx = df.groupby(['ticker','side']) \
.apply(lambda group: (group['size'] * group['price']).sum() / group['size'].sum())

3.23 ms ± 148 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This produces the correct result, but as you can see above, it is slow. 3.23 ms doesn't seem like much, but this is only 6 rows; when I run it on a real dataset it takes forever.

ticker  side
AAPL    B       10.126667
        S       10.140000
IBM     B       20.100000
        S       20.280000
dtype: float64
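apply calls the Python lambda roughly once per (ticker, side) group, so its per-call overhead grows with the number of groups. To reproduce the slowdown on something closer to a real dataset, one could generate a larger synthetic frame. This is a sketch under assumptions: the data is random (not from the post), and the helper names `make_trades` and `avg_px_apply` are hypothetical. Selecting only the size and price columns before apply is my change; it does not alter the result.

```python
import timeit

import numpy as np
import pandas as pd


def make_trades(n_rows: int, n_tickers: int, seed: int = 0) -> pd.DataFrame:
    # Hypothetical helper: random trades with the same columns as the question's df.
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        'ticker': [f'T{k}' for k in rng.integers(0, n_tickers, n_rows)],
        'side': rng.choice(['B', 'S'], n_rows),
        'size': rng.integers(1, 1000, n_rows),      # always positive, so no 0/0
        'price': rng.uniform(10, 100, n_rows),
    })


def avg_px_apply(df: pd.DataFrame) -> pd.Series:
    # The question's approach: the lambda runs once per (ticker, side) group,
    # so Python-level overhead grows with the number of groups.
    return (df.groupby(['ticker', 'side'])[['size', 'price']]
              .apply(lambda g: (g['size'] * g['price']).sum() / g['size'].sum()))


if __name__ == '__main__':
    trades = make_trades(n_rows=100_000, n_tickers=1_000)  # up to 2,000 groups
    best = min(timeit.repeat(lambda: avg_px_apply(trades), number=1, repeat=3))
    print(f'apply over {len(trades):,} rows: {best:.3f} s (best of 3)')
```

Increasing `n_tickers` while keeping `n_rows` fixed should make the per-group overhead of apply more visible.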
cs95

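You can save some time by precomputing the size * price product as a column and getting rid of the apply, so that both sums are done by groupby's built-in sum instead of a Python lambda per group.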
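The lambda computes a size-weighted average price for each (ticker, side) group g, where s_i and p_i are the size and price of row i:

$$
\bar{p}_g = \frac{\sum_{i \in g} s_i \, p_i}{\sum_{i \in g} s_i}
$$

Each term s_i p_i depends only on its own row, not on the group, so the product can be computed once for the whole frame and then simply summed per group. For example, for AAPL/B: (100 × 10.12 + 200 × 10.13) / (100 + 200) = 3038 / 300 ≈ 10.126667, matching the output above.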

df['scaled_size'] = df['size'] * df['price']
g = df.groupby(['ticker', 'side'])

g['scaled_size'].sum() / g['size'].sum()

ticker  side
AAPL    B       10.126667
        S       10.140000
IBM     B       20.100000
        S       20.280000
dtype: float64
100 loops, best of 3: 2.58 ms per loop
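If you would rather not add a scaled_size column to df itself, the same precompute-then-sum idea can be applied to a temporary copy via DataFrame.assign. A minimal sketch; the function name `avg_px_vectorized` and the column name `notional` are hypothetical:

```python
import pandas as pd


def avg_px_vectorized(df: pd.DataFrame) -> pd.Series:
    # assign() returns a new frame, so the caller's df is left unchanged.
    sums = (df.assign(notional=df['size'] * df['price'])
              .groupby(['ticker', 'side'])[['notional', 'size']]
              .sum())  # both sums use groupby's built-in aggregation, no Python lambda
    return sums['notional'] / sums['size']


df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'AAPL', 'IBM', 'IBM', 'IBM'],
    'side': ['B', 'B', 'S', 'S', 'S', 'B'],
    'size': [100, 200, 300, 400, 100, 200],
    'price': [10.12, 10.13, 10.14, 20.3, 20.2, 20.1],
})
print(avg_px_vectorized(df))
```

Whether this is any faster than the in-place version has not been measured here; its main point is that it does not mutate df.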

Sanity Check

df.groupby(['ticker','side']).apply(
    lambda group: (group['size'] * group['price']).sum() / group['size'].sum())

ticker  side
AAPL    B       10.126667
        S       10.140000
IBM     B       20.100000
        S       20.280000
dtype: float64
100 loops, best of 3: 5.02 ms per loop

Getting rid of apply gives roughly a 2x speedup on my machine (2.58 ms vs. 5.02 ms per loop).
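Rather than comparing the printed outputs by eye, the sanity check can be automated with pandas.testing.assert_series_equal, which by default compares floats with a relative tolerance. A minimal sketch that rebuilds the question's df; restricting apply to the size and price columns is my change (newer pandas versions warn when apply operates on the grouping columns) and does not alter the result.

```python
import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'AAPL', 'IBM', 'IBM', 'IBM'],
    'side': ['B', 'B', 'S', 'S', 'S', 'B'],
    'size': [100, 200, 300, 400, 100, 200],
    'price': [10.12, 10.13, 10.14, 20.3, 20.2, 20.1],
})

# Answer's approach: precomputed product plus built-in groupby sums.
df['scaled_size'] = df['size'] * df['price']
g = df.groupby(['ticker', 'side'])
via_sums = g['scaled_size'].sum() / g['size'].sum()

# Question's approach: a Python lambda per group.
via_apply = g[['size', 'price']].apply(
    lambda group: (group['size'] * group['price']).sum() / group['size'].sum())

# Raises AssertionError if index, dtype, name, or values differ.
pd.testing.assert_series_equal(via_apply, via_sums)
print('both approaches agree')
```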
