pandas: dropping columns based on value in last row

Dmitry B.

Starting out with data like this:

np.random.seed(314)
df = pd.DataFrame({
        'date':[pd.date_range('2016-04-01', '2016-04-05')[r] for r in np.random.randint(0,5,20)], 
        'cat':['ABCD'[r] for r in np.random.randint(0,4,20)], 
        'count': np.random.randint(0,100,20)
})

   cat  count       date
0    B     84 2016-04-04
1    A     95 2016-04-05
2    D     89 2016-04-02
3    D     39 2016-04-05
4    A     39 2016-04-01
5    C     61 2016-04-05
6    C     58 2016-04-04
7    B     49 2016-04-03
8    D     20 2016-04-02
9    B     54 2016-04-01
10   B     87 2016-04-01
11   D     36 2016-04-05
12   C     13 2016-04-05
13   A     79 2016-04-04
14   B     91 2016-04-03
15   C     83 2016-04-05
16   C     85 2016-04-05
17   D     93 2016-04-01
18   C     32 2016-04-02
19   B     29 2016-04-03

Next, I calculate totals by date, pivot cat into columns, and calculate running totals for each column:

summary = df.groupby(['date','cat']).sum().unstack().fillna(0).cumsum()

cat            A    B    C   D
date
2016-04-01    80  235   99   0
2016-04-02    85  295  153  14
2016-04-03   111  363  224  14
2016-04-04   111  379  296  50
2016-04-05   111  511  296  50

Now I want to remove columns where the last column is less than some value, say 150. The result should look like:

cat          B    C 
date
2016-04-01   235   99 
2016-04-02   295  153 
2016-04-03   363  224 
2016-04-04   379  296 
2016-04-05   511  296 

I've figured out one part of it:

mask = summary[-1:].squeeze() > 150


       cat
count  A      False
       B       True
       C       True
       D      False

will give me a mask for dropping columns. What I can't figure out is how to use it with a call to summary.drop(...). Any hints?

joris

Instead of dropping the columns you do not want, you can also select the ones you want (using the mask with boolean indexing):

In [16]: mask = summary[-1:].squeeze() > 220

In [17]: summary.loc[:, mask]
Out[17]:
            count
cat             B      D
date
2016-04-01  141.0   94.0
2016-04-02  235.0   94.0
2016-04-03  235.0  144.0
2016-04-04  326.0  144.0
2016-04-05  384.0  229.0

(I used 220 instead of 150, otherwise all columns were selected)

Further, a better way to calculate the mask is probably the following:

mask = summary.iloc[-1] > 220

which just selects the last row (by position) instead of using squeeze.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

pandas: dropping columns based on value in last row

From Dev

Sorting Deedle Frame by columns based on the value of the last row

From Dev

pandas groupby dropping columns

From Dev

Subset columns based on row value

From Java

Dropping rows from pandas dataframe based on value in column(s)

From Dev

Pandas - Dropping multiple empty columns

From Dev

Dropping multiple columns in pandas at once

From Dev

Extract rows based on last value of row R

From Dev

Inserting a row into a pandas dataframe based on row value?

From Dev

Filter pandas columns based on row condition

From Dev

Filter pandas columns based on row condition

From Dev

pandas dataframe column based on row and multiple columns

From Dev

Pandas select last row value grater than

From Dev

Python pandas: Compare rows of dataframe based on some columns and drop row with lowest value

From Dev

Pandas replace the value of multiple columns based on value

From Dev

Merging multiple columns in Oracle based on row value

From Dev

How extract row based on some value in columns?

From Java

Pandas merge function based on the last value

From Dev

Get a subset of columns by row value in pandas

From Dev

Python Pandas Groupby Dropping DateTime Columns

From Dev

dropping empty columns in pandas 0.23+

From Dev

Python Pandas Groupby Dropping DateTime Columns

From Dev

Dropping a number of columns in a pandas DataFrame on one line

From Java

Dropping every row with len >2 Pandas python

From Dev

Pandas - calculate row value based on previous calculated row value

From Dev

Modifying row value based on preceding row value in pandas/python

From Dev

pandas merge dataframe based on same value in columns

From Dev

Drop rows in pandas dataframe based on columns value

From Dev

Pandas Data frame combination of columns based on value

Related Related

  1. 1

    pandas: dropping columns based on value in last row

  2. 2

    Sorting Deedle Frame by columns based on the value of the last row

  3. 3

    pandas groupby dropping columns

  4. 4

    Subset columns based on row value

  5. 5

    Dropping rows from pandas dataframe based on value in column(s)

  6. 6

    Pandas - Dropping multiple empty columns

  7. 7

    Dropping multiple columns in pandas at once

  8. 8

    Extract rows based on last value of row R

  9. 9

    Inserting a row into a pandas dataframe based on row value?

  10. 10

    Filter pandas columns based on row condition

  11. 11

    Filter pandas columns based on row condition

  12. 12

    pandas dataframe column based on row and multiple columns

  13. 13

    Pandas select last row value grater than

  14. 14

    Python pandas: Compare rows of dataframe based on some columns and drop row with lowest value

  15. 15

    Pandas replace the value of multiple columns based on value

  16. 16

    Merging multiple columns in Oracle based on row value

  17. 17

    How extract row based on some value in columns?

  18. 18

    Pandas merge function based on the last value

  19. 19

    Get a subset of columns by row value in pandas

  20. 20

    Python Pandas Groupby Dropping DateTime Columns

  21. 21

    dropping empty columns in pandas 0.23+

  22. 22

    Python Pandas Groupby Dropping DateTime Columns

  23. 23

    Dropping a number of columns in a pandas DataFrame on one line

  24. 24

    Dropping every row with len >2 Pandas python

  25. 25

    Pandas - calculate row value based on previous calculated row value

  26. 26

    Modifying row value based on preceding row value in pandas/python

  27. 27

    pandas merge dataframe based on same value in columns

  28. 28

    Drop rows in pandas dataframe based on columns value

  29. 29

    Pandas Data frame combination of columns based on value

HotTag

Archive