Most efficient way to exclude indexed rows in pandas dataframe

dkapitan

I'm relatively new to Python & pandas and am struggling with (hierarchical) indexes. I've got the basics covered, but am lost with more advanced slicing and cross-sectioning.

For example, with the following dataframe

import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'), columns=pd.Index(['one', 'two', 'three'], name='number'))

I want to select everything except the row with index 'Colorado'. For a small dataset I could do:

data.ix[['Ohio','New York']]

But if the number of unique index values is large, that's impractical. Naively, I would expect a syntax like

data.ix[['state' != 'Colorado']]

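However, this only returns the first record 'Ohio' and doesn't return 'New York'. This works, but is cumbersome:

# labels to keep: every index value except 'Colorado' (a set does not preserve row order)
keep = list(set(data.index.get_level_values(0).unique()) - set(['Colorado']))
# select rows by label; plain data[keep] would look the labels up as columns
data.loc[keep]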

Surely there's a more Pythonic, less verbose way of doing this?

DSM

This is a Python issue, not a pandas one: 'state' != 'Colorado' is True, so what pandas gets is data.ix[[True]].
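To make that concrete, here is a minimal sketch that reuses the question's dataframe. It shows what Python hands to the indexer, next to what an elementwise comparison on the index produces. The names key and mask are illustrative only. Because .ix was removed in pandas 1.0, the sketch inspects the values rather than calling .ix.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

# Python compares the two string literals first, so the indexer
# only ever receives a one-element list holding True.
key = ['state' != 'Colorado']
print(key)  # [True]

# Comparing the index itself gives one boolean per row instead.
mask = data.index != 'Colorado'
print(mask.tolist())  # [True, False, True]
```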

You could do

>>> data.loc[data.index != "Colorado"]
number    one  two  three
state                    
Ohio        0    1      2
New York    6    7      8

[2 rows x 3 columns]

or use DataFrame.query:

>>> data.query("state != 'New York'")
number    one  two  three
state                    
Ohio        0    1      2
Colorado    3    4      5

[2 rows x 3 columns]

if you'd rather not repeat the name data. (Quoting the expression passed to the .query() method is one of the few ways around the fact that otherwise Python would evaluate the comparison before pandas ever saw it.)
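For the case raised in the question, where many labels might need excluding, the same ideas extend naturally. The sketch below is an illustration rather than part of the answer above. The names excluded and unwanted are hypothetical, and Index.isin and DataFrame.drop are swapped in as alternatives the answer does not mention. The @ prefix in query refers to a local Python variable.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

excluded = ['Colorado']   # hypothetical list of labels to leave out
unwanted = 'Colorado'     # hypothetical single label for the query example

# Boolean mask: keep rows whose index label is NOT in the list.
by_mask = data.loc[~data.index.isin(excluded)]

# query: the index name 'state' is usable inside the expression,
# and @unwanted pulls in the local variable.
by_query = data.query("state != @unwanted")

# drop: remove the listed labels from the row index.
by_drop = data.drop(index=excluded)

print(by_mask.index.tolist())  # ['Ohio', 'New York']
```

All three keep the original row order, unlike the set-difference approach in the question.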
