I have a Pandas dataframe where I have designated some of the columns as indices:
planets_dataframe.set_index(['host','name'], inplace=True)
and would like to be able to refer to these indices in a variety of contexts. Using the name of an index works fine in queries
planets_dataframe.query('host == "PSR 1257 12"')
but results in an error if I try to use it to get a list of the values of an index as I could when it was a column
planets_dataframe.name
#AttributeError: 'DataFrame' object has no attribute 'name'
or to use it to list results as I could when it was a "regular" column
planets_dataframe.query('30 > mass > 20 and discoveryyear > 2009')['name']
#KeyError: u'no item named name'
How do I refer to the "columns" of the dataframe that I'm using as indexes?
Before set_index:
planets_dataframe.columns
# Index([u'name', u'lastupdate', u'temperature', u'semimajoraxis', u'discoveryyear', u'calculated', u'period', u'age', u'mass', u'host', u'verification', u'transittime', u'eccentricity', u'radius', u'discoverymethod', u'inclination'], dtype='object')
After set_index:
planets_dataframe.columns
#Index([u'lastupdate', u'temperature', u'semimajoraxis', u'discoveryyear', u'calculated', u'period', u'age', u'mass', u'verification', u'transittime', u'eccentricity', u'radius', u'discoverymethod', u'inclination'], dtype='object')
I think you have a slight misunderstanding of what indexes are. You don't just "designate" columns as indexes; that is, you don't just "tag" certain columns with info that says "this is an index". The index is a separate data structure that can hold data that aren't even present in the columns. If you do set_index, you move those columns into the index, so they no longer exist as regular columns. This is why you can no longer use them in the ways you mention: they aren't there anymore.
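To make the "moved, not tagged" point concrete, here is a minimal sketch. The three-row frame is made-up sample data standing in for your planets_dataframe, and the variable names are only illustrative. It also shows one way to read the values of an index level, via index.get_level_values, since attribute and bracket access only look at columns.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's planets_dataframe.
planets = pd.DataFrame({
    "host": ["PSR 1257 12", "PSR 1257 12", "51 Peg"],
    "name": ["PSR 1257 12 b", "PSR 1257 12 c", "51 Peg b"],
    "mass": [0.00007, 0.013, 0.46],
})

indexed = planets.set_index(["host", "name"])

# The two columns have moved into the index and left the column list.
print(list(indexed.columns))      # ['mass']
print(list(indexed.index.names))  # ['host', 'name']

# Index levels are read through the index, not through column access.
names = indexed.index.get_level_values("name")
print(list(names))  # ['PSR 1257 12 b', 'PSR 1257 12 c', '51 Peg b']
```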
One thing you can do is, when using set_index, pass drop=False to tell it to keep the columns as columns in addition to putting them in the index (effectively copying them to the index rather than moving them), e.g., df.set_index('SomeColumn', drop=False). However, you should be aware that the index and column are still distinct, so for instance if you modify the column values this will not affect what's stored in the index.
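A small sketch of that caveat, using a made-up two-row frame (the data and variable names are assumptions for illustration only): after drop=False the same values sit in both places, but overwriting the column leaves the index as it was.

```python
import pandas as pd

# Hypothetical two-row frame, just to show the column/index split.
df = pd.DataFrame({"name": ["b", "c"], "mass": [1.0, 2.0]})

# drop=False keeps "name" as a column and also copies it into the index.
kept = df.set_index("name", drop=False)

# Overwrite the column; the index is a separate object and is not touched.
kept["name"] = kept["name"].str.upper()

print(list(kept["name"]))  # ['B', 'C']
print(list(kept.index))    # ['b', 'c']
```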
The upshot is that indexes aren't really columns of the DataFrame, so if you want to be able to use some data as both an index and a column, you need to duplicate it in both places. There is some discussion of this issue here.
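Applied to the situation in the question, duplicating the data might look like the sketch below. The rows are invented sample data, not your real table, and "both" is just an illustrative variable name. With the columns kept alongside the index, the query and the column-style lookups from the question should work again.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's planets_dataframe.
planets = pd.DataFrame({
    "host": ["PSR 1257 12", "PSR 1257 12", "51 Peg"],
    "name": ["PSR 1257 12 b", "PSR 1257 12 c", "51 Peg b"],
    "mass": [0.00007, 0.013, 0.46],
})

# Keep host and name as columns while also using them as the index.
both = planets.set_index(["host", "name"], drop=False)

# Query by host, then list the matching names as a regular column.
result = both.query('host == "PSR 1257 12"')["name"]
print(list(result))     # ['PSR 1257 12 b', 'PSR 1257 12 c']

# Attribute access works again because "name" is still a column.
print(list(both.name))  # ['PSR 1257 12 b', 'PSR 1257 12 c', '51 Peg b']
```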