I am trying to apply a function to every DataFrame in a pandas Panel. I can write it as a loop, but the indexing seems to take a long time. I am hoping a built-in pandas function might be faster.
I have data frames which look like (in reality about 50 rows per column):
mydata = pd.DataFrame( { 'hits' : [ 123, 456,678 ], 'sqerr' : [ 253, 641, 3480] } )
They are arranged in a panel with a multi-index key:
mydict = { (0, 20 ) : mydata, (30, 40 ) : moredata }
mypanel = pd.Panel( mydict )
The panel looks like this:
<class 'pandas.core.panel.Panel'>
Dimensions: 1600 (items) x 48 (major_axis) x 2 (minor_axis)
Items axis: (-4000, -4000) to (3800, 3800)
Major_axis axis: 0 to 47
Minor_axis axis: hits to sqerr
I have a function which takes a DataFrame and outputs a number:
def condenser( df ):
    return some_stuff( df['hits'], df['sqerr'] )
I want to reduce my Panel to a Series, indexed by my multi-index and with results of my condenser function as its values.
I can do:
intermediate = []
for k, df in mypanel.iteritems():
    intermediate.append( condenser( df ) )
result = pd.Series( intermediate, index = mypanel.items )
which gives the required result, but when I profile it, only 4% of the time is spent in my condenser function. Most of the time is spent in iteritems and __getitem__, so I wondered if it could be done better.
I looked at mypanel.apply( condenser, axis = 'items' )
but this loops over each column of my DataFrames separately. Is there something which would apply a function to each DataFrame?
P.S. I am using Python 2.7.9 and pandas 0.15.2.
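apply is the right method, but it has to be given both axes of the DataFrame rather than a single axis:
mypanel.apply(condenser, axis=[1,2])
With two axes (here 1 = major_axis and 2 = minor_axis), apply passes each item to condenser as a whole 48 x 2 DataFrame instead of one column at a time.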
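For anyone reading this on a newer pandas: pd.Panel no longer exists in recent releases, so the call above will not run there. Below is a rough sketch of two ways to get the same Series without a Panel. It is not part of the answer above. The body of condenser and the values in moredata are made up for illustration, since the question does not show some_stuff or moredata.

```python
import numpy as np
import pandas as pd


def condenser(df):
    # Hypothetical stand-in for the question's some_stuff(); the real
    # formula is not shown, so this one is an assumption.
    return np.sqrt(df['sqerr'].sum() / df['hits'].sum())


mydata = pd.DataFrame({'hits': [123, 456, 678], 'sqerr': [253, 641, 3480]})
# Made-up values; the question does not show moredata.
moredata = pd.DataFrame({'hits': [10, 20, 30], 'sqerr': [40, 50, 60]})
mydict = {(0, 20): mydata, (30, 40): moredata}

# Option 1: skip the container entirely and call condenser once per frame.
# Tuple keys in the dict become a MultiIndex on the resulting Series.
result = pd.Series({key: condenser(df) for key, df in mydict.items()})

# Option 2: stack the frames into one DataFrame whose two outer index
# levels come from the tuple keys, then apply condenser per group.
stacked = pd.concat(mydict)
grouped = stacked.groupby(level=[0, 1]).apply(condenser)

print(result)
print(grouped)
```

Both give a Series indexed by the two-level key. Option 1 is probably the simpler choice if the dict of DataFrames is already at hand, since it avoids building and then re-slicing a container.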