Split pandas dataframe index

freethrow

I have a pretty big dataframe with column names categories (foreign trade statistics), while the index is a string containing the country code AND the year: w2013 meaning World, year 2013, r2015 meaning Russian Federation, year 2015.

Index([u'w2011', u'c2011', u'g2011', u'i2011', u'r2011', u'w2012', u'c2012',
   u'g2012', u'i2012', u'r2012', u'w2013', u'c2013', u'g2013', u'i2013',
   u'r2013', u'w2014', u'c2014', u'g2014', u'i2014', u'r2014', u'w2015',
   u'c2015', u'g2015', u'i2015', u'r2015'],
  dtype='object')

What would be the easiest way to make a multiple index for plotting the various columns - I need a column plotted for each country and each year?

jezrael

You can try create Multiindex from_tuples - for extract letters use indexing with str.

import pandas as pd

li =[u'w2011', u'c2011', u'g2011', u'i2011', u'r2011', u'w2012', u'c2012',
   u'g2012', u'i2012', u'r2012', u'w2013', u'c2013', u'g2013', u'i2013',
   u'r2013', u'w2014', u'c2014', u'g2014', u'i2014', u'r2014', u'w2015',
   u'c2015', u'g2015', u'i2015', u'r2015']

df = pd.DataFrame(range(25), index = li, columns=['a'])
print df
        a
w2011   0
c2011   1
g2011   2
i2011   3
r2011   4
w2012   5
c2012   6
g2012   7
i2012   8
r2012   9
w2013  10
c2013  11
g2013  12
i2013  13
r2013  14
w2014  15
c2014  16
g2014  17
i2014  18
r2014  19
w2015  20
c2015  21
g2015  22
i2015  23
r2015  24
print df.index.str[0]
Index([u'w', u'c', u'g', u'i', u'r', u'w', u'c', u'g', u'i', u'r', u'w', u'c',
       u'g', u'i', u'r', u'w', u'c', u'g', u'i', u'r', u'w', u'c', u'g', u'i',
       u'r'],
      dtype='object')

print df.index.str[1:]
Index([u'2011', u'2011', u'2011', u'2011', u'2011', u'2012', u'2012', u'2012',
       u'2012', u'2012', u'2013', u'2013', u'2013', u'2013', u'2013', u'2014',
       u'2014', u'2014', u'2014', u'2014', u'2015', u'2015', u'2015', u'2015',
       u'2015'],
      dtype='object')

df.index = pd.MultiIndex.from_tuples(zip(df.index.str[0], df.index.str[1:]))
print df
         a
w 2011   0
c 2011   1
g 2011   2
i 2011   3
r 2011   4
w 2012   5
c 2012   6
g 2012   7
i 2012   8
r 2012   9
w 2013  10
c 2013  11
g 2013  12
i 2013  13
r 2013  14
w 2014  15
c 2014  16
g 2014  17
i 2014  18
r 2014  19
w 2015  20
c 2015  21
g 2015  22
i 2015  23
r 2015  24

If you need convert years to int, use astype:

df.index = pd.MultiIndex.from_tuples(zip(df.index.str[0], df.index.str[1:].astype(int)))

print df.index
MultiIndex(levels=[[u'c', u'g', u'i', u'r', u'w'], [2011, 2012, 2013, 2014, 2015]],
           labels=[[4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3], [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]])

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related