Pandas to_datetime with multiindex


crayxt

How can I drop a level in multi-indexed columns when converting three columns to datetime? The example below contains only three columns, but my real DataFrame has more columns, and those other columns use two-level names.

    >>> import pandas as pd
    >>> df = pd.DataFrame([[2010, 1, 2],[2011,1,3],[2012,2,3]])
    >>> df.columns = [['year', 'month', 'day'],['y', 'm', 'd']]
    >>> print(df)
       year month day
          y     m   d
    0  2010     1   2
    1  2011     1   3
    2  2012     2   3
    >>> pd.to_datetime(df[['year', 'month', 'day']])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 512, in to_datetime
    result = _assemble_from_unit_mappings(arg, errors=errors)
  File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 582, in _assemble_from_unit_mappings
    unit = {k: f(k) for k in arg.keys()}
  File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 582, in <dictcomp>
    unit = {k: f(k) for k in arg.keys()}
  File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 577, in f
    if value.lower() in _unit_map:
AttributeError: 'tuple' object has no attribute 'lower'

Edit: Add more columns to explain better:

>>> df = pd.DataFrame([[2010, 1, 2, 10, 2],[2011,1,3,11,3],[2012,2,3,12,2]])
>>> df.columns = [['year', 'month', 'day', 'temp', 'wind_speed'],['', '', '', 'degc','m/s']]
>>> print(df)
   year month day temp wind_speed
                  degc        m/s
0  2010     1   2   10          2
1  2011     1   3   11          3
2  2012     2   3   12          2

What I need is to combine the first three columns into a datetime index, leaving the last two columns as data.

jezrael

Use droplevel to remove the second level:

df.columns = df.columns.droplevel(1)
# keep the result in its own variable so df is not overwritten
s = pd.to_datetime(df[['year', 'month', 'day']])
print(s)
0   2010-01-02
1   2011-01-03
2   2012-02-03
dtype: datetime64[ns]
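
To see why the original call fails, it may help to inspect the column keys before and after dropping the level. This is only a sketch that rebuilds the question's three-column frame; the names `sub`, `keys_before` and `keys_after` are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2], [2011, 1, 3], [2012, 2, 3]])
df.columns = [['year', 'month', 'day'], ['y', 'm', 'd']]

# Selecting by first-level names keeps both levels, so each key is a tuple.
# This matches the "'tuple' object has no attribute 'lower'" error above.
sub = df[['year', 'month', 'day']].copy()
keys_before = list(sub.keys())

# After dropping the second level the keys are plain strings,
# which to_datetime can match against year/month/day.
sub.columns = sub.columns.droplevel(1)
keys_after = list(sub.keys())

print(keys_before)
print(keys_after)
print(pd.to_datetime(sub))
```

Working on a copy of the selection leaves the two-level columns of `df` untouched.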

If the DataFrame contains only these three columns, pass it to to_datetime directly:

df.columns = df.columns.droplevel(1)
s = pd.to_datetime(df)
print(s)

0   2010-01-02
1   2011-01-03
2   2012-02-03
dtype: datetime64[ns]

If there are more columns, convert only the date columns and join the result back:

df = pd.DataFrame([[2010, 1, 2,3],[2011,1,3,5],[2012,2,3,7]])
df.columns = [['year', 'month', 'day','a'],['y', 'm', 'd', 'b']]
print(df)
   year month day  a
      y     m   d  b
0  2010     1   2  3
1  2011     1   3  5
2  2012     2   3  7

# select datetime columns only (copy so the original df is not affected)
df1 = df[['year', 'month', 'day']].copy()
df1.columns = df1.columns.droplevel(1)
print(df1)
   year  month  day
0  2010      1    2
1  2011      1    3
2  2012      2    3

# convert to Series
s1 = pd.to_datetime(df1)
# give the Series a tuple name so it becomes a two-level column when joined
s1.name = ('date', 'dat')
print(s1)
0   2010-01-02
1   2011-01-03
2   2012-02-03
Name: (date, dat), dtype: datetime64[ns]

# remove original columns and add new datetime Series
df = df.drop(['year', 'month', 'day'], axis=1, level=0).join(s1)
print(df)
   a       date
   b        dat
0  3 2010-01-02
1  5 2011-01-03
2  7 2012-02-03
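
The question asks for a datetime index rather than a date column. A minimal sketch of that variant, using the question's edited frame, could look like the following. The names `date_cols`, `parts`, `dates` and `result`, and the index name 'date', are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2, 10, 2], [2011, 1, 3, 11, 3], [2012, 2, 3, 12, 2]])
df.columns = [['year', 'month', 'day', 'temp', 'wind_speed'],
              ['', '', '', 'degc', 'm/s']]

date_cols = ['year', 'month', 'day']

# Flatten only the date columns, then assemble the dates.
parts = df[date_cols].copy()
parts.columns = parts.columns.droplevel(1)
dates = pd.to_datetime(parts)

# Drop the date parts by first-level name and use the dates as the index.
result = df.drop(date_cols, axis=1, level=0)
result.index = pd.DatetimeIndex(dates, name='date')
print(result)
```

The remaining columns keep both levels, so `temp` and `wind_speed` still carry their units.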

Another solution uses transpose (shown here on the question's edited DataFrame); it is likely to be slower on a big DataFrame:

df1 = df[['year', 'month', 'day']]
s1 = pd.to_datetime(df1.T.reset_index(drop=True, level=1).T).rename(('date', 'dat'))
print(s1)
0   2010-01-02
1   2011-01-03
2   2012-02-03
Name: (date, dat), dtype: datetime64[ns]

df1 = df.join(s1)
print(df1)
   year month day temp wind_speed       date
                  degc        m/s        dat
0  2010     1   2   10          2 2010-01-02
1  2011     1   3   11          3 2011-01-03
2  2012     2   3   12          2 2012-02-03
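
The transpose can probably be avoided. The sketch below assumes a pandas version that provides `DataFrame.droplevel` (0.24 or later, as far as I know) and uses it on the column axis; `flat` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2, 3], [2011, 1, 3, 5], [2012, 2, 3, 7]])
df.columns = [['year', 'month', 'day', 'a'], ['y', 'm', 'd', 'b']]

# droplevel returns a new frame, so df itself keeps its two-level columns.
flat = df[['year', 'month', 'day']].droplevel(1, axis=1)

# Name the Series with a tuple so it can be joined as a two-level column.
s1 = pd.to_datetime(flat).rename(('date', 'dat'))
print(s1)
```

Because nothing is assigned back to `df.columns`, this form can be used inline without mutating the original frame.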
