How can I drop a level in multi-indexed columns when converting three columns to datetime? The example below contains only three columns; my real DataFrame has more columns, of course, and those other columns use two-level names.
>>> import pandas as pd
>>> df = pd.DataFrame([[2010, 1, 2],[2011,1,3],[2012,2,3]])
>>> df.columns = [['year', 'month', 'day'],['y', 'm', 'd']]
>>> print(df)
   year month day
      y     m   d
0  2010     1   2
1  2011     1   3
2  2012     2   3
>>> pd.to_datetime(df[['year', 'month', 'day']])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 512, in to_datetime
result = _assemble_from_unit_mappings(arg, errors=errors)
File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 582, in _assemble_from_unit_mappings
unit = {k: f(k) for k in arg.keys()}
File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 582, in <dictcomp>
unit = {k: f(k) for k in arg.keys()}
File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 577, in f
if value.lower() in _unit_map:
AttributeError: 'tuple' object has no attribute 'lower'
Edit: Add more columns to explain better:
>>> df = pd.DataFrame([[2010, 1, 2, 10, 2],[2011,1,3,11,3],[2012,2,3,12,2]])
>>> df.columns = [['year', 'month', 'day', 'temp', 'wind_speed'],['', '', '', 'degc','m/s']]
>>> print(df)
   year month day temp wind_speed
                  degc        m/s
0  2010     1   2   10          2
1  2011     1   3   11          3
2  2012     2   3   12          2
What I need is to combine the first three columns into a datetime index, leaving the last two columns with the data.
Use droplevel to remove the second level:
df.columns = df.columns.droplevel(1)
df = pd.to_datetime(df[['year', 'month', 'day']])
print (df)
0 2010-01-02
1 2011-01-03
2 2012-02-03
dtype: datetime64[ns]
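The error in the question comes from the column keys: with two-level columns each key is a tuple, and to_datetime tries to call .lower() on it. A minimal sketch of the same fix that leaves the original df untouched, assuming a pandas version that has DataFrame.droplevel (0.24 or later); the names flat and dates are only for illustration:

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2], [2011, 1, 3], [2012, 2, 3]])
df.columns = pd.MultiIndex.from_arrays(
    [['year', 'month', 'day'], ['y', 'm', 'd']]
)

# Each column key is a tuple such as ('year', 'y'),
# which is what to_datetime cannot map to a date unit.
print(list(df.columns))

# droplevel returns a new DataFrame with single-level columns;
# df itself keeps its two-level columns.
flat = df.droplevel(1, axis=1)
dates = pd.to_datetime(flat)
print(dates)
```

This avoids overwriting df.columns, which matters when other code still relies on the two-level names.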
If there are only 3 columns:
df.columns = df.columns.droplevel(1)
df = pd.to_datetime(df)
print (df)
0 2010-01-02
1 2011-01-03
2 2012-02-03
dtype: datetime64[ns]
If there are more columns:
df = pd.DataFrame([[2010, 1, 2,3],[2011,1,3,5],[2012,2,3,7]])
df.columns = [['year', 'month', 'day','a'],['y', 'm', 'd', 'b']]
print(df)
   year month day  a
      y     m   d  b
0  2010     1   2  3
1  2011     1   3  5
2  2012     2   3  7
#select datetime columns only
df1 = df[['year', 'month', 'day']]
df1.columns = df1.columns.droplevel(1)
print (df1)
   year  month  day
0  2010      1    2
1  2011      1    3
2  2012      2    3
#convert to Series
s1 = pd.to_datetime(df1)
#name the Series with a tuple so it matches the two-level columns
s1.name=('date','dat')
print (s1)
0 2010-01-02
1 2011-01-03
2 2012-02-03
Name: (date, dat), dtype: datetime64[ns]
#remove original columns and add new datetime Series
df = df.drop(['year', 'month', 'day'], axis=1, level=0).join(s1)
print (df)
   a       date
   b        dat
0  3 2010-01-02
1  5 2011-01-03
2  7 2012-02-03
Another solution uses transpose; it should be slower on a big DataFrame (df here is the five-column DataFrame from the question's edit):
df1 = df[['year', 'month', 'day']]
s1 = pd.to_datetime(df1.T.reset_index(drop=True, level=1).T).rename(('date', 'dat'))
print (s1)
0 2010-01-02
1 2011-01-03
2 2012-02-03
Name: (date, dat), dtype: datetime64[ns]
df1 = df.join(s1)
print (df1)
   year month day temp wind_speed       date
                  degc        m/s        dat
0  2010     1   2   10          2 2010-01-02
1  2011     1   3   11          3 2011-01-03
2  2012     2   3   12          2 2012-02-03
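Since the question asks for a datetime index rather than an extra column, the same steps can be combined as in the sketch below. It uses the five-column example from the question's edit; the names date_cols, parts, dates and result, and the index name 'date', are assumptions for illustration only:

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2, 10, 2], [2011, 1, 3, 11, 3], [2012, 2, 3, 12, 2]])
df.columns = pd.MultiIndex.from_arrays(
    [['year', 'month', 'day', 'temp', 'wind_speed'],
     ['', '', '', 'degc', 'm/s']]
)

date_cols = ['year', 'month', 'day']

# Select the date parts and flatten their column names
parts = df[date_cols].droplevel(1, axis=1)
dates = pd.to_datetime(parts)

# Drop the date parts by first-level name, then use the dates as the index
result = df.drop(date_cols, axis=1, level=0)
result.index = pd.DatetimeIndex(dates, name='date')
print(result)
```

The data columns keep their two-level names (temp/degc and wind_speed/m/s), and only the row index changes.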