重组数据框

debugcn 发表于 Dev

乔乔

我有一个数据帧，当前看起来如下，并具有262800行和3列。我的数据框目前如下：

       Currency    Maturity     value
0           GBP  0.08333333  4.709456
1           GBP  0.08333333  4.713099
2           GBP  0.08333333  4.707237
3           GBP  0.08333333  4.705043
4           GBP  0.08333333  4.697150
5           GBP  0.08333333  4.710647
6           GBP  0.08333333  4.701150
7           GBP  0.08333333  4.694639
8           GBP  0.08333333  4.686111
9           GBP  0.08333333  4.714750
......
262770      GBP          25  2.432869

我希望数据框具有以下形式。我已经为此采取了一些措施，包括melt在下面的代码中使用，但是由于某些原因，它摆脱了我的Date专栏，导致了上面的数据框。我不确定如何获取日期列并获取以下数据框：

   Maturity     Date            Currency  Yield_pct
0  0.08333333   2005-01-04      GBP       4.709456              
1  0.08333333   2005-01-05      GBP       4.713099               
2  0.08333333   2005-01-06      GBP       4.707237
....
9  25           2005-01-04      GBP       2.432869

我的代码如下：

from pandas.io.excel import read_excel
import pandas as pd
import numpy as np

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'

# check the sheet number, spot: 9/9, short end 7/9
spot_curve = read_excel(url, sheetname=8)
short_end_spot_curve = read_excel(url, sheetname=6)

# do some cleaning, keep NaN for now, as forward fill NaN is not recommended for yield curve
spot_curve.columns = spot_curve.loc['years:']
spot_curve.columns.name = 'Maturity'
valid_index = spot_curve.index[4:]
spot_curve = spot_curve.loc[valid_index]
# remove all maturities within 5 years as those are duplicated in short-end file
col_mask = spot_curve.columns.values > 5
spot_curve = spot_curve.iloc[:, col_mask]


short_end_spot_curve.columns = short_end_spot_curve.loc['years:']
short_end_spot_curve.columns.name = 'Maturity'
valid_index = short_end_spot_curve.index[4:]
short_end_spot_curve = short_end_spot_curve.loc[valid_index]

# merge these two, time index are identical
# ==============================================
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer')
# sort the maturity from short end to long end
combined_data.sort_index(axis=1, inplace=True)

def filter_func(group):
    return group.isnull().sum(axis=1) <= 50

combined_data = combined_data.groupby(level=0).filter(filter_func)

idx = 0
values = ['GBP'] * len(combined_data.index)
combined_data.insert(idx, 'Currency', values) 

#print combined_data.columns.values

#I had to do the melt 
combined_data = pd.melt(combined_data,id_vars=['Currency'])#Arbitrarily melted on 'Currency' as for some reason when I do print combined_data.columns.values I see that 'Currency' corresponds to 0.08333333, etc.
print combined_data

恶作剧

不能在后面添加货币标识符melt吗？

# Copy up to this stage
combined_data = combined_data.groupby(level=0).filter(filter_func)

# My code from here
combined_data.reset_index(inplace=True, drop=False)
combined_data.rename(columns={'index': 'Date'}, inplace=True)

# This line assumes you want datetime, ignore if you don't
combined_data['Date'] = pd.to_datetime(combined_data['Date'])

result = pd.melt(combined_data, id_vars=['Date'])

result['Currency'] = 'GBP'

输出 result.head()

    Date    Maturity    value   Currency
0   2005-01-04  0.08333333  4.709456    GBP
1   2005-01-05  0.08333333  4.713099    GBP
2   2005-01-06  0.08333333  4.707237    GBP
3   2005-01-07  0.08333333  4.705043    GBP
4   2005-01-10  0.08333333  4.697150    GBP

本文收集自互联网，转载请注明来源。

如有侵权，请联系[email protected] 删除。