I have a DataFrame with 262800 rows and 3 columns. It currently looks like this:
Currency Maturity value
0 GBP 0.08333333 4.709456
1 GBP 0.08333333 4.713099
2 GBP 0.08333333 4.707237
3 GBP 0.08333333 4.705043
4 GBP 0.08333333 4.697150
5 GBP 0.08333333 4.710647
6 GBP 0.08333333 4.701150
7 GBP 0.08333333 4.694639
8 GBP 0.08333333 4.686111
9 GBP 0.08333333 4.714750
......
262770 GBP 25 2.432869
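I would like the DataFrame to have the form shown below. I have taken a few steps towards this, including the use of melt in the code further down, but for some reason it gets rid of my Date column and produces the DataFrame above. I am not sure how to keep the Date column and get the following DataFrame: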
Maturity Date Currency Yield_pct
0 0.08333333 2005-01-04 GBP 4.709456
1 0.08333333 2005-01-05 GBP 4.713099
2 0.08333333 2005-01-06 GBP 4.707237
....
9 25 2005-01-04 GBP 2.432869
My code is as follows:
from pandas.io.excel import read_excel
import pandas as pd
import numpy as np
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
# check the sheet number, spot: 9/9, short end 7/9
# note: recent pandas versions spell this keyword sheet_name (older ones used sheetname)
spot_curve = read_excel(url, sheet_name=8)
short_end_spot_curve = read_excel(url, sheet_name=6)
# do some cleaning, keep NaN for now, as forward fill NaN is not recommended for yield curve
spot_curve.columns = spot_curve.loc['years:']
spot_curve.columns.name = 'Maturity'
valid_index = spot_curve.index[4:]
spot_curve = spot_curve.loc[valid_index]
# remove all maturities within 5 years as those are duplicated in short-end file
col_mask = spot_curve.columns.values > 5
spot_curve = spot_curve.iloc[:, col_mask]
short_end_spot_curve.columns = short_end_spot_curve.loc['years:']
short_end_spot_curve.columns.name = 'Maturity'
valid_index = short_end_spot_curve.index[4:]
short_end_spot_curve = short_end_spot_curve.loc[valid_index]
# merge these two, time index are identical
# ==============================================
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer')
# sort the maturity from short end to long end
combined_data.sort_index(axis=1, inplace=True)
def filter_func(group):
    return group.isnull().sum(axis=1) <= 50
combined_data = combined_data.groupby(level=0).filter(filter_func)
idx = 0
values = ['GBP'] * len(combined_data.index)
combined_data.insert(idx, 'Currency', values)
#print combined_data.columns.values
# I had to do the melt. 'Currency' is an arbitrary choice for id_vars: for some reason,
# when I print combined_data.columns.values, 'Currency' shows up next to 0.08333333, etc.
combined_data = pd.melt(combined_data, id_vars=['Currency'])
print(combined_data)
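The missing dates can be reproduced on a toy frame. The sketch below assumes, as the code above suggests, that the dates live in the index rather than in a column. The numbers and the name `wide` are made up for illustration. `pd.melt` discards the original index by default and numbers the rows from 0, which is why no Date column survives.

```python
import pandas as pd

# Made-up numbers; dates are in the index, maturities are the columns.
wide = pd.DataFrame(
    {0.5: [4.70, 4.71], 25.0: [2.43, 2.44]},
    index=pd.to_datetime(["2005-01-04", "2005-01-05"]),
)
wide.insert(0, "Currency", "GBP")

# By default melt throws the original index away and builds a new 0..n-1 index.
melted = pd.melt(wide, id_vars=["Currency"], var_name="Maturity")
print(melted.index.tolist())    # plain row numbers, no dates
print(melted.columns.tolist())  # no 'Date' column
```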
Couldn't you add the currency identifier after the melt?
# Copy up to this stage
combined_data = combined_data.groupby(level=0).filter(filter_func)
# My code from here
combined_data.reset_index(inplace=True, drop=False)
combined_data.rename(columns={'index': 'Date'}, inplace=True)
# This line assumes you want datetime, ignore if you don't
combined_data['Date'] = pd.to_datetime(combined_data['Date'])
result = pd.melt(combined_data, id_vars=['Date'])
result['Currency'] = 'GBP'
Output of result.head():
Date Maturity value Currency
0 2005-01-04 0.08333333 4.709456 GBP
1 2005-01-05 0.08333333 4.713099 GBP
2 2005-01-06 0.08333333 4.707237 GBP
3 2005-01-07 0.08333333 4.705043 GBP
4 2005-01-10 0.08333333 4.697150 GBP
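The result above still calls the yield column `value` and orders the columns differently from the layout requested in the question. Here is a minimal sketch of one way to close that gap, again on made-up numbers and assuming the dates sit in the index as they do in `combined_data`. The names `wide` and `tidy` are illustrative only.

```python
import pandas as pd

# Illustrative stand-in for combined_data: dates in the index, one column per maturity.
wide = pd.DataFrame(
    {0.5: [4.70, 4.71], 25.0: [2.43, 2.44]},
    index=pd.to_datetime(["2005-01-04", "2005-01-05"]),
)

# Naming the index makes reset_index() produce a 'Date' column directly.
wide.index.name = "Date"

tidy = pd.melt(
    wide.reset_index(),
    id_vars=["Date"],
    var_name="Maturity",
    value_name="Yield_pct",
)
tidy["Currency"] = "GBP"

# Reorder to the layout asked for in the question.
tidy = tidy[["Maturity", "Date", "Currency", "Yield_pct"]]
print(tidy)
```

Passing `var_name` and `value_name` to `melt` avoids a separate rename step, and adding `Currency` after the melt keeps it out of `id_vars`.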