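I have the following DataFrame in Pandas. I want to check whether the HH value is greater than the previous row's High value; if it is, I want to update the previous row's HH value and replace the current HH with None.
Note that I don't want to shift all the data in a column (so I don't think shift is the solution); I just want to change one specific value based on the previous row's High.
About the program:
I am trying to create a program that finds the minima and maxima of a given financial market, and I am using the "peakdetect" library: https://pypi.org/project/peakdetect/
It simply generates a 2D list of minima and maxima: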
import pandas as pd
from peakdetect import peakdetect

density = 2
# Temp ref to the array of minima and maxima
high_arr = peakdetect(y_axis=clean_dataframe['High'],
                      x_axis=clean_dataframe.index,
                      lookahead=density)
low_arr = peakdetect(y_axis=clean_dataframe['Low'],
                     x_axis=clean_dataframe.index,
                     lookahead=density)
# first index is always for maxima
_hh = pd.DataFrame(high_arr[0])
_hh = _hh.rename(columns={0: 'Index', 1: 'HH'})
# second index is always for minima
_ll = pd.DataFrame(low_arr[1])
_ll = _ll.rename(columns={0: 'Index', 1: 'LL'})
# join all minima and maxima onto clean_dataframe
full_df = clean_dataframe.join(_hh.set_index('Index')).join(_ll.set_index('Index'))
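To make the join step easier to follow without installing peakdetect, here is a minimal sketch that replaces the library call with hand-written lists of the shape described above (maxima first, minima second, each a list of [index, value] pairs). The price values and the contents of `high_arr` and `low_arr` are made up for illustration.

```python
import pandas as pd

# Hypothetical price data standing in for clean_dataframe
clean_dataframe = pd.DataFrame({
    'High': [1.0, 3.0, 2.0, 4.0, 1.5],
    'Low':  [0.5, 2.0, 0.2, 3.0, 1.0],
})

# Hand-written stand-ins for the detector output (an assumption, not real results):
# element 0 holds the maxima, element 1 holds the minima, each as [index, value] pairs
high_arr = [[[1, 3.0], [3, 4.0]], [[2, 2.0]]]
low_arr = [[[1, 2.0], [3, 3.0]], [[2, 0.2]]]

# Maxima of the High series become the HH column, keyed by row index
_hh = pd.DataFrame(high_arr[0], columns=['Index', 'HH']).set_index('Index')
# Minima of the Low series become the LL column, keyed by row index
_ll = pd.DataFrame(low_arr[1], columns=['Index', 'LL']).set_index('Index')

# Left-join on the index: rows without a detected peak or valley get NaN
full_df = clean_dataframe.join(_hh).join(_ll)
print(full_df)
```

Only the rows whose index appears in the peak lists receive an HH or LL value, which is why most cells in those two columns end up as NaN.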
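clean_dataframe result:
The problem is that some of the LL (valley) values are inaccurate. Sometimes the Low of the previous row is the correct LL, so I have to check and change the LL rows as described in the image.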
To help you understand how shift(-1) works, take a look at the following solution. I looked at the image and recreated the original DataFrame.
import pandas as pd
import numpy as np

df = pd.DataFrame({'Dates': ['2021-02-04 19:00:00', '2021-02-04 20:00:00',
                             '2021-02-04 21:00:00', '2021-02-04 22:00:00',
                             '2021-02-04 23:00:00', '2021-02-05 00:00:00',
                             '2021-02-05 01:00:00', '2021-02-05 02:00:00'],
                   'Close': [1.19661, 1.19660, 1.19611, 1.19643, 1.19664,
                             1.19692, 1.19662, 1.19542],
                   'High': [1.19679, 1.19678, 1.19680, 1.19679, 1.19688,
                            1.19721, 1.19694, 1.19682],
                   'Low': [1.19577, 1.19637, 1.19604, 1.19590, 1.19632,
                           1.19634, 1.19622, 1.19537],
                   'Open': [1.19630, 1.19662, 1.19665, 1.19613, 1.19646,
                            1.19662, 1.19690, 1.19665],
                   'Status': ['ok'] * 8,
                   'Volume': [2579, 1858, 1399, 788, 1437, 2435, 2898, 2641],
                   # np.nan rather than np.NaN: the NaN alias was removed in NumPy 2.0
                   'HH': [np.nan] * 5 + [1.19721] + [np.nan] * 2,
                   'LL': [np.nan] * 8})
print(df)

# make a copy of df['High'] into df['NewHigh']
df['NewHigh'] = df['High']

# if the next row's 'HH' is greater than this row's 'High',
# update 'NewHigh' with the next row's 'HH'
df.loc[df['HH'].shift(-1) > df['High'], 'NewHigh'] = df['HH'].shift(-1)
print(df[['Dates', 'High', 'HH', 'NewHigh']])
The output will be:
                 Dates     High       HH  NewHigh
0  2021-02-04 19:00:00  1.19679      NaN  1.19679
1  2021-02-04 20:00:00  1.19678      NaN  1.19678
2  2021-02-04 21:00:00  1.19680      NaN  1.19680
3  2021-02-04 22:00:00  1.19679      NaN  1.19679
4  2021-02-04 23:00:00  1.19688      NaN  1.19721  # <- This got updated
5  2021-02-05 00:00:00  1.19721  1.19721  1.19721
6  2021-02-05 01:00:00  1.19694      NaN  1.19694
7  2021-02-05 02:00:00  1.19682      NaN  1.19682
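If shift itself is unfamiliar, the following small sketch may help. It uses a made-up three-element Series (the names `s`, `next_row`, and `prev_row` are illustrative only) to show that shift returns a new, realigned Series and leaves the original column untouched, so it can be used purely for comparison.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# shift(-1) pulls values up: each label now sees the value of the row below it
next_row = s.shift(-1)
# shift(1) pushes values down: each label now sees the value of the row above it
prev_row = s.shift(1)

# The original Series is unchanged; the shifted copies only exist for comparison
print(pd.DataFrame({'s': s, 'next_row': next_row, 'prev_row': prev_row}))
```

The last row of `next_row` and the first row of `prev_row` are NaN because there is no neighbouring row to take a value from.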
Note: I created a new column to show the change. You can update High directly: instead of 'NewHigh' on the df.loc line, you can put 'High'. That should do it.
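The question also asks for the HH value itself to move up one row, with the original row cleared. Below is a minimal sketch of that, assuming the rule is read literally (an HH that is greater than the previous row's High gets moved to that previous row). It uses a shortened four-row frame taken from the data above, and `hh_is_higher`, `row_above`, and `HH_moved` are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'High': [1.19679, 1.19688, 1.19721, 1.19694],
    'HH':   [np.nan,  np.nan,  1.19721, np.nan],
})

# True where a row has an HH that exceeds the High of the row above it
# (comparisons against NaN are False, so rows without an HH are skipped)
hh_is_higher = df['HH'] > df['High'].shift(1)

# The same flags moved up one row: marks the row directly above each flagged HH
row_above = hh_is_higher.shift(-1, fill_value=False)

# Copy the HH from the row below into the row above, then clear the flagged row
df['HH_moved'] = (
    df['HH']
    .mask(row_above, df['HH'].shift(-1))
    .mask(hh_is_higher)
)
print(df)
```

Writing the result to a separate column keeps the original HH available for checking; assigning back to `df['HH']` instead would apply the change in place.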