Extract the value 5 minutes after the current row

Lynn

I have the following DataFrame:

import pandas as pd, numpy as np
df = pd.DataFrame({'Instrument':['A2M','A2M','A2M','A2M','A2M','A2M','A2M','A2M','A2M','A2M'], 
'Date':['2020-01-02','2020-01-02','2020-01-02','2020-01-02', '2020-01-02','2020-01-02','2020-01-02','2020-01-02','2020-01-02','2020-01-02'],
'Time':['09:59:50.024','09:59:50.623','10:00:15.124','10:00:15.148','10:00:42.552','10:04:49.782','10:05:14.896','10:05:41.719','10:05:45.050','10:08:19.170'], 
'midquote': [0.0,0.0,14.145,14.145,14.245,14.360,14.305,14.235,14.295,14.285]})

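For each row, I want to extract the midquote 5 minutes after that row's time. An exact 5-minute difference will usually not exist, so I would like to use the row whose time is closest to 5 minutes later (it has to be less than 5 minutes later, not more).

Desired output: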

pd.DataFrame({'Instrument':['A2M','A2M','A2M','A2M','A2M','A2M','A2M','A2M','A2M','A2M'], 
'Date':['2020-01-02','2020-01-02','2020-01-02','2020-01-02', '2020-01-02','2020-01-02','2020-01-02','2020-01-02','2020-01-02','2020-01-02'],
'Time':['09:59:50.024','09:59:50.623','10:00:15.124','10:00:15.148','10:00:42.552','10:04:49.782','10:05:14.896','10:05:41.719','10:05:45.050','10:08:19.170'], 
'midquote': [0.0,0.0,14.145,14.145,14.245,14.360,14.305,14.235,14.295,14.285], 
'timestamp':['2020-01-02 09:59:50.024000','2020-01-02 09:59:50.623000','2020-01-02 10:00:15.124000','2020-01-02 10:00:15.148000','2020-01-02 10:00:42.552000','2020-01-02 10:04:49.782000','2020-01-02 10:05:14.896000','2020-01-02 10:05:41.719000','2020-01-02 10:05:45.050000','2020-01-02 10:08:19.170000'],
'timestamp_5_min':['2020-01-02 10:04:49.782000','2020-01-02 10:04:49.782000','2020-01-02 10:05:14.896000','2020-01-02 10:05:14.896000','2020-01-02 10:05:41.719000','nan','nan','nan','nan','nan'],
 'midquote_5_min':[14.360,14.305,14.235,14.295,14.285,'nan','nan','nan','nan','nan']})

I tried the following code, but it did not work:

df['timestamp'] = pd.to_datetime(df.Date + ' ' + df.Time)
df.sort_values(by=['timestamp'], inplace=True)
df['midquote'].fillna(0, inplace=True)
mid = df[['Instrument','timestamp','midquote']]
mid['5_min_timestamp'] = mid.timestamp
mid.rename(columns={'midquote':'5_min_midquote'}, inplace=True)
pd.merge_asof(df, mid, on='timestamp', by=['Instrument'], tolerance=pd.Timedelta("5min"), direction='forward', allow_exact_matches=False)

Also, if I leave out the sort_values call, it raises the error "left keys must be sorted".

Please help.

Thanks.

David Erickson

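The crux of the solution is the .searchsorted method. For each value of Time + 5 minutes (the timestamp_5_min column), it returns the position at which that value would be inserted into the sorted Time column, which is the index of the closest earlier date/time plus 1; that is why the code subtracts 1. From there, I do some messy merging, column renaming and column dropping, which could probably be cleaner but gets the job done.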
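To make the index arithmetic concrete, here is a minimal sketch of the lookup for a single row. It uses four of the question's timestamps; the variable names `times`, `target`, `pos` and `nearest_before` are illustrative and not part of the answer's code.

```python
import pandas as pd

# Sorted times taken from rows 0, 4, 5 and 6 of the question's sample.
times = pd.Series(pd.to_datetime([
    "2020-01-02 09:59:50.024",
    "2020-01-02 10:00:42.552",
    "2020-01-02 10:04:49.782",
    "2020-01-02 10:05:14.896",
]))

# Target for the first row: its own time plus 5 minutes (10:04:50.024).
target = times.iloc[0] + pd.Timedelta(minutes=5)

# Position at which the target would be inserted to keep `times` sorted
# (side="left" is the default).
pos = times.searchsorted(target)

# One step back is the latest time strictly before the target.
nearest_before = times.iloc[pos - 1]
print(pos, nearest_before)
```

Because the default is side="left", a time that equals the target exactly sits at `pos` itself, so `pos - 1` always points to a strictly earlier row.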

Input:

import pandas as pd, numpy as np, datetime as dt

df = pd.DataFrame({'Instrument':['A2M','A2M','A2M','A2M','A2M','A2M','A2M','A2M','A2M','A2M'], 
'Date':['2020-01-02','2020-01-02','2020-01-02','2020-01-02', '2020-01-02','2020-01-02','2020-01-02','2020-01-02','2020-01-02','2020-01-02'],
'Time':['09:59:50.024','09:59:50.623','10:00:15.124','10:00:15.148','10:00:42.552','10:04:49.782','10:05:14.896','10:05:41.719','10:05:45.050','10:08:19.170'], 
'midquote': [0.0,0.0,14.145,14.145,14.245,14.360,14.305,14.235,14.295,14.285]})

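Code:

# Parse both columns as datetimes. Time has no date part, so the parsed values carry a date other than 2020-01-02 (see the output below).
df['Date'] = pd.to_datetime(df['Date'])
df['Time'] = pd.to_datetime(df['Time'])
# Target time for each row: its own time plus 5 minutes.
df['timestamp_5_min'] = pd.to_datetime(df['Time'] + pd.Timedelta(minutes=5))
# Insertion position of each target in the sorted Time column, minus 1: the index of the latest row before the target.
df['timestamp_5_min_new'] = df['Time'].searchsorted(df['timestamp_5_min'])-1
# Look up the time and midquote of that row by joining on its index.
df1 = df[['Time', 'midquote']]
df2 = pd.merge(df,df1,how='left',left_on='timestamp_5_min_new', right_index=True)
# Rows that point at the last index are treated as having no match and are blanked.
df2.loc[df2['timestamp_5_min_new'] == max(df2['timestamp_5_min_new']), ['timestamp_5_min_new', 'Time_y', 'midquote_y']] = np.nan
# Drop the helper columns and restore the column names.
df2 = df2.drop(['timestamp_5_min_new','timestamp_5_min', 'midquote_x'], axis=1).rename({'Time_x': 'Time', 'Time_y': 'timestamp_5_min', 'midquote_y' : 'midquote'}, axis=1)
df2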

Output:

    Instrument Date Time                    timestamp_5_min         midquote
0   A2M 2020-01-02  2020-07-16 09:59:50.024 2020-07-16 10:04:49.782 14.360
1   A2M 2020-01-02  2020-07-16 09:59:50.623 2020-07-16 10:04:49.782 14.360
2   A2M 2020-01-02  2020-07-16 10:00:15.124 2020-07-16 10:05:14.896 14.305
3   A2M 2020-01-02  2020-07-16 10:00:15.148 2020-07-16 10:05:14.896 14.305
4   A2M 2020-01-02  2020-07-16 10:00:42.552 2020-07-16 10:05:41.719 14.235
5   A2M 2020-01-02  2020-07-16 10:04:49.782 NaT                     NaN
6   A2M 2020-01-02  2020-07-16 10:05:14.896 NaT                     NaN
7   A2M 2020-01-02  2020-07-16 10:05:41.719 NaT                     NaN
8   A2M 2020-01-02  2020-07-16 10:05:45.050 NaT                     NaN
9   A2M 2020-01-02  2020-07-16 10:08:19.170 NaT                     NaN
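The Time column in the output above carries the date 2020-07-16 rather than 2020-01-02, which looks like a side effect of parsing the time strings on their own. A minimal sketch of one way to avoid that, assuming Date and Time are still strings as in the question (the `timestamp` column name is borrowed from the asker's attempt):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-02"],
    "Time": ["09:59:50.024", "10:04:49.782"],
})

# Join the two string columns first and parse once, so the date part
# comes from the data rather than from a parser default.
df["timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# The 5-minute target then stays on the correct date as well.
df["timestamp_5_min"] = df["timestamp"] + pd.Timedelta(minutes=5)
print(df)
```

The rest of the approach is unchanged; only the column that searchsorted runs on would be `timestamp` instead of `Time`.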

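Answer #2, grouping by Instrument. This makes it more challenging, but the code below gets the job done.

I have changed the sample data a little so that there are now two categories, and I changed the time-difference requirement to 25 seconds.

Step #1 - Use .groupby together with .transform (with lambda x) and .searchsorted to find the index of the closest time value. For this exercise the offset is 25 seconds, passed to pd.Timedelta.

Step #2 - The line that creates the dictionary a is there to find the maximum index per group; for the two Instrument values the maximum indices are 3 and 9, respectively.

Step #3 - From there, create a temp column that holds the maximum index of each instrument, and find any row whose timestamp_5_min_new value is greater than that maximum index. Essentially, this means the group has no row 25 seconds later, so the np.where() statements turn those rows into NaN. From there, it is just a matter of cleaning up the columns and column names.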

Input:

import pandas as pd, numpy as np, datetime as dt
df = pd.DataFrame({'Instrument':
                   ['B3M','B3M','B3M','B3M','A2M','A2M','A2M','A2M','A2M','A2M'],
                   'Date':['2020-01-02','2020-01-02','2020-01-02','2020-01-02', '2020-01-02',
                           '2020-01-02','2020-01-02','2020-01-02','2020-01-02','2020-01-02'],
                   'Time':['09:59:50.024','09:59:50.623','10:00:15.124','10:00:15.148','10:00:42.552',
                           '10:04:49.782','10:05:14.896','10:05:41.719','10:05:45.050','10:08:19.170'],
                   'midquote': [0.0,0.0,14.145,14.145,14.245,14.360,14.305,14.235,14.295,14.285]})
df['Date'] = pd.to_datetime(df['Date'])
df['Time'] = pd.to_datetime(df['Time'])

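Code:

# Target time for each row: its own time plus 25 seconds.
df['timestamp_5_min'] = pd.to_datetime(df['Time'] + pd.Timedelta(seconds=25))
# transform(lambda x: x) returns each column unchanged, so this searches the whole Time column, which is sorted across both instruments here.
# Without the -1, the result is the index of the first row at or after the target.
df['timestamp_5_min_new'] = df.groupby(['Instrument'])['Time'].transform(lambda x: x).searchsorted(df.groupby(['Instrument'])['timestamp_5_min'].transform(lambda x: x))
# Map each instrument to the index of its last row (3 for B3M, 9 for A2M).
a = {v: k for k, v in df['Instrument'].drop_duplicates(keep='last').to_dict().items()}
df['temp'] = df['Instrument'].map(a)
# Look up the midquote of the matched row by joining on its index.
df1 = df[['Time', 'midquote']]
df2 = pd.merge(df,df1,how='left',left_on='timestamp_5_min_new', right_index=True)
# A matched index beyond the instrument's last row means no match: blank it. Otherwise keep the target time.
df2['Time_y'] = np.where((df2['timestamp_5_min_new'] > df2['temp']), np.datetime64('NaT'), df2['timestamp_5_min'])
df2['midquote_y'] = np.where((df2['timestamp_5_min_new'] > df2['temp']), np.nan, df2['midquote_y'])
# Drop the helper columns and restore the column names.
df2 = df2.drop(['timestamp_5_min_new','timestamp_5_min', 'midquote_x', 'temp'], axis=1).rename({'Time_x': 'Time', 'Time_y': 'timestamp_5_min', 'midquote_y' : 'midquote'}, axis=1)
df2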

Output:

    Instrument  Date    Time                    timestamp_5_min       midquote
0   B3M     2020-01-02  2020-07-17 09:59:50.024 2020-07-17 10:00:15.024 14.145
1   B3M     2020-01-02  2020-07-17 09:59:50.623 NaT                 NaN
2   B3M     2020-01-02  2020-07-17 10:00:15.124 NaT                 NaN
3   B3M     2020-01-02  2020-07-17 10:00:15.148 NaT                 NaN
4   A2M     2020-01-02  2020-07-17 10:00:42.552 2020-07-17 10:01:07.552 14.360
5   A2M     2020-01-02  2020-07-17 10:04:49.782 2020-07-17 10:05:14.782 14.305
6   A2M     2020-01-02  2020-07-17 10:05:14.896 2020-07-17 10:05:39.896 14.235
7   A2M     2020-01-02  2020-07-17 10:05:41.719 2020-07-17 10:06:06.719 14.285
8   A2M     2020-01-02  2020-07-17 10:05:45.050 2020-07-17 10:06:10.050 14.285
9   A2M     2020-01-02  2020-07-17 10:08:19.170 NaT                 NaN
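For comparison, the per-instrument lookup can also be sketched with pd.merge_asof, the function the question started from. This is not the answer's method, just a sketch under stated assumptions: it uses the question's original single-instrument sample and 5-minute offset, the names `target`, `last_seen` and `no_later_row` are made up for illustration, and the rule "leave the result empty when the target is later than the instrument's last timestamp" is an assumption chosen to mirror the NaN rows in the outputs above. Note that merge_asof requires both key columns to be sorted, which is where the "left keys must be sorted" error comes from.

```python
import pandas as pd

df = pd.DataFrame({
    "Instrument": ["A2M"] * 10,
    "Date": ["2020-01-02"] * 10,
    "Time": ["09:59:50.024", "09:59:50.623", "10:00:15.124", "10:00:15.148",
             "10:00:42.552", "10:04:49.782", "10:05:14.896", "10:05:41.719",
             "10:05:45.050", "10:08:19.170"],
    "midquote": [0.0, 0.0, 14.145, 14.145, 14.245,
                 14.360, 14.305, 14.235, 14.295, 14.285],
})
df["timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"])
df = df.sort_values("timestamp").reset_index(drop=True)

# Left side: each row looks up its own time plus the offset.
offset = pd.Timedelta(minutes=5)
left = df.assign(target=df["timestamp"] + offset)

# Right side: the same rows under the output column names.
right = df[["Instrument", "timestamp", "midquote"]].rename(
    columns={"timestamp": "timestamp_5_min", "midquote": "midquote_5_min"}
)

# direction="backward" with allow_exact_matches=False picks, per instrument,
# the latest right-hand row strictly before the target.
out = pd.merge_asof(
    left.sort_values("target"),
    right.sort_values("timestamp_5_min"),
    left_on="target",
    right_on="timestamp_5_min",
    by="Instrument",
    direction="backward",
    allow_exact_matches=False,
)

# Assumption: if the target lies past the instrument's last timestamp,
# treat the row as having no value 5 minutes later.
last_seen = out.groupby("Instrument")["timestamp"].transform("max")
no_later_row = out["target"] > last_seen
out.loc[no_later_row, "timestamp_5_min"] = pd.NaT
out.loc[no_later_row, "midquote_5_min"] = float("nan")

print(out[["timestamp", "timestamp_5_min", "midquote_5_min"]])
```

Because `by="Instrument"` restricts matches to rows of the same instrument, the same call should carry over to data with several instruments, as long as both frames are sorted by their key columns before merging.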
