I have OHLC data stored as dictionaries in a 'mid' column, and I am not sure how to resample it while keeping the OHLC values correct.
Here is my data. The columns are time, mid, volume, and complete:
candle_data
time \
0 2021-08-20 19:43:00+00:00
1 2021-08-20 19:44:00+00:00
2 2021-08-20 19:45:00+00:00
3 2021-08-20 19:46:00+00:00
4 2021-08-20 19:47:00+00:00
5 2021-08-20 19:48:00+00:00
6 2021-08-20 19:49:00+00:00
7 2021-08-20 19:50:00+00:00
8 2021-08-20 19:51:00+00:00
9 2021-08-20 19:52:00+00:00
10 2021-08-20 19:53:00+00:00
11 2021-08-20 19:54:00+00:00
12 2021-08-20 19:55:00+00:00
13 2021-08-20 19:56:00+00:00
14 2021-08-20 19:57:00+00:00
mid volume complete
0 {'o': 20.36418, 'h': 20.36455, 'l': 20.36075, ... 68 True
1 {'o': 20.36127, 'h': 20.36134, 'l': 20.35814, ... 49 True
2 {'o': 20.35845, 'h': 20.359, 'l': 20.3558, 'c'... 164 True
3 {'o': 20.35635, 'h': 20.35768, 'l': 20.35275, ... 155 True
4 {'o': 20.35315, 'h': 20.3535, 'l': 20.353, 'c'... 69 True
5 {'o': 20.35315, 'h': 20.3563, 'l': 20.35315, '... 146 True
6 {'o': 20.35525, 'h': 20.35776, 'l': 20.35312, ... 158 True
7 {'o': 20.35338, 'h': 20.35512, 'l': 20.35237, ... 166 True
8 {'o': 20.3524, 'h': 20.35335, 'l': 20.35123, '... 85 True
9 {'o': 20.35335, 'h': 20.35365, 'l': 20.35305, ... 44 True
10 {'o': 20.35365, 'h': 20.3544, 'l': 20.35365, '... 76 True
11 {'o': 20.3541, 'h': 20.3563, 'l': 20.3541, 'c'... 92 True
12 {'o': 20.35458, 'h': 20.36225, 'l': 20.35408, ... 188 True
13 {'o': 20.361, 'h': 20.36704, 'l': 20.36085, 'c... 392 True
14 {'o': 20.36638, 'h': 20.3672, 'l': 20.36637, '... 14 False
I have converted my time column to datetime, and I have done something like this before to aggregate data when the OHLC values were in separate columns:
df2 = (df1.resample('60min', on='date')
          .agg({'volume': 'sum', 'open': 'first', 'close': 'last', 'high': 'max', 'low': 'min'}))
But I am not sure how to accomplish this with the nested format.
I want to resample to a different timeframe, e.g. 5min.
Here is the output of .head().to_dict():
{'time': {0: Timestamp('2021-08-20 20:17:00+0000', tz='UTC'),
1: Timestamp('2021-08-20 20:18:00+0000', tz='UTC'),
2: Timestamp('2021-08-20 20:19:00+0000', tz='UTC'),
3: Timestamp('2021-08-20 20:20:00+0000', tz='UTC'),
4: Timestamp('2021-08-20 20:21:00+0000', tz='UTC')},
'mid': {0: {'o': 20.3778, 'h': 20.3778, 'l': 20.37066, 'c': 20.37066},
1: {'o': 20.37066, 'h': 20.37141, 'l': 20.37066, 'c': 20.37133},
2: {'o': 20.37133, 'h': 20.37141, 'l': 20.37113, 'c': 20.37113},
3: {'o': 20.37113, 'h': 20.37172, 'l': 20.37113, 'c': 20.37158},
4: {'o': 20.3716, 'h': 20.37165, 'l': 20.36865, 'c': 20.36865}}, 'volume': {0: 217, 1: 23, 2: 13, 3: 20, 4: 45},
'complete': {0: True, 1: True, 2: True, 3: True, 4: True}}
Any idea how to accomplish this?
Convert the nested dictionaries to their own columns and then resample. Then, convert back if needed:
import pandas as pd

# Expand each dict in "mid" into its own o/h/l/c columns
df[["o", "h", "l", "c"]] = df["mid"].apply(pd.Series)
# Resample to 5-minute bars, aggregating each column appropriately
df = df.drop("mid", axis=1) \
       .set_index("time") \
       .resample("5min") \
       .agg({"o": "first",
             "h": "max",
             "l": "min",
             "c": "last",
             "volume": "sum",
             "complete": all
            })
# Convert back to the original structure (optional)
df = df.assign(mid=df[["o", "h", "l", "c"]].apply(dict, axis=1)).drop(["o", "h", "l", "c"], axis=1)
>>> df
time volume complete mid
2021-08-20 20:15:00+00:00 253 True {'o': 20.3778, 'h': 20.3778, 'l': 20.37066, 'c': 20.37113}
2021-08-20 20:20:00+00:00 65 True {'o': 20.37113, 'h': 20.37172, 'l': 20.36865, 'c': 20.36865}
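For anyone who wants to run this end to end, here is a minimal self-contained sketch built from the five rows in the question's .head().to_dict() output. It swaps apply(pd.Series) for pd.DataFrame(df["mid"].tolist(), index=df.index), which is an alternative I would expect to be faster on large frames, though I have not benchmarked it here. The names ohlc and resampled are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question's .head().to_dict() output.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-08-20 20:17:00+00:00",
        "2021-08-20 20:18:00+00:00",
        "2021-08-20 20:19:00+00:00",
        "2021-08-20 20:20:00+00:00",
        "2021-08-20 20:21:00+00:00",
    ]),
    "mid": [
        {"o": 20.3778, "h": 20.3778, "l": 20.37066, "c": 20.37066},
        {"o": 20.37066, "h": 20.37141, "l": 20.37066, "c": 20.37133},
        {"o": 20.37133, "h": 20.37141, "l": 20.37113, "c": 20.37113},
        {"o": 20.37113, "h": 20.37172, "l": 20.37113, "c": 20.37158},
        {"o": 20.3716, "h": 20.37165, "l": 20.36865, "c": 20.36865},
    ],
    "volume": [217, 23, 13, 20, 45],
    "complete": [True, True, True, True, True],
})

# Expand the list of dicts into o/h/l/c columns in one step
# (an alternative to df["mid"].apply(pd.Series)).
ohlc = pd.DataFrame(df["mid"].tolist(), index=df.index)

resampled = (
    df.drop(columns="mid")
      .join(ohlc)                 # align on the original integer index
      .set_index("time")
      .resample("5min")
      .agg({"o": "first", "h": "max", "l": "min", "c": "last",
            "volume": "sum", "complete": all})
)
print(resampled)
```

On this sample it should give the same two 5-minute bars as the output above, just with o/h/l/c left as separate columns. If the last candle has complete set to False, as in the longer listing in the question, the aggregated complete value for that bar will be False as well, so you can filter on it afterwards.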