pandas resample nested ohlc data

user1601716

I have OHLC data contained in a 'mid' column and am not sure how to resample it while keeping the OHLC values correct.

Here is my data. The columns are time, mid, volume, and complete:

candle_data
                        time  \
0  2021-08-20 19:43:00+00:00   
1  2021-08-20 19:44:00+00:00   
2  2021-08-20 19:45:00+00:00   
3  2021-08-20 19:46:00+00:00   
4  2021-08-20 19:47:00+00:00   
5  2021-08-20 19:48:00+00:00   
6  2021-08-20 19:49:00+00:00   
7  2021-08-20 19:50:00+00:00   
8  2021-08-20 19:51:00+00:00   
9  2021-08-20 19:52:00+00:00   
10 2021-08-20 19:53:00+00:00   
11 2021-08-20 19:54:00+00:00   
12 2021-08-20 19:55:00+00:00   
13 2021-08-20 19:56:00+00:00   
14 2021-08-20 19:57:00+00:00   
                                                  mid  volume  complete  
0   {'o': 20.36418, 'h': 20.36455, 'l': 20.36075, ...      68      True  
1   {'o': 20.36127, 'h': 20.36134, 'l': 20.35814, ...      49      True  
2   {'o': 20.35845, 'h': 20.359, 'l': 20.3558, 'c'...     164      True  
3   {'o': 20.35635, 'h': 20.35768, 'l': 20.35275, ...     155      True  
4   {'o': 20.35315, 'h': 20.3535, 'l': 20.353, 'c'...      69      True  
5   {'o': 20.35315, 'h': 20.3563, 'l': 20.35315, '...     146      True  
6   {'o': 20.35525, 'h': 20.35776, 'l': 20.35312, ...     158      True  
7   {'o': 20.35338, 'h': 20.35512, 'l': 20.35237, ...     166      True  
8   {'o': 20.3524, 'h': 20.35335, 'l': 20.35123, '...      85      True  
9   {'o': 20.35335, 'h': 20.35365, 'l': 20.35305, ...      44      True  
10  {'o': 20.35365, 'h': 20.3544, 'l': 20.35365, '...      76      True  
11  {'o': 20.3541, 'h': 20.3563, 'l': 20.3541, 'c'...      92      True  
12  {'o': 20.35458, 'h': 20.36225, 'l': 20.35408, ...     188      True  
13  {'o': 20.361, 'h': 20.36704, 'l': 20.36085, 'c...     392      True  
14  {'o': 20.36638, 'h': 20.3672, 'l': 20.36637, '...      14     False

I have converted the time column to datetime, and I have aggregated flat OHLC columns before like this:

df2 = (df1.resample('60min', on='date')
          .agg({'volume': 'sum', 'open': 'first', 'close': 'last', 'high': 'max', 'low': 'min'}))

But I'm not sure how to accomplish this with the nested format.
I want to resample to a different timeframe, e.g. 5min, and:

sum the volume
open is first candle
close is last
high is max
low is min
complete is 1 if all are 1, but 0 if any are 0
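
The rules for the two columns that are not nested can be expressed directly. Below is a minimal sketch using the last three rows of the table above (19:55 to 19:57), where the final candle is incomplete. The `mid` column is left out because its values are truncated in the printout, and the names `tail`, `rules` and `out` are chosen here for illustration only.

```python
import pandas as pd

# Last three rows of the table above; `mid` is omitted because it is truncated there.
tail = pd.DataFrame({
    "time": pd.to_datetime(
        ["2021-08-20 19:55:00", "2021-08-20 19:56:00", "2021-08-20 19:57:00"],
        utc=True,
    ),
    "volume": [188, 392, 14],
    "complete": [True, True, False],
})

rules = {
    "volume": "sum",   # total volume traded in the bin
    "complete": all,   # True only if every candle in the bin is complete
}

# All three rows fall into the 19:55 bin, so a single row comes back.
out = tail.set_index("time").resample("5min").agg(rules)
print(out)
```

Because one of the three candles is incomplete, the resampled bin should come out as incomplete, with the three volumes added together.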

Here is the output of .head().to_dict()

{'time': {0: Timestamp('2021-08-20 20:17:00+0000', tz='UTC'), 
          1: Timestamp('2021-08-20 20:18:00+0000', tz='UTC'), 
          2: Timestamp('2021-08-20 20:19:00+0000', tz='UTC'), 
          3: Timestamp('2021-08-20 20:20:00+0000', tz='UTC'), 
          4: Timestamp('2021-08-20 20:21:00+0000', tz='UTC')}, 
'mid': {0: {'o': 20.3778, 'h': 20.3778, 'l': 20.37066, 'c': 20.37066}, 
        1: {'o': 20.37066, 'h': 20.37141, 'l': 20.37066, 'c': 20.37133}, 
        2: {'o': 20.37133, 'h': 20.37141, 'l': 20.37113, 'c': 20.37113},  
        3: {'o': 20.37113, 'h': 20.37172, 'l': 20.37113, 'c': 20.37158}, 
        4: {'o': 20.3716, 'h': 20.37165, 'l': 20.36865, 'c': 20.36865}}, 
'volume': {0: 217, 1: 23, 2: 13, 3: 20, 4: 45}, 
'complete': {0: True, 1: True, 2: True, 3: True, 4: True}}

Any idea how to accomplish this?

not_speshal

Convert the nested dictionaries to their own columns and then resample. Then, convert back if needed:

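import pandas as pd

# Expand each dict in "mid" into its own o/h/l/c columns
df[["o", "h", "l", "c"]] = df["mid"].apply(pd.Series)
df = df.drop("mid", axis=1) \
       .set_index("time") \
       .resample("5min") \
       .agg({"o": "first",      # open of the first candle in the bin
             "h": "max",        # highest high
             "l": "min",        # lowest low
             "c": "last",       # close of the last candle in the bin
             "volume": "sum",
             "complete": all    # True only if every candle is complete
             })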

# To convert back to the original structure
df = df.assign(mid=df[["o", "h", "l", "c"]].apply(dict, axis=1)).drop(["o", "h", "l", "c"], axis=1)

Output:

>>> df
                     time   volume  complete    mid
2021-08-20 20:15:00+00:00      253      True    {'o': 20.3778, 'h': 20.3778, 'l': 20.37066, 'c': 20.37113}
2021-08-20 20:20:00+00:00       65      True    {'o': 20.37113, 'h': 20.37172, 'l': 20.36865, 'c': 20.36865}
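
To check the approach against the five sample rows from the question, here is a self-contained sketch. It follows the same steps but flattens `mid` with the `pd.DataFrame` constructor instead of `apply(pd.Series)`. That swap is made here on the assumption that every dict has the same four keys; it tends to be faster on long histories because it avoids building one Series per row. The names `ohlc`, `flat` and `out` are illustrative.

```python
import pandas as pd

# The five rows from the question's .head().to_dict() output.
df = pd.DataFrame({
    "time": pd.to_datetime(
        ["2021-08-20 20:17:00", "2021-08-20 20:18:00", "2021-08-20 20:19:00",
         "2021-08-20 20:20:00", "2021-08-20 20:21:00"],
        utc=True,
    ),
    "mid": [
        {"o": 20.3778, "h": 20.3778, "l": 20.37066, "c": 20.37066},
        {"o": 20.37066, "h": 20.37141, "l": 20.37066, "c": 20.37133},
        {"o": 20.37133, "h": 20.37141, "l": 20.37113, "c": 20.37113},
        {"o": 20.37113, "h": 20.37172, "l": 20.37113, "c": 20.37158},
        {"o": 20.3716, "h": 20.37165, "l": 20.36865, "c": 20.36865},
    ],
    "volume": [217, 23, 13, 20, 45],
    "complete": [True, True, True, True, True],
})

# Build the o/h/l/c columns in one step from the list of dicts,
# reusing the original index so the rows stay aligned.
ohlc = pd.DataFrame(df["mid"].tolist(), index=df.index)
flat = df.drop(columns="mid").join(ohlc)

# Same aggregation rules as in the answer above.
out = (
    flat.set_index("time")
        .resample("5min")
        .agg({"o": "first", "h": "max", "l": "min", "c": "last",
              "volume": "sum", "complete": all})
)

# Optionally fold the four price columns back into a nested "mid" column.
out = out.assign(mid=out[["o", "h", "l", "c"]].apply(dict, axis=1)) \
         .drop(columns=["o", "h", "l", "c"])
print(out)
```

The two resulting bins start at 20:15 and 20:20, and their volumes and `mid` values should line up with the output shown above.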
