Time Series using numpy or pandas

user1913171

I'm a beginner with Python and its ecosystem, and I'm having trouble working with time series data.

Below is my 1-minute OHLC data.

2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
...
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:02:00,246.30,246.45,245.75,245.80
2011-11-02,9:03:00,245.75,245.85,245.30,245.35
...
  1. I'd like to extract the "CLOSE" value from each row and reshape the data so that each day is on one line, like the following:

    2011-11-01, 248.70, 248.85, 249.15, ... 250.15, 250.60, 250.55
    2011-11-02, 245.80, 246.35, 245.80, ...
    ...
    
  2. I'd like to find the highest Close value and its time (minute) for EACH DAY, like the following:

    2011-11-01, 10:23:03, 250.55
    2011-11-02, 11:02:36, 251.00
    ....
    

Any help would be much appreciated.

Thank you in advance,

Viktor Kerkez

You can use the pandas library. For your data, you can get the daily maximum like this:

import pandas as pd
# The file has no header row: read the date, time and close columns by position.
df = pd.read_csv('your_file', header=None, usecols=[0, 1, 5],
                 names=['d', 't', 'close'])
# Combine the date and time columns into a datetime index.
df.index = pd.to_datetime(df['d'] + ' ' + df['t'])
# Keep only the close column.
df = df[['close']]
# Resample to daily frequency and get the max value for each day.
df.resample('D').max()

If you also want to show the times, group the rows by day, find the index label of each day's maximum close with idxmax, and select those rows:

>>> df = pd.read_csv('your_file', header=None, usecols=[0, 1, 5],
                     names=['d', 't', 'close'])
>>> df.index = pd.to_datetime(df['d'] + ' ' + df['t'])
>>> df = df[['close']]
>>> df.loc[df.groupby(df.index.date)['close'].idxmax()]
                      close
2011-11-01 15:04:00  250.60
2011-11-02 09:01:00  246.35
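
For the three-column layout asked for in the question (date, time, close), one possible sketch is below. It embeds the sample rows shown in the question so it runs without a file. The column names and the helper name `daily_high` are assumptions made for illustration and are not part of the original answer.

```python
import io
import pandas as pd

# Sample rows copied from the question (the elided "..." rows are not filled in).
SAMPLE = """\
2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:02:00,246.30,246.45,245.75,245.80
2011-11-02,9:03:00,245.75,245.85,245.30,245.35
"""


def daily_high(csv_file):
    """Return one row per day: date, time of the highest close, and that close."""
    # Column names are assumed; the file itself has no header row.
    df = pd.read_csv(csv_file, header=None,
                     names=["date", "time", "open", "high", "low", "close"])
    # idxmax gives the row label of each day's highest close
    # (the first one if the maximum occurs more than once).
    best = df.loc[df.groupby("date")["close"].idxmax()]
    return best[["date", "time", "close"]].reset_index(drop=True)


result = daily_high(io.StringIO(SAMPLE))
print(result.to_csv(index=False, header=False))
```

With the sample rows this selects 15:04:00 (250.60) for 2011-11-01 and 9:01:00 (246.35) for 2011-11-02. Grouping on the raw date string avoids any datetime parsing, which is enough when only the per-day maximum is needed.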

And if you want a list of the prices per day, group by day and use apply on the grouped object to return the list of prices for each group:

>>> df.groupby(lambda dt: dt.date()).apply(lambda group: list(group['close']))
2011-11-01    [248.7, 248.85, 249.15, 250.15, 250.6, 250.55]
2011-11-02                    [245.8, 246.35, 245.8, 245.35]
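
To write those per-day values in the one-line-per-day text layout from the question, something like the following sketch could work. It uses a few of the sample rows from the question. The column names and the helper name `close_lines` are hypothetical, and formatting to two decimals is an assumption based on the precision of the input.

```python
import io
import pandas as pd

# A few sample rows copied from the question.
SAMPLE = """\
2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
"""


def close_lines(csv_file):
    """Yield 'date, close1, close2, ...' strings, one per day."""
    # Column names are assumed; the file itself has no header row.
    df = pd.read_csv(csv_file, header=None,
                     names=["date", "time", "open", "high", "low", "close"])
    # sort=False keeps the days in the order they appear in the file.
    for date, closes in df.groupby("date", sort=False)["close"]:
        # Two decimals, matching the precision of the input data (assumption).
        yield ", ".join([date] + [f"{c:.2f}" for c in closes])


lines = list(close_lines(io.StringIO(SAMPLE)))
print("\n".join(lines))
```

Because days usually have different numbers of rows, a list of strings is used here rather than a rectangular table.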

For more information, take a look at the Time Series section of the pandas documentation.

Update for the concrete data set:

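The problem with your data set is that some days have no data at all. Resampling by day produces an empty group for each of those days, so the function applied to each day has to handle that case:

def func(group):
    # A day without rows has no maximum, so return None for it.
    if len(group) == 0:
        return None
    return group.iloc[group['close'].argmax()]
# Iterating over the resampler yields one (day, rows) pair per calendar day.
rows = [func(group) for _, group in df.resample('D')]
pd.DataFrame([row for row in rows if row is not None])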
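
The effect of empty days can be seen in a small sketch. The timestamps and prices below are toy values invented purely to create a gap between two days; they are not taken from the asker's data.

```python
import pandas as pd

# Toy data (invented for illustration): two days with rows, one empty day between.
idx = pd.to_datetime(["2000-01-01 09:00", "2000-01-01 09:01",
                      "2000-01-03 09:00", "2000-01-03 09:01"])
df = pd.DataFrame({"close": [1.0, 3.0, 2.0, 5.0]}, index=idx)

# resample('D') creates a bin for every calendar day in the range,
# so the empty 2000-01-02 shows up as NaN and can be dropped afterwards.
daily_max = df.resample("D").max()
print(daily_max.dropna())

# Grouping by the calendar date only creates groups for days that have rows,
# so no special handling of empty days is needed.
peak_rows = df.loc[df.groupby(df.index.date)["close"].idxmax()]
print(peak_rows)
```

Which of the two is preferable depends on whether the gaps should stay visible in the result (resample) or be skipped silently (groupby).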
