How to create input samples from a pandas DataFrame for an LSTM model?

Stian Hafslund

I'm trying to create an LSTM model that gives me a binary output: buy or not. My data has the format [date_time, close, volume], with millions of rows. I'm stuck at formatting the data as 3-D: samples, timesteps, features.

I have used pandas to read the data. I want to format it so I can get 4000 samples with 400 timesteps each, and two features (close and volume). Can someone advise on how to do this?

EDIT: I am using the TimeseriesGenerator as advised, but I am not sure how to check my sequences and replace the output Y with my own binary buy output.

df = normalize_data(df)

print("Creating sequences for NN \n")
targets = df.drop('date_time', axis=1)  # axis passed by keyword; the positional form was removed in newer pandas
train = keras.preprocessing.sequence.TimeseriesGenerator(df, targets, 1, sampling_rate=1, stride=1,
                                                         start_index=0, end_index=int(len(df.index)*0.8),
                                                         shuffle=True, reverse=False, batch_size=time_steps)

This is running without error, but now the output is the first close value after input timeseries.

EDIT 2: So thus far my code looks like this:

df = data.normalize_data(df)
targets = df.iloc[:, 3]  # Buy signal target

df.drop('y1', axis=1, inplace=True)
df.drop('y2', axis=1, inplace=True)

train = TimeseriesGenerator(df, targets, length=1, sampling_rate=1, stride=1,
                            start_index=0, end_index=int(len(df.index) * 0.8),
                            shuffle=True, reverse=False, batch_size=time_steps)

# number of samples
print("Samples: " + str(len(train)))
x, y = train[0]
print(str(x))

The output is as follows:

Samples: 8
Traceback (most recent call last):
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: range(418, 419)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./main.py", line 94, in <module>
data_menu()
File "./main.py", line 42, in data_menu
data_menu()
File "./main.py", line 56, in data_menu
nn_menu()
File "./main.py", line 76, in nn_menu
nn.nn_gen(pre_processed_data)
File "/home/stian/git/stian9k/nn.py", line 33, in nn_gen
x, y = train[0]
File "/home/stian/.local/lib/python3.6/site-packages/keras_preprocessing/sequence.py", line 378, in __getitem__
samples[j] = self.data[indices]
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: range(418, 419)

So it seems that even though I am getting 8 objects from the generator, I am not able to look them up. If I test the type with print(str(type(train))), I get a TimeseriesGenerator object. Any advice is much appreciated again.

EDIT 3: It turns out TimeseriesGenerator did not like pandas DataFrames. The issue was resolved by converting the data to a NumPy array and converting the pandas timestamp type to float.
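For readers hitting the same KeyError, here is a minimal sketch of the conversion described in EDIT 3. The toy frame, the `buy` column name, and the choice of epoch seconds for the timestamp are assumptions made for illustration; the original post does not show the exact conversion used.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mirroring the question's columns, plus an assumed
# binary "buy" label column.
df = pd.DataFrame({
    "date_time": pd.date_range("2019-01-01", periods=6, freq="min"),
    "close": [10.0, 10.5, 10.2, 10.8, 11.0, 10.9],
    "volume": [100, 120, 90, 150, 130, 110],
    "buy": [0, 1, 0, 1, 1, 0],
})

# Convert the pandas timestamps to float seconds since the Unix epoch.
epoch = pd.Timestamp("1970-01-01")
df["date_time"] = (df["date_time"] - epoch) / pd.Timedelta(seconds=1)

# Split labels from inputs and hand plain NumPy arrays to downstream code,
# so integer-position indexing works instead of column-label lookup.
targets = df["buy"].to_numpy(dtype="float64")
features = df.drop(columns=["buy"]).to_numpy(dtype="float64")

print(features.shape, targets.shape)
```

Indexing a DataFrame with `df[range(...)]` is treated as a column lookup, which is consistent with the `KeyError: range(418, 419)` in the traceback above; a NumPy array indexes by row position instead.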

today

You can simply use Keras TimeseriesGenerator for this purpose. You can easily set the length (i.e. number of timesteps in each sample), sampling rate and stride to sub-sample the data.

It returns an instance of the Sequence class, which you can then pass to fit_generator to fit the model on the data it generates. I highly recommend reading the documentation for more information about this class, its arguments, and its usage.
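To make the resulting shapes concrete, the following is a NumPy-only sketch of the windowing, not the Keras class itself. It assumes the convention that each window of `length` rows is paired with the target at the index immediately after the window, which is how I understand TimeseriesGenerator to align its outputs. The helper name `make_windows` and the toy data are hypothetical.

```python
import numpy as np

def make_windows(features, targets, length):
    """Build X with shape (samples, length, n_features) and aligned y.

    Sample i covers rows i .. i+length-1 and is paired with targets[i+length].
    """
    n_samples = len(features) - length
    # Row indices for every window: shape (n_samples, length).
    idx = np.arange(n_samples)[:, None] + np.arange(length)[None, :]
    X = features[idx]      # fancy indexing yields (n_samples, length, n_features)
    y = targets[length:]   # the target that follows each window
    return X, y

# Toy data: 10 rows, 2 features (standing in for close and volume), binary labels.
features = np.arange(20, dtype="float32").reshape(10, 2)
targets = (np.arange(10) % 2).astype("float32")

X, y = make_windows(features, targets, length=4)
print(X.shape, y.shape)
```

With the question's numbers, `length=400` and two feature columns would give `X` the shape `(samples, 400, 2)`. Note that the number of timesteps is set by `length`, while `batch_size` only controls how many such samples are grouped per batch.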
