Scikit-learn: train/test split for time series

Aviade

I have data that includes dates in sorted order.

I would like to split the data into a train set and a test set. However, I must split it so that the test set is newer than the train set.

Please look at the following example:

Let's assume that we have data ordered by date:

1, 2, 3, ..., n.

The numbers from 1 to n represent the days.

I would like to split it so that 20% of the data is the train set and 80% of the data is the test set.

Good results:

1) train set = 1, 2, 3, ..., 20
   test set = 21, ..., 100

2) train set = 101, 102, ..., 120
   test set = 121, ..., 200

My code:

train_size = 0.2
train_dataframe, test_dataframe = cross_validation.train_test_split(features_dataframe, train_size=train_size)

train_dataframe = train_dataframe.sort(["date"])
test_dataframe = test_dataframe.sort(["date"])

This does not work for me.

Any suggestions?

piRSquared

If you insist that all testing data be newer than all training data, then there is only one way to accomplish the desired 20/80 split.

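# Number of rows in the data
n = features_dataframe.shape[0]
train_size = 0.2

# Sort chronologically, then cut at the 20% mark:
# the oldest rows become the train set, the newest rows the test set.
features_dataframe = features_dataframe.sort_values('date')
train_dataframe = features_dataframe.iloc[:int(n * train_size)]
test_dataframe = features_dataframe.iloc[int(n * train_size):]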

And there is nothing random about it.
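
For reference, the same idea can be wrapped in a small helper and checked against the 1..100 example from the question. This is only a sketch: the function name `chronological_split` and the toy DataFrame are made up for illustration, and it assumes the data has a sortable `date` column as in the question.

```python
import pandas as pd


def chronological_split(df, date_col="date", train_size=0.2):
    """Split df so the oldest rows are train and the newest rows are test.

    chronological_split is a hypothetical helper, not part of pandas or
    scikit-learn. If several rows share the date at the cut point, rows
    with that same date can end up on both sides of the split.
    """
    ordered = df.sort_values(date_col).reset_index(drop=True)
    cut = int(len(ordered) * train_size)  # index of the first test row
    return ordered.iloc[:cut], ordered.iloc[cut:]


# Toy data: days 1..100, deliberately stored in reverse order.
features_dataframe = pd.DataFrame({"date": list(range(100, 0, -1))})

train_dataframe, test_dataframe = chronological_split(features_dataframe)

print(len(train_dataframe), train_dataframe["date"].max())
print(len(test_dataframe), test_dataframe["date"].min())
```

If the data is already sorted by date, recent scikit-learn versions should give an equivalent split with `sklearn.model_selection.train_test_split(features_dataframe, train_size=0.2, shuffle=False)`, since `shuffle=False` keeps the original row order.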
