Overfitting in an RNN for S&P prediction

Gustavo Felipe Oliveira

I am building an RNN based on the model from Udemy's Deep Learning A-Z course. For the Google stock example we use 5 years of daily stock prices. At the end of the lecture it is suggested to test with more data or to change the parameters or structure of the RNN. My thinking was that if I could get more data, the RNN would get better results. So I downloaded S&P data from 2006-01-01 until today and split it into train and test sets: everything except the last 23 days for training, and those 23 days as my prediction test. It would be nice to see whether I could get something useful out of it... I let it run for 100 epochs.
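
For reference, a minimal sketch of that split, assuming the raw download sits in one CSV (the file name sp500_raw.csv is hypothetical; S&P_Train.csv and S&P_Test.csv are the files read by the script further down):

import pandas as pd

# Hypothetical name for the raw daily S&P download (2006-01-01 until today).
df = pd.read_csv('sp500_raw.csv')

test_days = 23                      # the last 23 days are held out as the prediction test
train_df = df.iloc[:-test_days]     # everything except the last 23 days
test_df = df.iloc[-test_days:]      # the 23-day test window

# Write out the two files the training script reads.
train_df.to_csv('S&P_Train.csv', index=False)
test_df.to_csv('S&P_Test.csv', index=False)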

Epoch 1/100
3599/3599 [==============================] - 235s 65ms/step - loss: 0.0090
Epoch 2/100
3599/3599 [==============================] - 210s 58ms/step - loss: 0.0024
Epoch 3/100
3599/3599 [==============================] - 208s 58ms/step - loss: 0.0022
Epoch 4/100
3599/3599 [==============================] - 557s 155ms/step - loss: 0.0024
Epoch 5/100
3599/3599 [==============================] - 211s 59ms/step - loss: 0.0022
Epoch 6/100
3599/3599 [==============================] - 207s 58ms/step - loss: 0.0018
Epoch 7/100
3599/3599 [==============================] - 216s 60ms/step - loss: 0.0018
Epoch 8/100
3599/3599 [==============================] - 265s 74ms/step - loss: 0.0016
Epoch 9/100
3599/3599 [==============================] - 215s 60ms/step - loss: 0.0016
Epoch 10/100
3599/3599 [==============================] - 209s 58ms/step - loss: 0.0014
Epoch 11/100
3599/3599 [==============================] - 217s 60ms/step - loss: 0.0014
Epoch 12/100
3599/3599 [==============================] - 216s 60ms/step - loss: 0.0013
Epoch 13/100
3599/3599 [==============================] - 218s 60ms/step - loss: 0.0012
Epoch 14/100
3599/3599 [==============================] - 217s 60ms/step - loss: 0.0012
Epoch 15/100
3599/3599 [==============================] - 210s 58ms/step - loss: 0.0012
Epoch 16/100
3599/3599 [==============================] - 292s 81ms/step - loss: 0.0012
Epoch 17/100
3599/3599 [==============================] - 328s 91ms/step - loss: 0.0011
Epoch 18/100
3599/3599 [==============================] - 199s 55ms/step - loss: 9.8658e-04
Epoch 19/100
3599/3599 [==============================] - 199s 55ms/step - loss: 0.0010
Epoch 20/100
3599/3599 [==============================] - 286s 79ms/step - loss: 9.9106e-04

WOW, 0.0010 is pretty good... but from here on it just kept getting lower. I stopped at epoch 39... because of how long it was taking and because the loss was already so small.

Epoch 39/100
2560/3599 [====================>.........] - ETA: 1:00 - **loss: 6.3598e-04**

Here is the result:

Am I overfitting the data? Or is stopping too early the cause of the large error? And how can I optimize the time it takes to run 100 epochs?

The code is below:

# Recurrent Neural Network



# Part 1 - Data Preprocessing

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout

# Importing the training set
dataset_train = pd.read_csv('S&P_Train.csv')
training_set = dataset_train.iloc[:, 1:2].values

# Feature Scaling

sc = MinMaxScaler(feature_range = (0, 1))
training_set_sc = sc.fit_transform(training_set)

# Creating a data structure with 60 timesteps and 1 output
X_train = []
y_train = []
for i in range(60, len(training_set_sc)):
    X_train.append(training_set_sc[i-60:i, 0])
    y_train.append(training_set_sc[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)


# Reshaping
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))



# Part 2 - Building the RNN

# Importing the Keras libraries and packages


# Initialising the RNN
regressor = Sequential()


# Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))

# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))

# Adding the output layer
regressor.add(Dense(units = 1))

# Compiling the RNN
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)



# Part 3 - Making the predictions and visualising the results
print('ok')

# Getting the real stock price of 2017
dataset_test = pd.read_csv('S&P_Test.csv')
real_stock_price = dataset_test.iloc[:, 1:2].values

# Getting the predicted stock price of 2017
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)
X_test = []
for i in range(60, len(inputs)):
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

# Visualising the results
plt.plot(real_stock_price, color = 'red', label = 'Real Stock Price')
plt.plot(predicted_stock_price, color = 'blue', label = 'Predicted Stock Price')
plt.title('Prediction of Stock Values')
plt.xlabel('time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()

Am I overfitting the data?

Yes, you probably are. You can check this with val_loss: if your validation loss starts increasing, you are overfitting. You should use a validation set and monitor the validation error.
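
A minimal sketch of what that could look like with the training call from the question (the 10% validation fraction is just an illustrative choice):

# Hold back the last 10% of the training windows for validation and watch val_loss.
history = regressor.fit(X_train, y_train,
                        epochs = 100,
                        batch_size = 32,
                        validation_split = 0.1)

# If val_loss starts rising while the training loss keeps falling, the model is overfitting.
print(history.history['loss'][-1], history.history['val_loss'][-1])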

How can I optimize the time it takes to run 100 epochs?

You can stop training before the model overfits by using early stopping from the TensorFlow API, tf.keras.callbacks.EarlyStopping():

from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping()

model.compile(...)
model.fit(..., epochs = 9999, callbacks = [early_stopping])
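
With no arguments, EarlyStopping monitors val_loss, so it only does something useful when validation data is provided. A sketch combining it with the validation split shown above (the patience of 5 epochs is an arbitrary illustration, not a value given in the answer):

from tensorflow.keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 5 epochs and roll back to the best weights seen.
early_stopping = EarlyStopping(monitor = 'val_loss',
                               patience = 5,
                               restore_best_weights = True)

regressor.fit(X_train, y_train,
              epochs = 100,
              batch_size = 32,
              validation_split = 0.1,
              callbacks = [early_stopping])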
