Stateful autoencoder in Keras


伯里什

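I am trying to create a stateful autoencoder model. The goal is to make the autoencoder stateful for each time series. The data consists of 10 time series, each of length 567.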

timeseries#1: 451, 318, 404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, ....
timeseries#2: 304, 274, 150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, ....
...
timeseries#10: 208, 138, 201, 342, 280, 282, 280, 140, 124, 261, 193, .....

My lookback window is 28, so I generated the following sequences of 28 timesteps each:

[451, 318, 404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, .... ]
[318, 404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, 56, ....]
[404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, 56, 890, ....]
...
[304, 274, 150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, ....]
[274, 150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, 127, ....]
[150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, 127, 798, ....]
...
[208, 138, 201, 342, 280, 282, 280, 140, 124, 261, 193, .....]
[138, 201, 342, 280, 282, 280, 140, 124, 261, 193, 854, .....]
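
The windowing shown above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual preprocessing: the helper name `make_windows` is hypothetical, and the choice of `len(series) - lookback` windows is an assumption made only so that the count agrees with the 539 sequences per series reported in the thread (a full stride-1 sliding window over 567 points would give 540).

```python
import numpy as np

def make_windows(series, lookback=28):
    """Slice one 1-D series into overlapping windows of `lookback` steps (stride 1)."""
    series = np.asarray(series, dtype="float32")
    # Assumption: 567 - 28 = 539 windows, matching the count stated in the thread.
    n_windows = len(series) - lookback
    windows = np.stack([series[i:i + lookback] for i in range(n_windows)])
    # Add a trailing feature axis so the shape is (n_windows, lookback, 1).
    return windows[..., np.newaxis]

# Placeholder series of length 567 (not the poster's data).
demo_series = np.arange(567)
demo_windows = make_windows(demo_series)
print(demo_windows.shape)
```

Applying the same helper to each of the 10 series and concatenating the results would give the 5390 samples mentioned in the code comment further down.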

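This gives me 539 sequences per time series. What I need is for the LSTMs to be stateful within each time series and to reset their states after seeing all the sequences from one time series. Here is my code: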

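from keras.layers import Input, LSTM, RepeatVector, Reshape
from keras.models import Model

batch_size = 35   # the total number of samples is 5390, which is divisible by 35
timesteps = 28
n_features = 1
hunits = 14
n_repeats = timesteps // hunits   # = 2, passed to RepeatVector
epochs = 1000

# Encoder: with return_state=True an LSTM returns (output, hidden state, cell state).
inputEncoder = Input(batch_shape=(batch_size, timesteps, n_features), name='inputEncoder')
outEncoder, h, c = LSTM(hunits, stateful=True, return_state=True, name='outputEncoder')(inputEncoder)
encoder_model = Model(inputEncoder, outEncoder)

# Repeat the 14-unit encoding twice, then reshape (2, 14) into (28, 1) for the decoder.
context = RepeatVector(n_repeats, name='inputDecoder')(outEncoder)
context_reshaped = Reshape((timesteps, n_features), name='ReshapeLayer')(context)

outDecoder = LSTM(1, return_sequences=True, stateful=True, name='decoderLSTM')(context_reshaped)

autoencoder = Model(inputEncoder, outDecoder)

autoencoder.compile(loss='mse', optimizer='rmsprop')

# `data` and `config` are defined elsewhere by the poster and are not shown in the post.
for i in range(epochs):
    history = autoencoder.fit(data, data,
                              validation_split=config['validation_split_ratio'],
                              shuffle=False,
                              batch_size=batch_size,
                              epochs=1)
    # The states are reset once per epoch, i.e. after all batches.
    autoencoder.reset_states()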
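
As a side check of the tensor shapes in the code above, the repeat-then-reshape step can be imitated in plain NumPy. This is only a shape sketch under the assumption that `RepeatVector(2)` turns a `(batch, 14)` tensor into `(batch, 2, 14)` and that the reshape then flattens it into `(batch, 28, 1)`; it does not involve Keras itself, and the variable names are illustrative.

```python
import numpy as np

batch_size, timesteps, hunits = 35, 28, 14

# Stand-in for the encoder output: one 14-value encoding per sample.
encoded = np.arange(batch_size * hunits, dtype="float32").reshape(batch_size, hunits)

# Mimic RepeatVector(2): copy each encoding along a new axis 1.
repeated = np.repeat(encoded[:, np.newaxis, :], timesteps // hunits, axis=1)

# Mimic Reshape((28, 1)): flatten (2, 14) into 28 steps with one feature each.
reshaped = repeated.reshape(batch_size, timesteps, 1)

print(encoded.shape, repeated.shape, reshaped.shape)
```

Under these assumptions the decoder receives, for every sample, the 14-value encoding twice in a row as a 28-step sequence.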

Two questions:

1. After the first epoch finishes I get this error, and I would like to know how it happens:

ValueError: Cannot feed value of shape (6, 28, 1) for Tensor u'inputEncoder:0', which has shape '(35, 28, 1)'

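2. I don't think this model works the way I want. Here it resets the states after all batches (one epoch), which means after all the time series have been processed. How should I change it so that it is stateful per time series?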

伯里什

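The problem was the validation_split ratio! It was set to 0.33, and when the split happened it tried to train on 3611 samples, which is not divisible by my batch_size=35. Since 3611 = 35 × 103 + 6, the last training batch holds only 6 samples, which matches the shape (6, 28, 1) in the error. Based on this post I was able to find the right number; copied from that post: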

def quantize_validation_split(validation_split, sample_count, batch_size):
    # Round the split down so the validation part covers a whole number of batches.
    batch_count = sample_count // batch_size
    return float(int(batch_count * validation_split)) / batch_count

Then you can call model.fit(..., validation_split=quantize_validation_split(0.05, len(X), batch_size)). But it would be cool if Keras did this for you inside fit().
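
The numbers in this thread can be checked with a few lines of arithmetic. The sketch below assumes that `fit()` keeps the first `int(n * (1 - validation_split))` samples for training, which is consistent with the 3611 samples mentioned above. It also shows an integer-only way to pick a split point that lands on a batch boundary; that variant is an illustration, not the code from the linked post.

```python
sample_count = 5390
batch_size = 35
validation_split = 0.33

# Assumption: the training part is the first int(n * (1 - validation_split)) samples.
train_count = int(sample_count * (1 - validation_split))
leftover = train_count % batch_size  # size of the incomplete last batch

# Integer-only alternative: choose how many whole batches go to validation.
batch_count = sample_count // batch_size
val_batches = int(batch_count * validation_split)
train_aligned = (batch_count - val_batches) * batch_size
val_aligned = val_batches * batch_size

print(train_count, leftover, train_aligned, val_aligned)
```

With a split point like `train_aligned`, one option would be to slice the array manually and pass the tail through the `validation_data` argument of `fit()` instead of `validation_split`, so that no floating-point rounding is involved.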

Also, regarding making the autoencoder stateful the way I need: there should not be a reset_states call at the end of each epoch!
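
One point that may help with the second question: in a stateful Keras layer, the final state of the sample at index i in one batch is used as the initial state of the sample at index i in the next batch. A possible consequence, sketched below, is to order the samples so that batch slot s always holds windows from series s. This is only an illustration under assumptions that are not part of the thread: the helper `interleave_series` is hypothetical, and the layout implies a batch size equal to the number of series (10) rather than the 35 used above.

```python
import numpy as np

def interleave_series(windows_per_series):
    """Reorder (n_series, n_windows, timesteps, n_features) so that
    row k * n_series + s of the result is window k of series s."""
    n_series, n_windows, timesteps, n_features = windows_per_series.shape
    # Put the window index first, then merge the first two axes.
    swapped = windows_per_series.transpose(1, 0, 2, 3)
    return swapped.reshape(n_windows * n_series, timesteps, n_features)

# Toy input: every value encodes (series, window) as series * 1000 + window.
n_series, n_windows, timesteps = 10, 539, 28
series_idx = np.arange(n_series).reshape(n_series, 1, 1, 1)
window_idx = np.arange(n_windows).reshape(1, n_windows, 1, 1)
toy = np.broadcast_to(series_idx * 1000 + window_idx,
                      (n_series, n_windows, timesteps, 1)).copy()

ordered = interleave_series(toy)
print(ordered.shape)
```

Fed with `shuffle=False` and a batch size of 10, each state slot would then only ever see windows from a single series, in their original order.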
