TensorFlow: CNN training converges to a zero vector

Posted by debugcn in Dev

哈珀龙

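I'm a beginner in deep learning and have taken a few Udacity courses. Recently I've been trying to build a deep network that detects hand joints in input depth images, but it doesn't seem to work well. (My dataset is the ICVL Hand Posture Dataset.) The network structure is as follows: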

① a batch of input images, 240x320;

② a convolutional layer with 8 channels and a 5x5 kernel;

③ a max-pooling layer, ksize = stride = 2;

④ a fully connected layer, weight.shape = [153600, 1024] (120 * 160 * 8, as in the code below);

⑤ a fully connected layer, weight.shape = [1024, 48].
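As a quick sanity check on the shapes listed above, the following sketch recomputes the flattened input size of layer ④ from the settings used in the code: 'SAME' padding, a stride-1 convolution, and 2x2 max pooling with stride 2. It is plain arithmetic, not TensorFlow; the helper name `same_output_size` is hypothetical and simply encodes the rule that 'SAME' padding yields ceil(size / stride).

```python
import math


def same_output_size(size: int, stride: int) -> int:
    """Spatial output size of a 'SAME'-padded conv or pool: ceil(size / stride)."""
    return math.ceil(size / stride)


height, width, channels = 240, 320, 1   # ① input depth image

# ② 5x5 convolution, stride 1, 'SAME' padding: spatial size unchanged, 8 channels out
height = same_output_size(height, 1)
width = same_output_size(width, 1)
channels = 8

# ③ 2x2 max pooling, stride 2, 'SAME' padding: spatial size halved
height = same_output_size(height, 2)
width = same_output_size(width, 2)

flat_size = height * width * channels    # length of each flattened feature vector
fc1_weights = flat_size * 1024           # ④ number of entries in the [flat_size, 1024] matrix

print(height, width, channels, flat_size, fc1_weights)
```

This gives a 120x160x8 feature map, i.e. 153600 inputs to layer ④, so that single weight matrix holds about 157 million entries.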

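After a few training iterations, the output of the last layer converges to the (0, 0, ..., 0) vector. I chose mean squared error as the loss function; its value stays above 40000 and does not seem to decrease.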
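One way to read that plateau: when every prediction is 0, the mean squared error reduces to the mean of the squared targets, so a loss stuck above 40000 says the annotations have a root-mean-square magnitude of at least 200 (plausible for pixel or millimetre coordinates). The sketch below assumes the loss averages over all elements and uses a made-up annotation vector of 48 values all equal to 200; these values are hypothetical, not taken from ICVL.

```python
import math


def mean_squared_error(targets, predictions):
    """Mean of squared differences over all elements."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


# Hypothetical annotation vector with 48 values, chosen only for illustration.
targets = [200.0] * 48
zeros = [0.0] * 48   # the all-zero output the network collapses to

loss = mean_squared_error(targets, zeros)
rms_target = math.sqrt(loss)   # with zero predictions, sqrt(MSE) is the RMS of the targets
print(loss, rms_target)
```

With these illustrative targets the loss is exactly 40000, which is why the reported loss cannot fall while the output stays at zero.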

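The network structure is already so simple that it can hardly be simplified further, yet the problem persists. Can anyone offer any suggestions?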

My main code is posted below:

# Written against the TensorFlow 1.x API (tf.placeholder, tf.Session).
import tensorflow as tf

image = tf.placeholder(tf.float32, [None, 240, 320, 1])
annotations = tf.placeholder(tf.float32, [None, 48])

W_convolution_layer1 = tf.Variable(tf.truncated_normal([5, 5, 1, 8], stddev=0.1))
b_convolution_layer1 = tf.Variable(tf.constant(0.1, shape=[8]))
h_convolution_layer1 = tf.nn.relu(
    tf.nn.conv2d(image, W_convolution_layer1, [1, 1, 1, 1], 'SAME') + b_convolution_layer1)
h_pooling_layer1 = tf.nn.max_pool(h_convolution_layer1, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')

W_fully_connected_layer1 = tf.Variable(tf.truncated_normal([120 * 160 * 8, 1024], stddev=0.1))
b_fully_connected_layer1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_pooling_flat = tf.reshape(h_pooling_layer1, [-1, 120 * 160 * 8])
h_fully_connected_layer1 = tf.nn.relu(
    tf.matmul(h_pooling_flat, W_fully_connected_layer1) + b_fully_connected_layer1)

W_fully_connected_layer2 = tf.Variable(tf.truncated_normal([1024, 48], stddev=0.1))
b_fully_connected_layer2 = tf.Variable(tf.constant(0.1, shape=[48]))
detection = tf.nn.relu(
    tf.matmul(h_fully_connected_layer1, W_fully_connected_layer2) + b_fully_connected_layer2)

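# tf.losses.mean_squared_error already averages over all elements and returns a scalar,
# so wrapping it in tf.reduce_sum is unnecessary.
mean_squared_error = tf.losses.mean_squared_error(annotations, detection)
training = tf.train.AdamOptimizer(1e-4).minimize(mean_squared_error)
# This data loader (the asker's own ICVLDataLoader class, not shown) reads images and
# annotations and converts them into batches of numbers.
loader = ICVLDataLoader('../data/')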

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for i in range(1000):
        # batch_images: a list with shape = [BATCH_SIZE, 240, 320, 1]
        # batch_annotations: a list with shape = [BATCH_SIZE, 48]
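        [batch_images, batch_annotations] = loader.get_batch(100).to_1d_list()
        # Feed the placeholder actually defined above (`image`) and fetch only existing tensors.
        _, loss_value, prediction = session.run(
            [training, mean_squared_error, detection],
            feed_dict={image: batch_images, annotations: batch_annotations})
        print(i, loss_value)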

It runs like this: [screenshot of the training output]

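The main problem is probably the relu activation in the output layer. You should remove it, i.e. let detection simply be the result of the matrix multiplication (plus the bias). If you want to force the outputs to be positive, consider something like the exponential function instead.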
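To make the three output options concrete, this plain-Python sketch (no TensorFlow) returns each head's output and its derivative with respect to the pre-activation z. In the asker's code, the linear option would correspond to dropping the `tf.nn.relu` around the last `tf.matmul(...) + b_fully_connected_layer2`, and the exponential option to wrapping that expression in `tf.exp` instead; the helper `output_head` is hypothetical.

```python
import math
from typing import Tuple


def output_head(z: float, kind: str) -> Tuple[float, float]:
    """Return (output, d output / d z) for a single pre-activation value z."""
    if kind == "relu":      # original output layer: clips negatives and zeroes their gradient
        return max(z, 0.0), (1.0 if z > 0 else 0.0)
    if kind == "linear":    # suggested fix: no activation on the last layer
        return z, 1.0
    if kind == "exp":       # alternative when outputs must be strictly positive
        return math.exp(z), math.exp(z)
    raise ValueError(f"unknown output head: {kind}")


for z in (-2.0, -0.5, 1.5):
    for kind in ("relu", "linear", "exp"):
        value, grad = output_head(z, kind)
        print(f"z={z:+.1f}  {kind:6s}  output={value:.4f}  grad={grad:.4f}")
```

For negative z only the relu head reports both an output and a gradient of 0; the linear and exponential heads always pass a nonzero gradient back.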

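Although relu is a popular hidden-layer activation, I see one major problem with using it as an output activation: relu maps negative inputs to 0, and, crucially, its gradient for those inputs is 0 as well. In the output layer this means that whenever your network produces an output below 0 (which can easily happen after random initialization), the network cannot learn from that error. This can seriously disrupt the whole learning process, and it is consistent with the all-zero output you are seeing.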
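The effect can be reproduced on a single output unit. This is a toy sketch, not the asker's network: one input x = 1, one target value 3, plain gradient descent on the squared error, and a hand-picked initialization that makes the pre-activation negative (standing in for an unlucky random start). The function `train` and all its values are hypothetical.

```python
def train(activation: str, steps: int = 200, lr: float = 0.1) -> float:
    """Fit y = act(w * x + b) to one target by gradient descent; return the final squared error."""
    x, target = 1.0, 3.0
    w, b = -0.5, -0.1                    # pre-activation z = -0.6 < 0 at the start
    for _ in range(steps):
        z = w * x + b
        if activation == "relu":
            y, dy_dz = max(z, 0.0), (1.0 if z > 0 else 0.0)
        else:                            # "linear": no output activation
            y, dy_dz = z, 1.0
        grad_z = 2.0 * (y - target) * dy_dz   # chain rule through (y - target) ** 2
        w -= lr * grad_z * x
        b -= lr * grad_z
    z = w * x + b
    y_final = max(z, 0.0) if activation == "relu" else z
    return (y_final - target) ** 2


print(train("relu"), train("linear"))
```

The relu unit keeps outputting 0 and its squared error stays at 9.0 because the gradient reaching `w` and `b` is exactly zero, while the linear unit shrinks its error by a constant factor each step and converges.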


Edited on 2021-07-18
