"ValueError: No gradients provided for any variable" when the scale_diag of MultivariateNormalDiag() is a constant

Simosis

Below is a code snippet that, given a state, generates an action from a state-dependent distribution (prob_policy), then updates the graph's weights based on a loss equal to the negative log-probability of selecting that action. In the example below, both the mean (mu) and the covariance (sigma) of the MultivariateNormal are trainable/learned.

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

# make the graph
state = tf.placeholder(tf.float32, (1, 2), name="state")
mu = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.squeeze(sigma)
mu = tf.squeeze(mu)
# state-dependent diagonal Gaussian policy; both mu and sigma are trainable here
prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=sigma)
action = prob_policy.sample()
picked_action_prob = prob_policy.prob(action)
loss = -tf.log(picked_action_prob)  # negative log-probability of the sampled action
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss)

# run the optimizer
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state_input = np.expand_dims([0.,0.],0)
    _, action_loss = sess.run([train_op, loss], { state: state_input })
    print(action_loss)

However, when I replace this line

prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=sigma)

with the following line (and comment out the lines that generate the sigma layer and squeeze it)

prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=[1.,1.])

I get the following error:

ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'fully_connected/weights:0' shape=(2, 2) dtype=float32_ref>", "<tf.Variable 'fully_connected/biases:0' shape=(2,) dtype=float32_ref>"] and loss Tensor("Neg:0", shape=(), dtype=float32).

I don't understand why this is happening. Shouldn't it still be able to take gradients with respect to the weights in the mu layer? Why does making the distribution's covariance constant suddenly make it non-differentiable?
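
For what it's worth, querying tf.gradients directly (before the call to optimizer.minimize, which is what raises the error) shows the same missing gradients:

# Run in the same graph as above, after defining loss but before minimize().
# tf.gradients returns None for every variable with no path to the loss,
# which is exactly the condition optimizer.minimize() rejects.
print(tf.gradients(loss, tf.trainable_variables()))
# -> [None, None] for the mu layer's weights and biases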

System details:

  • TensorFlow 1.13.1
  • TensorFlow Probability 0.6.0
  • Python 3.6.8
  • macOS 10.13.6
Brian Patton

This is caused by some caching we do inside of MVNDiag (and other subclasses of TransformedDistribution) in pursuit of invertibility.

If you do a + 0 after the .sample() (as a workaround), the gradients will work.

Also, I'd suggest using dist.log_prob(..) instead of tf.log(dist.prob(..)). Better numerics.
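
As a quick illustration of the "better numerics" point (an example of mine, with an arbitrary far_point): out in the tail, prob() underflows to 0.0 in float32 and tf.log turns that into -inf, while log_prob() evaluates the log-density directly and stays finite.

import tensorflow as tf
import tensorflow_probability as tfp

# In the far tail, prob() underflows to 0.0 (float32), so tf.log(prob(..))
# is -inf, while log_prob(..) computes the log-density and stays finite.
dist = tfp.distributions.MultivariateNormalDiag(loc=[0., 0.], scale_diag=[1., 1.])
far_point = [40., 40.]

with tf.Session() as sess:
    print(sess.run(tf.log(dist.prob(far_point))))  # -inf
    print(sess.run(dist.log_prob(far_point)))      # about -1601.8, finite

Here is the full corrected snippet: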

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

# make the graph
state = tf.placeholder(tf.float32, (1, 2), name="state")
mu = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.squeeze(sigma)
mu = tf.squeeze(mu)
prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=[1.,1.])
action = prob_policy.sample() + 0  # "+ 0" sidesteps the bijector's sample cache
loss = -prob_policy.log_prob(action)  # log_prob directly, for better numerics
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss)

# run the optimizer
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state_input = np.expand_dims([0.,0.],0)
    _, action_loss = sess.run([train_op, loss], { state: state_input })
    print(action_loss)
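
To make the caching issue concrete, here is a minimal sketch (my own illustration of the mechanism described above, assuming TFP 0.6's TransformedDistribution-based MVNDiag): calling log_prob on the exact tensor returned by sample() hits the bijector's inverse cache and never recomputes (y - mu) / scale, so mu drops out of the gradient graph; adding 0 produces a new tensor that misses the cache.

import tensorflow as tf
import tensorflow_probability as tfp

# Minimal sketch of the caching behavior. With a constant scale_diag, mu
# enters log_prob only through the bijector inverse (y - mu) / scale; a
# cache hit on the sampled tensor skips that inverse, severing the path
# from the loss back to mu.
mu = tf.Variable([0., 0.])
dist = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=[1., 1.])
sample = dist.sample()

print(tf.gradients(dist.log_prob(sample), mu))      # [None]: cache hit
print(tf.gradients(dist.log_prob(sample + 0), mu))  # [<Tensor>]: inverse recomputed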
