The code:
import tensorflow as tf

# A: logits for 5 examples over 4 classes; B: the target class indices.
A = tf.constant([[0.1, 0.2, 0.3, 0.4],
                 [0.2, 0.1, 0.4, 0.3],
                 [0.4, 0.3, 0.2, 0.1],
                 [0.3, 0.2, 0.1, 0.4],
                 [0.1, 0.4, 0.3, 0.2]], dtype=tf.float32)
B = tf.constant([1, 2, 1, 3, 3], dtype=tf.int32)

# Two different per-example weight vectors.
w_1 = tf.constant(value=[1, 1, 1, 1, 1], dtype=tf.float32)
w_2 = tf.constant(value=[1, 2, 3, 4, 5], dtype=tf.float32)

D = tf.contrib.legacy_seq2seq.sequence_loss_by_example([A], [B], [w_1])
D_1 = tf.contrib.legacy_seq2seq.sequence_loss_by_example([A], [B], [w_1], average_across_timesteps=False)
D_2 = tf.contrib.legacy_seq2seq.sequence_loss_by_example([A], [B], [w_2])
D_3 = tf.contrib.legacy_seq2seq.sequence_loss_by_example([A], [B], [w_2], average_across_timesteps=False)

with tf.Session() as sess:
    print(sess.run(D))
    print(sess.run(D_1))
    print(sess.run(D_2))
    print(sess.run(D_3))
And the result is:
[1.4425355 1.2425355 1.3425356 1.2425356 1.4425356]
[1.4425355 1.2425355 1.3425356 1.2425356 1.4425356]
[1.4425355 1.2425355 1.3425356 1.2425356 1.4425356]
[1.4425355 2.485071 4.027607 4.9701424 7.212678 ]
I don't understand why the result is the same no matter whether the parameter average_across_timesteps is set to True or False.
Here's the source code that performs the averaging:
if average_across_timesteps:
    total_size = math_ops.add_n(weights)
    total_size += 1e-12  # Just to avoid division by 0 for all-0 weights.
    log_perps /= total_size
In your case, weights is a list containing one tensor, either w_1 or w_2, i.e., you have a single time step. In both cases, tf.add_n(weights) doesn't change it, because it is a sum over a one-element list (not the sum of the elements inside w_1 or w_2).
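A quick sanity check of that point (a minimal sketch, assuming TensorFlow 1.x as in the question): tf.add_n over a single-element list just returns that tensor, it does not sum the values inside it.

import tensorflow as tf

w_2 = tf.constant([1, 2, 3, 4, 5], dtype=tf.float32)
# add_n sums the tensors in the list; with one tensor in the list it is a no-op.
total_size = tf.add_n([w_2])

with tf.Session() as sess:
    print(sess.run(total_size))  # [1. 2. 3. 4. 5.], not 15.0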
This explains the result: D and D_1 evaluate to the same array, because D_1 = D * w_1 (element-wise) and w_1 is all ones. D_2 matches as well, because with a single time step the division by total_size exactly cancels the earlier multiplication by w_2. Only D_3 differs, because it is the weighted loss with no averaging, and w_2 contains not only ones.
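If it helps, here is a minimal sketch (again assuming TensorFlow 1.x) that reproduces the four printed arrays by hand, following the same steps the function takes for a single time step: compute the per-example cross entropy, multiply by the weight, and, when averaging, divide by that same weight.

import tensorflow as tf

A = tf.constant([[0.1, 0.2, 0.3, 0.4],
                 [0.2, 0.1, 0.4, 0.3],
                 [0.4, 0.3, 0.2, 0.1],
                 [0.3, 0.2, 0.1, 0.4],
                 [0.1, 0.4, 0.3, 0.2]], dtype=tf.float32)
B = tf.constant([1, 2, 1, 3, 3], dtype=tf.int32)
w_2 = tf.constant([1, 2, 3, 4, 5], dtype=tf.float32)

# Per-example cross entropy: this is the unweighted loss.
crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=B, logits=A)

weighted = crossent * w_2            # average_across_timesteps=False -> D_3
averaged = weighted / (w_2 + 1e-12)  # default True: divide by add_n([w_2]) = w_2 -> D_2 (= D = D_1)

with tf.Session() as sess:
    print(sess.run(averaged))  # [1.4425355 1.2425355 1.3425356 1.2425356 1.4425356]
    print(sess.run(weighted))  # [1.4425355 2.485071  4.027607  4.9701424 7.212678 ]

The division by (w_2 + 1e-12) undoes the multiplication by w_2, which is why the flag has no visible effect when there is only one time step.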