I have gradients entering layer L1 from layers L2_1 and L2_2 at the same time. I need to rescale the combined gradient (L2_1 + L2_2) by 1/sqrt(2) before it enters L1. How can I do this?
My network looks something like this:
```
                  L2_1
                 /    \
input -> L0 - L1      L_final
                 \    /
                  L2_2
```
You can divide the outputs of L2_1 and L2_2 by sqrt(2). That will rescale both the activations and the backpropagated gradients. If you want to modify only the backward pass without touching the activations, you can use the gradient replacement trick: pass the value through unchanged in the forward pass and scale only the gradient, as in the sketch below.
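Here is a minimal sketch of that trick, assuming PyTorch (the question does not name a framework, and the `Linear` layers and shapes below are hypothetical stand-ins for your actual L0/L1/L2_1/L2_2). The `x.detach()` term carries the forward value with no gradient, while the `(x - x.detach()) * scale` term is zero in the forward pass but scales the incoming gradient in the backward pass:

```python
import math
import torch

def scale_grad(x, scale):
    # Forward: x.detach() + (x - x.detach()) * scale == x, so values
    # are unchanged. Backward: gradient flows only through the
    # (x - x.detach()) * scale term, so it is multiplied by `scale`.
    return x.detach() + (x - x.detach()) * scale

# Hypothetical layers matching the diagram in the question.
L0, L1 = torch.nn.Linear(8, 8), torch.nn.Linear(8, 8)
L2_1, L2_2 = torch.nn.Linear(8, 8), torch.nn.Linear(8, 8)

x = torch.randn(4, 8)
h = L1(L0(x))
# Scale the summed gradient arriving from both branches by 1/sqrt(2)
# before it reaches L1.
h = scale_grad(h, 1 / math.sqrt(2))
out = L2_1(h) + L2_2(h)  # L_final combines the two branches
out.sum().backward()
```

Because the scaling is applied to L1's output before it fans out, the gradients from both branches are summed first and the single combined gradient is rescaled, which matches the (L2_1 + L2_2) / sqrt(2) behavior you asked for.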