1. NaN Values During Training
- Maybe the gradients are too large and are exploding; gradient clipping can help (see the sketch after this list).
- If the loss contains a logarithmic function, inputs near zero are problematic: log(0) is undefined, and the gradient of log(x) is 1/x, which blows up as x approaches zero. Clamping the argument away from zero helps (also shown in the sketch below).
- I had this issue during training of MADDPG; I trained the critic first and then the policy network (not at the same time), and it started working.
- Check the learning rate; sometimes you may be using the wrong one due to a mistake in the code.
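
A minimal PyTorch sketch of the first two fixes, assuming a generic model and a binary cross-entropy-style loss (the network, data, and `eps` value are illustrative, not from the original post):

```python
import torch
import torch.nn as nn

# Hypothetical tiny network and batch, just for illustration.
model = nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 8), torch.rand(32, 1)

pred = torch.sigmoid(model(x))

# Guard the logarithm: clamp its argument away from zero so that
# log(0) = -inf never appears in the loss.
eps = 1e-8
loss = -(y * torch.log(pred.clamp(min=eps))
         + (1 - y) * torch.log((1 - pred).clamp(min=eps))).mean()

optimizer.zero_grad()
loss.backward()

# Guard against exploding gradients: rescale the global gradient
# norm before the optimizer step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```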
2. Policy does not get better
- Check the exploration strategy; maybe you are not exploring properly. For example, for continuous action spaces, you can explore by adding noise to the policy net output (like in the DDPG paper). A minimal sketch follows.
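
A sketch of that exploration idea. Note that the DDPG paper uses temporally correlated Ornstein-Uhlenbeck noise; plain Gaussian noise is a common, simpler substitute, and the function name and parameters here (`noisy_action`, `noise_std`, `low`, `high`) are hypothetical:

```python
import numpy as np

def noisy_action(policy_action, noise_std=0.1, low=-1.0, high=1.0):
    """Add Gaussian noise to a deterministic policy output for exploration,
    then clip back into the valid action range."""
    noise = np.random.normal(0.0, noise_std, size=np.shape(policy_action))
    return np.clip(policy_action + noise, low, high)

# Example: a hypothetical 2-D continuous action from the policy network.
action = noisy_action(np.array([0.3, -0.7]), noise_std=0.2)
```

Decaying `noise_std` over the course of training is a common way to shift from exploration to exploitation.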