Notes on Training Deep Learning Models

1. NaN Values During Training

  1. The gradients may be too large and exploding; gradient clipping can help.
  2. If the loss involves a logarithmic function, inputs near zero produce -inf/NaN gradients; clamp the argument away from zero with a small epsilon.
  3. I hit this while training MADDPG: training the critic first and then the policy network (rather than both at the same time) made it work.
  4. Check the learning rate; a mistake in the code may mean you are using the wrong one.
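The first two points above can be sketched in a few lines. This is a minimal NumPy illustration (not tied to any particular framework): global-norm gradient clipping, and a `safe_log` helper that clamps its input before taking the logarithm. The function names and the epsilon value are my own choices for illustration.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients down if their combined L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-8))
    return [g * scale for g in grads]

def safe_log(x, eps=1e-8):
    """Clamp inputs away from zero before the log to avoid -inf/NaN."""
    return np.log(np.clip(x, eps, None))

grads = [np.array([3.0, 4.0]), np.array([12.0])]      # global norm = 13
clipped = clip_by_global_norm(grads, max_norm=1.0)
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))  # ~1.0 after clipping
print(safe_log(np.array([0.0, 1.0])))                 # finite, no -inf
```

Most frameworks ship an equivalent of the clipping step (e.g. a clip-by-global-norm utility), so in practice you would call that rather than roll your own.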

2. Policy does not get better

  1. Check the exploration strategy; you may not be exploring enough. For example, for continuous action spaces you can explore by adding noise to the policy network's output (as in the DDPG paper).
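The exploration trick above can be sketched as follows. The DDPG paper used Ornstein-Uhlenbeck noise; this sketch uses simpler Gaussian noise, which is a common alternative. The `policy` function here is a hypothetical stand-in for a trained policy network, and the noise scale and action bounds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(state):
    # Hypothetical stand-in for a deterministic policy network's output.
    return np.tanh(state)

def explore(state, noise_std=0.1, low=-1.0, high=1.0):
    """Add Gaussian noise to the deterministic action, then clip to bounds."""
    action = policy(state) + rng.normal(0.0, noise_std, size=np.shape(state))
    return np.clip(action, low, high)

action = explore(np.array([0.5, -2.0]))  # noisy action within [-1, 1]
```

During evaluation you would call `policy` directly, without the noise; the noise is only there to force the agent to visit actions the current policy would not pick on its own.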
