Month: December 2023

  • Reinforcement Learning – Cheat Sheet

    Policy gradient methods: Methods where we directly optimize the policy without the value function. Examples are: REINFORCE, Proximal Policy Optimization.Actor-Critic: Actor critic methods consist of one actor and one critic. Actor is the function of the agent and the critic is the value function of the environment. The goal is to have a fast and…