- object oriented: all of the RL agents use the same framework (base class `Agent`), which makes the code easy to read and understand
- perfect reproduction: training results are exactly the same under the same random seed
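A minimal sketch of how these two points could look in code, assuming a hypothetical `Agent` base class and `set_seed` helper (the names and hooks here are illustrative, not the repo's exact API):

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed every source of randomness so a run can be reproduced exactly."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade a little speed for exact repeatability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


class Agent:
    """Shared skeleton: every algorithm implements the same hooks."""

    def __init__(self, env, seed: int = 0):
        set_seed(seed)
        self.env = env
        self.env.seed(seed)  # classic gym API; newer gym uses reset(seed=...)

    def select_action(self, state):
        raise NotImplementedError

    def learn(self):
        raise NotImplementedError
```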
- DQN: Playing Atari with Deep Reinforcement Learning; Human-level control through deep reinforcement learning
- DDQN: Deep Reinforcement Learning with Double Q-learning
- Dueling DQN: Dueling Network Architectures for Deep Reinforcement Learning
- DDQN with prioritized experience replay: Prioritized Experience Replay
- REINFORCE (Monte-Carlo Policy Gradient, Vanilla Policy Gradient)
- REINFORCE with baseline
- DDPG: Continuous control with deep reinforcement learning
- TD3: Addressing Function Approximation Error in Actor-Critic Methods
- SAC: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor; Soft Actor-Critic Algorithms and Applications
- PPO: Proximal Policy Optimization Algorithms
- A3C: Asynchronous Methods for Deep Reinforcement Learning
Training result of the agent trying to solve a problem from scratch.
The original environment is hard to converge on, so I modified the reward to address this and obtained the result below.
Note that Pendulum-v0 has no explicit goal, but as you can see from the result, the agent did learn something.
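One way such a reward modification could be written is a small reward-shaping wrapper like the sketch below; the exact change used in this repo may differ, and `ShapedPendulum` is just an illustrative name (classic gym API assumed):

```python
import math

import gym


class ShapedPendulum(gym.RewardWrapper):
    """Illustrative reward shaping for Pendulum-v0 (not necessarily the exact
    change used here): add a bonus when the pendulum is close to upright."""

    def reward(self, reward):
        theta, _theta_dot = self.env.unwrapped.state
        # The original reward is always <= 0, which gives a weak learning
        # signal; a small bonus near the upright position helps convergence.
        if abs(math.cos(theta) - 1.0) < 0.05:
            reward += 2.0
        return reward


env = ShapedPendulum(gym.make("Pendulum-v0"))
```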
Online training is not always stable: sometimes the agent reaches a high reward (or running reward) and then its performance declines rapidly.
So I save selected policies during training and use them to test the agent's performance.
Although training is done on the modified environment, I still test the policy on the original one, so the result reflects what was actually learned.
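A sketch of that evaluation loop, again assuming the classic gym API, with hypothetical names (`policy.select_action`, `agent.policy_net`) standing in for whatever the repo actually exposes:

```python
import gym
import torch


def evaluate(policy, episodes: int = 10) -> float:
    """Run a saved policy on the original Pendulum-v0 (no reward shaping)
    and return the mean undiscounted episode return."""
    env = gym.make("Pendulum-v0")
    returns = []
    for _ in range(episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            with torch.no_grad():
                action = policy.select_action(state)
            state, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    env.close()
    return sum(returns) / len(returns)


# During training, keep snapshots whenever the running reward improves,
# since online performance can later collapse:
#     torch.save(agent.policy_net.state_dict(), "checkpoint_%d.pth" % episode)
```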