- The agent is trained with the Q-learning algorithm using a deep Q-network (DQN). This is the basic DQN, without extensions such as prioritized experience replay or Double DQN.
- The deep Q-network is a value-based method that learns to approximate the optimal action-value function for a given state.
- The agent follows an epsilon-greedy policy, where epsilon decays as training progresses (a minimal sketch follows the hyperparameter list below).
- BUFFER_SIZE = int(1e5) # replay buffer size
- BATCH_SIZE = 64 # minibatch size
- GAMMA = 0.99 # discount factor
- TAU = 1e-3 # for soft update of target parameters
- LR = 5e-4 # learning rate
- UPDATE_EVERY = 4 # how often to update the network
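A rough sketch of the epsilon-greedy action selection and the TAU-driven soft update of the target network is shown below. The `act` and `soft_update` function names, the decay schedule values, and the standalone-function layout are illustrative assumptions, not code taken from this repository:

```python
import random
import numpy as np
import torch

def act(qnetwork_local, state, eps, device="cpu"):
    """Epsilon-greedy action selection: random action with probability eps,
    otherwise the greedy action from the local Q-network."""
    state_t = torch.from_numpy(np.asarray(state, dtype=np.float32)).unsqueeze(0).to(device)
    qnetwork_local.eval()
    with torch.no_grad():
        action_values = qnetwork_local(state_t)
    qnetwork_local.train()
    if random.random() > eps:
        return int(action_values.argmax(dim=1).item())
    return random.randrange(action_values.shape[1])

def soft_update(local_model, target_model, tau=1e-3):
    """Soft update of target parameters: theta_target = tau*theta_local + (1-tau)*theta_target."""
    for target_param, local_param in zip(target_model.parameters(), local_model.parameters()):
        target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)

# Illustrative epsilon decay (values assumed, not from the original code):
# eps = max(eps_end, eps_decay * eps), e.g. eps_start=1.0, eps_end=0.01, eps_decay=0.995
```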
The Q-network is a four-layer MLP:
nn.Linear(state_size, 160) -> nn.Linear(160, 80) -> nn.Linear(80, 80) -> nn.Linear(80, action_size)
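A minimal PyTorch sketch of this four-layer MLP, assuming ReLU activations between the hidden layers (the activation choice is an assumption, not stated above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Four-layer MLP mapping a state to one Q-value per action."""

    def __init__(self, state_size, action_size, seed=0):
        super().__init__()
        self.seed = torch.manual_seed(seed)
        self.fc1 = nn.Linear(state_size, 160)
        self.fc2 = nn.Linear(160, 80)
        self.fc3 = nn.Linear(80, 80)
        self.fc4 = nn.Linear(80, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        return self.fc4(x)  # raw Q-values, no activation on the output layer
```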
The following techniques could be applied to improve the agent:
- Double DQN - reduces overestimation of action values (see the sketch after this list).
- Prioritized Experience Replay - samples more important transitions with higher probability.
- Dueling DQN - this architecture leads to better policy evaluation in the presence of many similar-valued actions.
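As a hedged sketch of the Double DQN idea (not code from this repository): the local network selects the greedy next action, while the target network evaluates it, which reduces the overestimation bias of the plain max-based target.

```python
import torch

def double_dqn_targets(qnetwork_local, qnetwork_target, rewards, next_states, dones, gamma=0.99):
    """Compute Double DQN TD targets: action selection by the local network,
    action evaluation by the target network."""
    with torch.no_grad():
        # Local network picks the best next action for each sampled transition.
        best_actions = qnetwork_local(next_states).argmax(dim=1, keepdim=True)
        # Target network evaluates those chosen actions.
        q_next = qnetwork_target(next_states).gather(1, best_actions)
    # Standard TD target; (1 - dones) zeroes out the bootstrap term at episode end.
    return rewards + gamma * q_next * (1 - dones)
```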