Implementation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm in Keras, designed for easy customization. Link to the paper: https://arxiv.org/pdf/1706.02275.pdf
The previous version of the code is available in the v0.1 branch.
- Project Description
- Features
- Installation
- Usage
- Code Structure
- Possible Enhancements
- How to Contribute
- Support
- License
- This is an easy-to-understand implementation of the MADDPG algorithm in TensorFlow Keras.
- OpenAI's MADDPG implementation is written in TensorFlow v1, which makes it difficult to follow for those accustomed to TensorFlow v2 and Keras.
- This implementation builds on the DDPG implementation on the Keras website; have a look at that DDPG implementation as well.
- This repository is a good starting point for those looking to customize a MADDPG implementation.
- This implementation has been successfully tested on a competitive environment with 2 pursuers and 1 evader.
- This implementation works for any number (n) of agents, chosen by the user.
- To use this implementation, the user only needs to create a new env.py file defining the environment.
- Here is the reward curve generated by this implementation for the 2-pursuer, 1-evader environment after training for 3000 episodes.
- Also check the short animation generated by the trained model (using this implementation) for the 2-pursuer, 1-evader environment:
maddpg-keras.mp4
- The implementation is well documented and, given its simplicity, easy to understand.
- In case of issues, the author can be contacted directly by email ([email protected]) or on LinkedIn.
- Please note that GPU training is currently not supported; contact the author if you need GPU support for training.
- Training 3 agents in the 2-pursuer, 1-evader environment for 3000 episodes (100 steps per episode) takes around 20 hours on a single Intel i5-1135G7 processor.
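Two quick calculations related to the points above. First, in the original MADDPG formulation (per the linked paper) each agent's actor sees only its own observation, while its critic is centralized and takes every agent's observation and action, so the critic input grows with n. Second, the quoted training time implies roughly 0.24 seconds per environment step. The function names and example observation/action sizes below are hypothetical, not taken from the repository:

```python
def network_input_dims(obs_dims, act_dims):
    """Return (actor input sizes, centralized critic input size) for n agents.

    obs_dims[i] and act_dims[i] are the observation and action sizes of agent i.
    Each actor sees only its own observation; each centralized critic sees the
    concatenated observations and actions of all agents.
    """
    if len(obs_dims) != len(act_dims):
        raise ValueError("need one observation size and one action size per agent")
    return list(obs_dims), sum(obs_dims) + sum(act_dims)


def seconds_per_step(hours, episodes, steps_per_episode):
    """Average wall-clock time per environment step for a training run."""
    return hours * 3600 / (episodes * steps_per_episode)


# Hypothetical sizes: 2 pursuers + 1 evader, 4-dim observations, 1-dim actions.
print(network_input_dims([4, 4, 4], [1, 1, 1]))  # ([4, 4, 4], 15)
# Figures quoted above: 20 hours, 3000 episodes, 100 steps per episode.
print(seconds_per_step(20, 3000, 100))  # 0.24
```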
To install, run the following commands in a terminal:
git clone https://github.com/pr-shukla/maddpg-keras.git
cd maddpg-keras
pip install -r requirements.txt
- To train on the same competitive 2-pursuer, 1-evader environment, run the following command in the root folder:
python3 train.py
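For orientation, a training script of this kind runs an episode/step loop over an environment with the initial_obs/step/reward interface used by env.py. The outline below is hypothetical and is not the repository's train.py: the policies, the record and learn callables, the one-update-per-step schedule, and the trivial stand-in environment are all assumptions; num_episodes and num_steps mirror the NUM_EPISODES and NUM_STEPS parameters defined in config.py.

```python
def train(env, policies, record, learn, num_episodes, num_steps):
    """Outline of an episode/step loop; returns per-agent return per episode."""
    episode_returns = []
    for _ in range(num_episodes):
        obs = env.initial_obs()              # one observation per agent
        totals = [0.0] * len(policies)
        for _ in range(num_steps):
            # Each agent acts on its own observation.
            actions = [policy(o) for policy, o in zip(policies, obs)]
            next_obs = env.step(actions)
            rewards = env.reward(next_obs)
            record(obs, actions, rewards, next_obs)   # store the transition
            learn()                                   # one update per step (assumed)
            totals = [t + r for t, r in zip(totals, rewards)]
            obs = next_obs
        episode_returns.append(totals)
    return episode_returns


class CounterEnv:
    """Trivial stand-in: the new state is simply the agents' actions."""

    def initial_obs(self):
        return [0.0, 0.0]

    def step(self, action):
        return list(action)

    def reward(self, state):
        return [-abs(s) for s in state]


transitions = []
returns = train(CounterEnv(), [lambda o: o + 1.0, lambda o: o - 1.0],
                record=lambda *t: transitions.append(t), learn=lambda: None,
                num_episodes=2, num_steps=3)
print(returns)  # [[-6.0, -6.0], [-6.0, -6.0]]
```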
- You can create a custom environment in env.py and then repeat step 1. The env.py file should have the following class and methods:
class Environment:
    def __init__(self):
        pass

    def initial_obs(self):
        '''
        Define initial observation state of your environment
        '''

    def step(self, action):
        '''
        Execute step and calculate new observation state
        '''

    def reward(self, state):
        '''
        Calculate reward given new state
        '''
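As an illustration only, here is a hypothetical toy environment filling in that template: two pursuers and one evader on a one-dimensional line, each taking a single scalar action (a velocity). The starting range, speed limit, and reward shaping are assumptions made for this sketch and are not the repository's env.py; returning the new state from step() and a per-agent reward list from reward() are also assumed conventions.

```python
import random


class Environment:
    """Toy 1-D pursuit: agents 0 and 1 are pursuers, agent 2 is the evader."""

    MAX_SPEED = 1.0  # hypothetical bound on each scalar action

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.positions = self.initial_obs()

    def initial_obs(self):
        '''
        Define initial observation state of your environment
        '''
        # Random starting positions on the line, one per agent.
        self.positions = [self.rng.uniform(-10.0, 10.0) for _ in range(3)]
        return list(self.positions)

    def step(self, action):
        '''
        Execute step and calculate new observation state
        '''
        # action holds one scalar velocity per agent, clipped to MAX_SPEED.
        for i, a in enumerate(action):
            self.positions[i] += max(-self.MAX_SPEED, min(self.MAX_SPEED, a))
        return list(self.positions)

    def reward(self, state):
        '''
        Calculate reward given new state
        '''
        # Each pursuer is penalized by its distance to the evader; the evader
        # is rewarded with its distance to the closest pursuer.
        dists = [abs(state[0] - state[2]), abs(state[1] - state[2])]
        return [-dists[0], -dists[1], min(dists)]


env = Environment(seed=0)
state = env.step([0.5, -0.5, 2.0])  # the evader's 2.0 is clipped to 1.0
print(env.reward(state))
```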
- You may want to change the values of parameters such as STD_DEV, GAMMA, and TAU in config.py for a custom environment.
- To quickly see the results of the author's previous training, run predict.py (trained models are saved in the saved_models folder):
python3 predict.py
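GAMMA and TAU play their usual DDPG roles, which MADDPG inherits: GAMMA discounts the bootstrapped value in the critic's target, and TAU sets how far the target networks move toward the learned networks at each soft update. A framework-free sketch of both formulas, treating weights as plain lists of floats (the function names, the optional done flag, and this weight representation are assumptions; a Keras implementation would apply the same update to each model weight tensor):

```python
def critic_target(reward, next_q, gamma, done=False):
    """Critic target y = r + gamma * Q'(s', a'); no bootstrapping if done."""
    return reward if done else reward + gamma * next_q


def soft_update(target_weights, source_weights, tau):
    """target <- tau * source + (1 - tau) * target, weight by weight."""
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_weights, source_weights)]


print(critic_target(1.0, 10.0, gamma=0.9))           # 10.0
print(soft_update([0.0, 1.0], [1.0, 3.0], tau=0.5))  # [0.5, 2.0]
```

A small TAU keeps the target networks changing slowly, which stabilizes the targets the critic is regressed toward.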
- The code contains three directories:
  - maddpg: contains the MADDPG implementation
  - env: contains the training and prediction environment code
  - saved_models: contains the pretrained models
- train.py: main training code
- config.py: defines training parameters such as NUM_EPISODES and NUM_STEPS
- predict.py: code for testing the trained model on the prediction environment
- maddpg/buffer.py: (1) calculates gradients and updates the critic and actor models; (2) maintains the experience buffer
- maddpg/model.py: creates the neural network models for the actor and critic
- maddpg/noise.py: creates random noise that is added to the predicted action for more exploration
- env/env.py: defines the training environment
- env/env_predict.py: defines the prediction/testing environment
- Please refer to the algorithm while going through the code.
- The gradient calculation steps are extensively documented in the Buffer.learn() method in buffer.py.
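buffer.py is described as doing two jobs: keeping an experience buffer and, in Buffer.learn(), computing the actor and critic updates. Below is a framework-free sketch of the buffer and of the critic target from the linked MADDPG paper, y_i = r_i + GAMMA * Q'_i(x', a'_1, ..., a'_n), where each a'_j comes from agent j's target actor. The class, function names, and toy linear "networks" are hypothetical and do not mirror the repository's Buffer class; in Keras the actors and critics would be models and each critic would be regressed toward these targets.

```python
import random


class ReplayBuffer:
    """Fixed-capacity ring buffer of joint multi-agent transitions."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.storage = []
        self.next_index = 0
        self.rng = random.Random(seed)

    def record(self, obs, actions, rewards, next_obs):
        """Store one transition; each argument holds one entry per agent."""
        transition = (obs, actions, rewards, next_obs)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_index] = transition  # overwrite the oldest
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size):
        """Random minibatch (without replacement) of stored transitions."""
        return self.rng.sample(self.storage, min(batch_size, len(self.storage)))


def maddpg_critic_targets(rewards, next_obs, target_actors, target_critics, gamma):
    """y_i = r_i + gamma * Q'_i(x', a'_1, ..., a'_n) for every agent i.

    target_actors[j]:  callable, next observation of agent j -> its action
    target_critics[i]: callable, (all next observations, all actions) -> Q value
    """
    # Decentralized: each target actor uses only its own next observation.
    next_actions = [actor(o) for actor, o in zip(target_actors, next_obs)]
    # Centralized: each target critic sees all observations and actions.
    return [r + gamma * critic(next_obs, next_actions)
            for r, critic in zip(rewards, target_critics)]


def toy_critic(xs, acts):
    """Hypothetical linear stand-in for a target critic network."""
    return sum(xs) + sum(acts)


# Toy usage with hypothetical linear "networks" for two agents.
buffer = ReplayBuffer(capacity=100, seed=0)
buffer.record([0.0, 1.0], [0.5, -0.5], [1.0, 0.0], [2.0, 4.0])
obs, actions, rewards, next_obs = buffer.sample(1)[0]
actors = [lambda o: 0.5 * o, lambda o: -o]
print(maddpg_critic_targets(rewards, next_obs, actors,
                            [toy_critic, toy_critic], gamma=0.5))  # [2.5, 1.5]
```

Because each critic's target depends on every agent's next action, all target actors are evaluated on the same sampled transition before any critic target is formed.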
Updates as of Dec 10, 2023
- Training on GPU is not supported; contributions to add this enhancement are welcome.
- The implementation has been tested with TensorFlow versions 2.3 and 2.8; more recent versions may not work.
- Currently, the time complexity of training is O(batch size); see the implementation in buffer.py for more details.
- The implementation supports only agents with one-dimensional actions, not multi-dimensional actions.
- Actions are unnecessarily calculated in the Buffer.learn() method in buffer.py. Search for @bug in buffer.py for more details.
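Since each agent's action is a single scalar, the exploration noise from noise.py can be added per agent and the result clipped to the action bounds. The Keras DDPG example this project builds on uses Ornstein-Uhlenbeck noise, so the sketch below assumes that process; the class name, the theta and dt defaults, and the [-1, 1] bounds are assumptions, and STD_DEV in config.py is presumably the std_dev used here.

```python
import math
import random


class OUNoise:
    """Ornstein-Uhlenbeck process for one scalar action:
    x <- x + theta * (mean - x) * dt + std_dev * sqrt(dt) * N(0, 1)
    """

    def __init__(self, mean=0.0, std_dev=0.2, theta=0.15, dt=1e-2, seed=None):
        self.mean = mean
        self.std_dev = std_dev
        self.theta = theta
        self.dt = dt
        self.rng = random.Random(seed)
        self.x = mean

    def __call__(self):
        # Mean-reverting drift plus scaled Gaussian noise.
        self.x += (self.theta * (self.mean - self.x) * self.dt
                   + self.std_dev * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0))
        return self.x


def noisy_action(action, noise, low=-1.0, high=1.0):
    """Add exploration noise to a scalar action and clip it to [low, high]."""
    return max(low, min(high, action + noise()))


noises = [OUNoise(seed=i) for i in range(3)]  # one process per agent
actions = [noisy_action(a, n) for a, n in zip([0.3, -0.9, 1.0], noises)]
print(actions)
```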
- To contribute, simply create an issue and raise a pull request.
- For support with the implementation, either raise an issue or email the author directly at [email protected].
Licensed under the MIT license.