Suggestions for environments based on other POMDP types #4
Comments
The gridverse environments already exist as an external library, and I would prefer not to import them, as they bring in additional dependencies. We already have a form of memory cards in our concentration environment. I guess this leaves the carflag and heavenhell environments. Would you be willing to implement these? I can help guide you. |
Sure, I will try my best to implement them if you could give some pointers on how to get started. |
Sure! Basically, just implement each environment as a subclass of POPGymEnv. Maybe we can start with carflag? Here is the description from the DTQN paper:
So right off the bat, we know the observation and action space, as well as the reward function. You can just create the environments in ... The first few lines might look like:

from popgym.core.env import POPGymEnv
import gymnasium as gym

class CarFlag(POPGymEnv):
    def __init__(self):
        self.observation_space = gym.spaces.Box(shape=(3,), ...)
        self.state_space = gym.spaces.Box(shape=(3,), ...)  # Underlying Markov state
        self.action_space = ...

    def step(...):
        self.car_position = self.car_position + self.velocity
        ...

    def reward(...):  # Not necessary, but helper function
        ...

    def reset(...):
        self.goal_position = ...
        self.car_position = ...
        self.oracle_position = ...
        self.velocity = ...
        ...

    def get_state(...):
        # Return the position of the car, oracle, and goal
        ... |
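To make the skeleton above a bit more concrete, here is a minimal, dependency-free sketch of how the `reset`, `step`, `reward`, and `get_state` pieces might fit together. It deliberately does not subclass `POPGymEnv` or define gymnasium spaces, so it can run on its own. Every constant, the three-action encoding, the three-value return of `step`, and the name `CarFlagSketch` are assumptions made for illustration; they are not taken from the DTQN paper or from this repository.

```python
import random


class CarFlagSketch:
    """Illustrative CarFlag-style dynamics; all constants below are assumed."""

    MAX_POSITION = 1.0     # assumed: flags sit at -1.0 and +1.0
    ORACLE_POSITION = 0.5  # assumed location of the oracle
    ORACLE_RADIUS = 0.05   # assumed distance within which the hint is visible
    MAX_SPEED = 0.07       # assumed velocity limit
    FORCE = 0.01           # assumed velocity change per action

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # The goal flag lands on a random end of the line each episode.
        self.goal_position = self.rng.choice(
            [-self.MAX_POSITION, self.MAX_POSITION]
        )
        self.car_position = 0.0
        self.oracle_position = self.ORACLE_POSITION
        self.velocity = 0.0
        return self._observe()

    def _observe(self):
        # The goal direction is only revealed while the car is near the oracle.
        near = abs(self.car_position - self.oracle_position) <= self.ORACLE_RADIUS
        hint = (1.0 if self.goal_position > 0 else -1.0) if near else 0.0
        return (self.car_position, self.velocity, hint)

    def reward(self):
        # +1 at the goal flag, -1 at the opposite flag, 0 everywhere else.
        if abs(self.car_position) < self.MAX_POSITION:
            return 0.0
        same_side = (self.car_position > 0) == (self.goal_position > 0)
        return 1.0 if same_side else -1.0

    def step(self, action):
        # Assumed encoding: 0 = push left, 1 = do nothing, 2 = push right.
        self.velocity += (action - 1) * self.FORCE
        self.velocity = max(-self.MAX_SPEED, min(self.MAX_SPEED, self.velocity))
        self.car_position += self.velocity
        self.car_position = max(
            -self.MAX_POSITION, min(self.MAX_POSITION, self.car_position)
        )
        reward = self.reward()
        terminated = reward != 0.0
        return self._observe(), reward, terminated

    def get_state(self):
        # Underlying Markov state: car, oracle, and goal positions.
        return (self.car_position, self.oracle_position, self.goal_position)
```

The observation hides the goal direction unless the car is at the oracle, while `get_state` always exposes it; that gap is what makes the task partially observable. In a real implementation these methods would follow the signatures and return values that `POPGymEnv` and gymnasium expect.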
Thank you so much for the detailed response. I'll fork the repo and send in a PR with the changes. |
I have added the code in #34. Could you please review it and give your feedback for improvement? |
Currently, all our environments could be classified as overcomplete POMDPs, where the number of unique latent states is greater than the number of unique observations. We are looking for environment suggestions based on other types of POMDPs, such as undercomplete POMDPs, weakly revealing POMDPs, latent MDPs, or $\gamma$-observable POMDPs.
If you have any environment suggestions, please post them here!
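As a rough aid when proposing an environment, the overcomplete/undercomplete distinction above can be checked by counting. The helper below is a hypothetical sketch (the name `classify_completeness` is not part of this repository), and it assumes the common convention that a POMDP is undercomplete when it has at least as many unique observations as unique latent states. It only applies to finite state and observation sets.

```python
def classify_completeness(num_states: int, num_observations: int) -> str:
    """Label a finite POMDP by comparing state and observation counts.

    Overcomplete: more unique latent states than unique observations.
    Undercomplete (assumed convention): at least as many observations as states.
    """
    if num_states <= 0 or num_observations <= 0:
        raise ValueError("counts must be positive")
    if num_states > num_observations:
        return "overcomplete"
    return "undercomplete"
```

For example, an environment with 52 latent card configurations but only 4 distinct observations would be labelled overcomplete by this check. The other categories mentioned above (weakly revealing, latent MDP, $\gamma$-observable) depend on the structure of the observation model and cannot be decided by counts alone.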