
Suggestions for environments based on other POMDP types #4

Open
smorad opened this issue Nov 8, 2022 · 6 comments

Comments

smorad (Collaborator) commented Nov 8, 2022

Currently, all our environments could be classified as overcomplete POMDPs, where the number of unique latent states is greater than the number of unique observations. We are looking for environment suggestions based on other types of POMDPs, such as undercomplete POMDPs, weakly revealing POMDPs, latent MDPs, or $\gamma$-observable POMDPs.

If you have any environment suggestions, please post them here!

ashok-arora commented

@smorad Would it make sense to add the benchmarks from the DTQN paper?

smorad (Collaborator, Author) commented Jun 27, 2024

The gridverse environments already exist as an external library, and I would prefer not to import them, as they bring in additional dependencies. We already have a form of memory cards as our concentration environment. I guess this would leave the carflag and heavenhell environments. Would you be willing to implement these? I can help guide you.

ashok-arora commented

Sure, I will try my best to implement them if you could give some pointers on how to get started.

smorad (Collaborator, Author) commented Jun 28, 2024

Sure! Basically, just implement each environment as a subclass of POPGymEnv. This is just a gymnasium environment with an additional get_state method, which should return the underlying Markov state (e.g., the position of the agent and the position of the goal). Like any gymnasium environment, you'd need to implement the reset and step methods, as well as define the observation_space and action_space.

Maybe we can start with carflag? Here is the description from the DTQN paper:

Car flag tasks a car with driving across a 1D line to the correct flag. The car must first drive to the oracle flag and then to the correct endpoint. The agent observation is a vector of 3 floats, including its position on the line, its velocity at each timestep, and, when it is at the oracle flag, it is also informed of the goal flag’s location. The agent’s action alters its velocity; it may accelerate left, perform a no-op (i.e. maintain current velocity), or accelerate right. The agent receives a reward of 1 for reaching the goal flag, a reward of -1 for reaching the incorrect flag, and 0 otherwise.

So right off the bat, we know the observation and action spaces, as well as the reward function. You can just create the environment in popgym/popgym/envs/carflag.py. In the envs directory, there are a ton of other environments that you can look at for inspiration. For example, here is MineSweeper.

The first few lines might look like

from popgym.core.env import POPGymEnv
import gymnasium as gym


class CarFlag(POPGymEnv):
    def __init__(self):
        # Partial observation: position, velocity, and the goal hint (only visible at the oracle)
        self.observation_space = gym.spaces.Box(low=..., high=..., shape=(3,))
        self.state_space = gym.spaces.Box(low=..., high=..., shape=(3,))  # Underlying Markov state
        self.action_space = ...

    def step(self, action):
        self.car_position = self.car_position + self.velocity
        ...

    def reward(self):  # Not necessary, but a useful helper
        ...

    def reset(self, seed=None, options=None):
        self.goal_position = ...
        self.car_position = ...
        self.oracle_position = ...
        self.velocity = ...
        ...

    def get_state(self):
        # Return the positions of the car, oracle, and goal
        ...
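
If it helps, here is a rough sketch of how the dynamics and reward from the DTQN description above might be filled in. Everything numeric (the [-1, 1] track, the 0.07 velocity cap, where the oracle sits, the start range, the hint radius) is a placeholder I made up for illustration, and the _obs helper is hypothetical; the real values should come from the DTQN reference implementation.

import numpy as np
import gymnasium as gym

from popgym.core.env import POPGymEnv


class CarFlag(POPGymEnv):
    """1D car that must visit the oracle flag before driving to the goal flag.

    All numeric constants are illustrative placeholders, not values from the
    DTQN paper or popgym.
    """

    def __init__(self):
        # Observation: (position, velocity, goal hint); the hint is 0 away from the oracle
        self.observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        # Markov state: (position, velocity, true goal side)
        self.state_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(3)  # 0: accelerate left, 1: no-op, 2: accelerate right

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.goal_position = float(self.np_random.choice([-1.0, 1.0]))  # hidden from the agent
        self.car_position = float(self.np_random.uniform(-0.2, 0.2))
        self.oracle_position = 0.5  # placeholder: the car must drive here to see the hint
        self.velocity = 0.0
        return self._obs(), {}

    def step(self, action):
        # The action changes velocity; position integrates velocity and is clipped to the track
        self.velocity = float(np.clip(self.velocity + 0.01 * (action - 1), -0.07, 0.07))
        self.car_position = float(np.clip(self.car_position + self.velocity, -1.0, 1.0))
        terminated = abs(self.car_position) >= 1.0
        if terminated:
            reward = 1.0 if self.car_position * self.goal_position > 0 else -1.0
        else:
            reward = 0.0
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        # Hypothetical helper: the goal side is only revealed near the oracle flag
        near_oracle = abs(self.car_position - self.oracle_position) < 0.05
        hint = self.goal_position if near_oracle else 0.0
        return np.array([self.car_position, self.velocity, hint], dtype=np.float32)

    def get_state(self):
        # Full Markov state, in contrast to the partial observation above
        return np.array([self.car_position, self.velocity, self.goal_position], dtype=np.float32)

The split between the partial observation returned by reset/step and get_state is the key point: the agent only learns which flag is the goal while it sits near the oracle, whereas get_state always exposes the full Markov state. A quick sanity check is the usual gymnasium rollout loop:

env = CarFlag()
obs, info = env.reset(seed=0)
for _ in range(1000):  # crude step cap; the real env would handle truncation itself
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        break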

ashok-arora commented

Thank you so much for the detailed response. I'll fork the repo and send in a PR with the changes.

ashok-arora commented

I have added the code in #34. Could you please review it and give your feedback for improvement?
I have tried to keep the code style similar to the minesweeper.py file.
