
Suggestions for environments based on other POMDP types #4

Open
smorad opened this issue Nov 8, 2022 · 6 comments

Comments

smorad (Collaborator) commented Nov 8, 2022

Currently, all our environments could be classified as overcomplete POMDPs, where the number of unique latent states is greater than the number of unique observations. We are looking for environment suggestions based on other types of POMDPs, such as undercomplete POMDPs, weakly revealing POMDPs, latent MDPs, or $\gamma$-observable POMDPs.

If you have any environment suggestions, please post them here!

ashok-arora commented

@smorad Would it make sense to add the benchmarks from the DTQN paper?

smorad (Collaborator, Author) commented Jun 27, 2024

The gridverse environments already exist as an external library, and I would prefer not to import them, as they bring in additional dependencies. We already have a form of memory cards as our concentration environment. I guess this would leave the carflag and heavenhell environments. Would you be willing to implement these? I can help guide you.

ashok-arora commented

Sure, I will try my best to implement them if you could give some pointers on how to get started.

smorad (Collaborator, Author) commented Jun 28, 2024

Sure! Basically, just implement each environment as a subclass of POPGymEnv. This is just a gymnasium environment with an additional get_state method, which should return the underlying Markov state (e.g., the position of the agent and the position of the goal). Like any gymnasium environment, you'd need to implement the reset and step methods, as well as define the observation_space and action_space.

Maybe we can start with carflag? Here is the description from the DTQN paper:

Car flag tasks a car with driving across a 1D line to the correct flag. The car must first drive to the oracle flag and then to the correct endpoint. The agent observation is a vector of 3 floats, including its position on the line, its velocity at each timestep, and, when it is at the oracle flag, it is also informed of the goal flag’s location. The agent’s action alters its velocity; it may accelerate left, perform a no-op (i.e. maintain current velocity), or accelerate right. The agent receives a reward of 1 for reaching the goal flag, a reward of -1 for reaching the incorrect flag, and 0 otherwise.

So right off the bat, we know the observation and action spaces, as well as the reward function. You can just create the environment in popgym/popgym/envs/carflag.py. In the envs directory, there are a ton of other environments that you can look at for inspiration. For example, here is MineSweeper.

The first few lines might look like

from popgym.core.env import POPGymEnv
import gymnasium as gym


class CarFlag(POPGymEnv):
    def __init__(self):
        # Partial observation: position, velocity, and the goal hint (only visible at the oracle)
        self.observation_space = gym.spaces.Box(low=..., high=..., shape=(3,))
        self.state_space = gym.spaces.Box(low=..., high=..., shape=(3,))  # Underlying Markov state
        self.action_space = ...

    def step(self, action):
        self.car_position = self.car_position + self.velocity
        ...

    def reward(self):  # Not necessary, but a useful helper
        ...

    def reset(self, seed=None, options=None):
        self.goal_position = ...
        self.car_position = ...
        self.oracle_position = ...
        self.velocity = ...
        ...

    def get_state(self):
        # Return the positions of the car, oracle, and goal
        ...
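
If it helps, here is a rough sketch of how the dynamics and reward from the DTQN description above might be filled in. Everything numeric (the [-1, 1] track, the 0.07 velocity cap, where the oracle sits, the start range, the hint radius) is a placeholder I made up for illustration, and the _obs helper is hypothetical; the real values should come from the DTQN reference implementation.

import numpy as np
import gymnasium as gym

from popgym.core.env import POPGymEnv


class CarFlag(POPGymEnv):
    """1D car that must visit the oracle flag before driving to the goal flag.

    All numeric constants are illustrative placeholders, not values from the
    DTQN paper or popgym.
    """

    def __init__(self):
        # Observation: (position, velocity, goal hint); the hint is 0 away from the oracle
        self.observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        # Markov state: (position, velocity, true goal side)
        self.state_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(3)  # 0: accelerate left, 1: no-op, 2: accelerate right

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.goal_position = float(self.np_random.choice([-1.0, 1.0]))  # hidden from the agent
        self.car_position = float(self.np_random.uniform(-0.2, 0.2))
        self.oracle_position = 0.5  # placeholder: the car must drive here to see the hint
        self.velocity = 0.0
        return self._obs(), {}

    def step(self, action):
        # The action changes velocity; position integrates velocity and is clipped to the track
        self.velocity = float(np.clip(self.velocity + 0.01 * (action - 1), -0.07, 0.07))
        self.car_position = float(np.clip(self.car_position + self.velocity, -1.0, 1.0))
        terminated = abs(self.car_position) >= 1.0
        if terminated:
            reward = 1.0 if self.car_position * self.goal_position > 0 else -1.0
        else:
            reward = 0.0
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        # Hypothetical helper: the goal side is only revealed near the oracle flag
        near_oracle = abs(self.car_position - self.oracle_position) < 0.05
        hint = self.goal_position if near_oracle else 0.0
        return np.array([self.car_position, self.velocity, hint], dtype=np.float32)

    def get_state(self):
        # Full Markov state, in contrast to the partial observation above
        return np.array([self.car_position, self.velocity, self.goal_position], dtype=np.float32)

The split between the partial observation returned by reset/step and get_state is the key point: the agent only learns which flag is the goal while it sits near the oracle, whereas get_state always exposes the full Markov state. A quick sanity check is the usual gymnasium rollout loop:

env = CarFlag()
obs, info = env.reset(seed=0)
for _ in range(1000):  # crude step cap; the real env would handle truncation itself
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        break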

ashok-arora commented

Thank you so much for the detailed response. I'll fork the repo and send in a PR with the changes.

ashok-arora commented

I have added the code in #34. Could you please review it and give your feedback for improvement?
I have tried to keep the code style similar to the minesweeper.py file.
