
rlai Examples

This repository provides examples of using the rlai package for reinforcement learning.

Installation

  1. Clone the repository:

    git clone git@github.com:MatthewGerber/rlai-dependency-example.git
    

    Note the structure of the example project, particularly pyproject.toml (which brings in the rlai package).

  2. Install to a new Python environment:

    cd rlai-dependency-example
    poetry env use 3.11
    poetry install
    

Sticky Gridworld

This example modifies the probabilistic structure of a standard gridworld environment, specifically Example 4.1 in the Sutton and Barto (2018) text. The modification causes one of the goal states to be surrounded by states that are "sticky", in that they can be traversed in the usual way but incur a 10x larger negative reward for doing so. This can be interpreted as taking 10x longer to traverse than the other states (hence the nickname "sticky").
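
For intuition only, the following is a minimal standalone sketch of the reward modification. It does not use the rlai API; the names STEP_REWARD, STICKY_FACTOR, and STICKY_STATES are illustrative assumptions, as is the choice of sticky states and the decision to penalize moves both into and out of them.

    # Standalone sketch of the "sticky" reward idea; assumptions are noted inline.
    STEP_REWARD = -1.0        # usual per-move reward in the gridworld of Example 4.1
    STICKY_FACTOR = 10.0      # sticky moves are 10x as costly
    STICKY_STATES = {11, 14}  # assumed: the states orthogonally adjacent to goal state 15

    def reward(state: int, next_state: int) -> float:
        """Reward for moving from `state` to `next_state`, with a 10x penalty for sticky moves."""
        if state in STICKY_STATES or next_state in STICKY_STATES:
            return STICKY_FACTOR * STEP_REWARD
        return STEP_REWARD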

Run the example code:

python src/example/sticky_gridworld.py

The generated policy should be as follows.

State 0:
	Pr(u):  0.25
	Pr(d):  0.25
	Pr(l):  0.25
	Pr(r):  0.25
State 1:
	Pr(l):  1.0
State 2:
	Pr(l):  1.0
State 3:
	Pr(l):  1.0
State 4:
	Pr(u):  1.0
State 5:
	Pr(l):  1.0
State 6:
	Pr(l):  1.0
State 7:
	Pr(u):  1.0
State 8:
	Pr(u):  1.0
State 9:
	Pr(u):  1.0
State 10:
	Pr(l):  1.0
State 11:
	Pr(d):  1.0
State 12:
	Pr(u):  1.0
State 13:
	Pr(l):  1.0
State 14:
	Pr(r):  1.0
State 15:
	Pr(u):  0.25
	Pr(d):  0.25
	Pr(l):  0.25
	Pr(r):  0.25

In the above, note that states 0 (upper-left corner) and 15 (lower-right corner) are goal states in the 4x4 gridworld. The optimal action in state 14 is to move right, incur the -10 reward, and reach the goal state 15. By contrast, the optimal action in state 13 is to move left toward state 0 in the upper-left corner, ignoring the goal state 15 just two steps to the right owing to the -10 reward that would be received after entering state 14.
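
For readability, the following standalone snippet (not part of the repository) lays the single-action states above out on the 4x4 grid, with G marking the two goal states:

    # Render the deterministic actions from the listing above as a 4x4 grid of arrows.
    policy = {
        1: 'l', 2: 'l', 3: 'l',
        4: 'u', 5: 'l', 6: 'l', 7: 'u',
        8: 'u', 9: 'u', 10: 'l', 11: 'd',
        12: 'u', 13: 'l', 14: 'r',
    }
    arrows = {'u': '^', 'd': 'v', 'l': '<', 'r': '>'}
    for row in range(4):
        cells = ['G' if s in (0, 15) else arrows[policy[s]] for s in range(row * 4, row * 4 + 4)]
        print(' '.join(cells))

This prints the grid below, which shows that only states 11 and 14 route to the lower-right goal, while every other state routes to the upper-left goal:

    G < < <
    ^ < < ^
    ^ ^ < v
    ^ < > G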

OpenAI Gym Cartpole with Stochastic Gradient Descent

This example recreates the case study described here.

Run the example code:

python src/examples/openai_gym/cartpole.py

Training plots will be displayed, and a video will be rendered every 100 episodes.
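
The repository's script drives this through rlai. Purely for intuition, here is a standalone sketch of the same idea, stochastic gradient ascent on a linear softmax policy (REINFORCE) for CartPole, written against the Gymnasium API. It is not the repository's implementation, and the hyperparameters are illustrative.

    # Standalone sketch (not the repository's implementation): REINFORCE with a
    # linear softmax policy on CartPole, trained by stochastic gradient ascent.
    # Assumes the Gymnasium package; the classic gym API differs slightly.
    import numpy as np
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    n_actions = env.action_space.n
    n_features = env.observation_space.shape[0]
    theta = np.zeros((n_actions, n_features))  # linear policy weights
    alpha, gamma = 0.01, 1.0                   # illustrative step size and discount

    def action_probs(obs):
        logits = theta @ obs
        logits -= logits.max()                 # numerical stability
        p = np.exp(logits)
        return p / p.sum()

    for episode in range(500):
        obs, _ = env.reset()
        trajectory = []                        # (observation, action, reward) tuples
        done = False
        while not done:
            p = action_probs(obs)
            action = int(np.random.choice(n_actions, p=p))
            next_obs, reward, terminated, truncated, _ = env.step(action)
            trajectory.append((obs, action, reward))
            obs = next_obs
            done = terminated or truncated

        # Compute the Monte Carlo return at each step, then take one gradient step per step.
        g = 0.0
        for obs_t, a_t, r_t in reversed(trajectory):
            g = r_t + gamma * g
            p = action_probs(obs_t)
            grad_log = -np.outer(p, obs_t)     # gradient of log pi(a_t | obs_t) w.r.t. theta
            grad_log[a_t] += obs_t
            theta += alpha * g * grad_log

        if (episode + 1) % 50 == 0:
            print(f"episode {episode + 1}: return {sum(r for _, _, r in trajectory):.0f}")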
