import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# In Google Colab, uncomment this:
# !wget https://bit.ly/2FMJP5K -O setup.py && bash setup.py
# This code creates a virtual display to draw game images on.
# If you are running locally, just ignore it
import os
if type(os.environ.get("DISPLAY")) is not str or len(os.environ.get("DISPLAY")) == 0:
    !bash ../xvfb start
    %env DISPLAY=:1
We're going to spend the next several weeks learning algorithms that solve decision processes, so we need some interesting decision problems to test them on.
That's where OpenAI gym comes into play. It's a Python library that wraps many classical decision problems, including robot control, video games and board games.
So here's how it works:
import gym
env = gym.make("MountainCar-v0")
env.reset()
plt.imshow(env.render('rgb_array'))
print("Observation space:", env.observation_space)
print("Action space:", env.action_space)
Note: if you're running this on your local machine, you'll see a window pop up with the image above. Don't close it, just alt-tab away.
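To get a feel for what the two printed spaces mean, here is a minimal sketch that mimics them without gym. It assumes the usual MountainCar bounds (position in [-1.2, 0.6], velocity in [-0.07, 0.07]) and three discrete actions; the names `OBS_LOW`, `OBS_HIGH` and `N_ACTIONS` are made up for this illustration.

```python
import random

# Assumed bounds for a MountainCar observation: (position, velocity).
OBS_LOW = (-1.2, -0.07)
OBS_HIGH = (0.6, 0.07)
# Assumed number of discrete actions: 0 = left, 1 = no push, 2 = right.
N_ACTIONS = 3

rng = random.Random(0)

# A continuous "box" space: one independent uniform draw per coordinate.
random_obs = tuple(rng.uniform(lo, hi) for lo, hi in zip(OBS_LOW, OBS_HIGH))
# A discrete space: an integer in {0, ..., N_ACTIONS - 1}.
random_action = rng.randrange(N_ACTIONS)

print("random observation:", random_obs)
print("random action:", random_action)
```

With a real environment you would normally not hard-code any of this: gym spaces can be sampled directly with env.observation_space.sample() and env.action_space.sample().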
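The three main methods of an environment are reset(), which returns the initial observation; render(), which draws the current state; and step(a), which applies action a and returns the new observation, the reward, an is-done flag and an extra info value (ignored here):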
obs0 = env.reset()
print("initial observation code:", obs0)
# Note: in MountainCar, observation is just two numbers: car position and velocity
print("taking action 2 (right)")
new_obs, reward, is_done, _ = env.step(2)
print("new observation code:", new_obs)
print("reward:", reward)
print("is game over?:", is_done)
# Note: as you can see, the car has moved to the right slightly (around 0.0005)
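To make the reset/step contract concrete, here is a toy stand-in written in plain Python. It is a simplified sketch, not gym's implementation: the class name `ToyMountainCar` is hypothetical, and the constants (push force 0.001, gravity 0.0025, goal at position 0.5) are assumed from the classic MountainCar formulation.

```python
import math
import random


class ToyMountainCar:
    """Simplified stand-in with a gym-like reset()/step() interface."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.position = 0.0
        self.velocity = 0.0

    def reset(self):
        # Start at rest, somewhere near the bottom of the valley.
        self.position = self.rng.uniform(-0.6, -0.4)
        self.velocity = 0.0
        return (self.position, self.velocity)

    def step(self, action):
        # action: 0 = push left, 1 = no push, 2 = push right
        push = (action - 1) * 0.001
        slope = -0.0025 * math.cos(3 * self.position)  # pull of the hill
        self.velocity = max(-0.07, min(0.07, self.velocity + push + slope))
        self.position = max(-1.2, min(0.6, self.position + self.velocity))
        if self.position <= -1.2 and self.velocity < 0:
            self.velocity = 0.0  # the left wall stops the car
        is_done = self.position >= 0.5  # reached the flag
        reward = -1.0  # every step costs one point
        return (self.position, self.velocity), reward, is_done, {}


toy_env = ToyMountainCar(seed=0)
toy_obs0 = toy_env.reset()
toy_new_obs, toy_reward, toy_is_done, _ = toy_env.step(2)
print("moved right by:", toy_new_obs[0] - toy_obs0[0])
```

The slope term is what later makes "always push right" fail: on the right-hand hill it points left and is larger than the push.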
Below is the code that drives the car to the right.
However, it doesn't reach the flag at the far right due to gravity.
Your task is to fix it. Find a strategy that reaches the flag.
You're not required to build any sophisticated algorithms for now; feel free to hard-code :)
Hint: your action at each step should depend either on t or on s.
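As a sketch of what "depend on t" and "depend on s" could look like in code, here are two policy skeletons. The function names, the rules inside them and the parameters `switch_every` and `position_threshold` are placeholders invented for this illustration. They are not tuned and are not claimed to reach the flag; finding a rule that does is still the exercise.

```python
# Action codes, matching the `actions` dict used in this notebook.
LEFT, STOP, RIGHT = 0, 1, 2


def time_based_policy(t, switch_every=40):
    """Pick an action from the step index t only.

    Placeholder rule: alternate between right and left
    every `switch_every` steps.
    """
    return RIGHT if (t // switch_every) % 2 == 0 else LEFT


def state_based_policy(s, position_threshold=-0.5):
    """Pick an action from the observation s = (position, velocity).

    Placeholder rule: push right when the car is to the right of
    `position_threshold`, otherwise push left. The velocity is
    unpacked but deliberately unused here.
    """
    position, velocity = s
    return RIGHT if position > position_threshold else LEFT


print(time_based_policy(0), time_based_policy(40))
print(state_based_policy((-0.4, 0.0)), state_based_policy((-0.9, 0.0)))
```

To try one of these, pass its result to env.step(...) in place of actions['right'], for example env.step(state_based_policy(s)).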
from IPython import display
# create env manually to set time limit. Please don't change this.
TIME_LIMIT = 250
env = gym.wrappers.TimeLimit(
    gym.envs.classic_control.MountainCarEnv(),
    max_episode_steps=TIME_LIMIT + 1,
)
s = env.reset()
actions = {'left': 0, 'stop': 1, 'right': 2}
plt.figure(figsize=(4, 3))
display.clear_output(wait=True)
for t in range(TIME_LIMIT):
    plt.gca().clear()
    # change the line below to reach the flag
    s, r, done, _ = env.step(actions['right'])
    # draw game image on display
    plt.imshow(env.render('rgb_array'))
    display.clear_output(wait=True)
    display.display(plt.gcf())
    if done:
        print("Well done!")
        break
else:
    # the for-else branch runs only if the loop finished without `break`
    print("Time limit exceeded. Try again.")
display.clear_output(wait=True)
assert s[0] > 0.47
print("You solved it!")