This is part 3 of a blog series on deep reinforcement learning. See “Part 1: Demystifying Deep Reinforcement Learning” for an introduction to the topic and “Part 2: Deep Reinforcement Learning with Neon” for the original implementation in Simple-DQN.
In this blog post we will extend Simple-DQN to work with OpenAI Gym, a new toolkit for developing and comparing reinforcement learning algorithms. Read more about the release on their blog. We will cover how to train and test an agent in the new environment using Neon.
Update: Code has been updated and is now at https://github.com/tambetm/simple_dqn.
Figure 1. Agent Environment Loop
OpenAI Gym provides a simple interface for interacting with the environment. Given an observation of the previous state and the reward, the agent chooses an action to perform on the environment. The environment then returns the next observation and reward:
observation, reward, done, info = environment.step(action)
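The loop can be made concrete with a small, self-contained sketch. `ToyEnvironment` is a hypothetical stand-in, not part of Gym or Simple-DQN. It only mimics the classic Gym contract, in which `reset()` returns the first observation and `step()` returns an `(observation, reward, done, info)` tuple. An agent then plays one episode against it.

```python
import random


class ToyEnvironment:
    """Hypothetical stand-in that mimics the classic Gym reset()/step() contract."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        # Start a new episode and return the first observation
        self.t = 0
        return self.t

    def step(self, action):
        # Advance one time step; action 1 earns a reward of 1.0, anything else 0.0
        self.t += 1
        observation = self.t
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.episode_length
        info = {}
        return observation, reward, done, info


def run_episode(env, choose_action):
    """Play one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = choose_action(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    env = ToyEnvironment(episode_length=5)
    # A random agent: picks action 0 or 1 uniformly at each step
    print(run_episode(env, lambda obs: random.choice([0, 1])))
```

Only the shape of the loop carries over to Atari: observation, action, then the next observation, reward and terminal flag from a single `step()` call.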
In our case, the environment is an Atari game, the observation is a game screen, and the reward is the score obtained from that action. OpenAI Gym accesses the Arcade Learning Environment (ALE) through a different interface (atari_py), so we create a wrapper class, GymEnvironment, around the OpenAI Gym environment to make it work with the Simple-DQN training code. Previously, Simple-DQN retrieved the screen and terminal state directly from ALE after performing an action. The OpenAI Gym environment instead returns this data every time the agent acts on it, so we store these values as fields in the wrapper and return them when needed. Creating an environment also differs slightly: we specify the game with an environment id such as “Breakout-v0” instead of loading a ROM file directly.
class GymEnvironment(Environment):
    def __init__(self, env_id, args):
        import gym
        self.gym = gym.make(env_id)
        # Latest observation and terminal flag returned by reset()/step()
        self.obs = None
        self.terminal = None

    def numActions(self):
        return self.gym.action_space.n

    def restart(self):
        # reset() returns the first observation of the new episode
        self.obs = self.gym.reset()
        self.terminal = False

    def act(self, action):
        # Gym returns the screen and terminal flag together with the reward,
        # so cache them for getScreen() and isTerminal()
        self.obs, reward, self.terminal, _ = self.gym.step(action)
        return reward

    def getScreen(self):
        assert self.obs is not None
        return self.obs

    def isTerminal(self):
        assert self.terminal is not None
        return self.terminal
To train with OpenAI Gym instead of ALE, we just specify the environment (OpenAI Gym or ALE) and the game. OpenAI Gym returns the full RGB screen, of shape (210, 160, 3), which we then convert to grayscale and resize to (84, 84).
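The screen conversion can be sketched with NumPy alone. This is an illustrative assumption, not Simple-DQN's actual preprocessing code. It uses the common ITU-R BT.601 luminance weights and simple integer-scaled (nearest-neighbour-style) sampling. A real implementation would more likely call an image library such as OpenCV, with smoother interpolation, so pixel values would differ slightly.

```python
import numpy as np


def preprocess(frame, out_height=84, out_width=84):
    """Convert an RGB frame (H, W, 3) to a grayscale (out_height, out_width) uint8 screen.

    Sketch only: BT.601 luminance weights and integer-scaled sampling.
    """
    frame = np.asarray(frame, dtype=np.float32)
    # Weighted sum over the last (colour) axis gives luminance, shape (H, W)
    gray = np.dot(frame, np.array([0.299, 0.587, 0.114], dtype=np.float32))
    in_height, in_width = gray.shape
    # Pick one source row and column for each output pixel by integer scaling
    rows = np.arange(out_height) * in_height // out_height
    cols = np.arange(out_width) * in_width // out_width
    resized = gray[rows[:, None], cols[None, :]]
    return np.clip(np.rint(resized), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # A random frame with the shape of an Atari observation from Gym
    frame = np.random.randint(0, 256, size=(210, 160, 3), dtype=np.uint8)
    screen = preprocess(frame)
    print(screen.shape, screen.dtype)  # (84, 84) uint8
```

Sampling rather than averaging keeps the sketch short. An interpolating resize preserves thin objects, such as the Breakout ball, more reliably.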
./train.sh Breakout-v0 --environment gym
This will train a model using the OpenAI Gym environment and save model snapshots every epoch.
To test a trained model on OpenAI Gym, we will first create a GymAgent that
- Stores the last four screen observations in memory
- Given the last four screen observations, uses the trained model to find the action with the highest Q-value
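import random
import numpy as np

class GymAgent(object):
    def __init__(self, env, net, memory, args):
        self.env = env
        self.net = net
        # Row 0 of memory holds the last history_length screens
        self.memory = memory
        self.history_length = args.history_length
        self.exploration_rate_test = args.exploration_rate_test

    def add(self, observation):
        # Shift the history one step back and append the newest screen
        # (assumes observation already matches the screen size stored in memory)
        self.memory[0, :-1] = self.memory[0, 1:]
        self.memory[0, -1] = np.array(observation)

    def get_action(self, t, observation):
        self.add(observation)
        # Act randomly until the history is full, and with a small probability afterwards
        if t < self.history_length or random.random() < self.exploration_rate_test:
            action = self.env.action_space.sample()
        else:
            # predict() returns one row of Q-values per state in the batch;
            # only row 0 holds the current history
            qvalues = self.net.predict(self.memory)
            action = np.argmax(qvalues[0])
        return action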
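Here is a standalone sketch of the rolling history and the epsilon-greedy choice that runs without Neon or Gym. `push_screen`, `choose_action` and `FakeNet` are hypothetical names for illustration. `FakeNet.predict` returns a fixed batch of Q-values, mirroring the assumption that the network returns one row of Q-values per state in a full minibatch. The buffer shape `(batch_size, history_length, height, width)` is also an assumption.

```python
import random

import numpy as np


def push_screen(memory, screen):
    """Shift row 0 of the history one step back and append the newest screen."""
    memory[0, :-1] = memory[0, 1:]
    memory[0, -1] = screen


def choose_action(memory, predict, num_actions, t, history_length, epsilon, rng=random):
    """Epsilon-greedy: random until the history is full or with probability epsilon, else argmax Q."""
    if t < history_length or rng.random() < epsilon:
        return rng.randrange(num_actions)
    qvalues = predict(memory)          # assumed shape (batch_size, num_actions)
    return int(np.argmax(qvalues[0]))  # row 0 is the current history


class FakeNet:
    """Hypothetical network returning the same Q-values for every state in the batch."""

    def __init__(self, batch_size, qvalues_row):
        self.qvalues = np.tile(np.asarray(qvalues_row, dtype=np.float32), (batch_size, 1))

    def predict(self, states):
        return self.qvalues


if __name__ == "__main__":
    batch_size, history_length, height, width = 2, 4, 84, 84
    memory = np.zeros((batch_size, history_length, height, width), dtype=np.uint8)
    net = FakeNet(batch_size, [0.1, 0.7, 0.2])
    for t in range(6):
        # Each fake screen is filled with the time step, so the history is easy to read
        push_screen(memory, np.full((height, width), t, dtype=np.uint8))
        action = choose_action(memory, net.predict, 3, t, history_length, epsilon=0.0)
        print(t, memory[0, :, 0, 0].tolist(), action)
```

Indexing `qvalues[0]` matters when the network requires a full minibatch. Without an axis argument, `np.argmax` flattens the whole 2-D array and could return an index that belongs to another row.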
Then we can simply instantiate the agent with the environment and the saved model, and call get_action in the test loop to choose an action at each time step.
agent = GymAgent(env, net, memory, args)
# Record statistics and videos of the test episodes
env.monitor.start(args.output_folder, force=True)

num_episodes = 10
for i_episode in range(num_episodes):
    observation = env.reset()
    # Cap each episode at 10,000 steps
    for t in range(10000):
        action = agent.get_action(t, observation)
        observation, reward, done, info = env.step(action)
        if done:
            break

env.monitor.close()
The complete testing code is in src/test_gym.py, which can be run with
python src/test_gym.py Breakout-v0 <output_folder> --load_weights <saved_model_pkl>
This will log the testing results and record videos to the specified output_folder, which we can then upload to OpenAI Gym for evaluation. It is also recommended to upload a gist describing how to reproduce your results.
An example video of an agent playing several episodes:
Using Nervana Cloud
To train a model on Nervana Cloud, first install and configure ncloud. ncloud is a command line client to help you use and manage Nervana’s deep learning cloud.
Assuming the necessary dependencies are installed, we can run training with:
ncloud train src/main.py --args "Breakout-v0 --environment gym" --custom_code_url https://github.com/tambetm/simple_dqn
and testing with:
ncloud train src/test_gym.py --args "Breakout-v0 --load_weights <saved_model_pkl>" --custom_code_url https://github.com/tambetm/simple_dqn
To find out more about Nervana Cloud, visit Nervana’s Products page.
OpenAI Gym provides a nice toolkit for training and testing reinforcement learning algorithms. Extending Simple-DQN to work with OpenAI Gym was relatively straightforward, and we hope others can easily build on this work to develop better learning algorithms.