This is part 3 of a blog series on deep reinforcement learning. See “Part 1: Demystifying Deep Reinforcement Learning” for an introduction to the topic and “Part 2: Deep Reinforcement Learning with Neon” for the original implementation in Simple-DQN.

In this blog post we will extend Simple-DQN to work with OpenAI Gym, a new toolkit for developing and comparing reinforcement learning algorithms. Read more about the release on their blog. We will cover how to train and test an agent with the new environment using Neon.

Update: Code has been updated and is now at https://github.com/tambetm/simple_dqn.

GymEnvironment

Figure 1. Agent Environment Loop

OpenAI Gym provides a simple interface for interacting with the environment. At each time step the agent chooses an action; the environment executes it and returns the next observation, the reward for that action, a flag indicating whether the episode has ended, and auxiliary diagnostic information.

observation, reward, done, info = environment.step(action)
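
Put together, a minimal interaction loop looks something like the following sketch (a random policy stands in for a trained agent here, and "Breakout-v0" is just an example environment id):

import gym

env = gym.make("Breakout-v0")
observation = env.reset()
done = False
while not done:
    # a trained agent would choose the action from the observation;
    # here we simply sample a random action for illustration
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)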


In our case, the environment is an Atari game, the observation is the game screen, and the reward is the score obtained from that action. Since OpenAI Gym uses a different interface (atari_py) to the Arcade Learning Environment (ALE), we create a wrapper class, GymEnvironment, around the OpenAI Gym environment so that it works with the existing Simple-DQN training code. Previously, Simple-DQN retrieved the screen and terminal state directly from ALE after performing an action, whereas the OpenAI Gym environment returns this data with every call to step(). We therefore store these values as fields in the wrapper and return them when the training code asks for them. Creating an environment also differs slightly: we specify the game with an environment id such as “Breakout-v0” instead of loading a ROM file directly.

class GymEnvironment(Environment):
    def __init__(self, env_id, args):
        # create the Gym environment by id, e.g. "Breakout-v0"
        import gym
        self.gym = gym.make(env_id)
        self.obs = None
        self.terminal = None

    def numActions(self):
        return self.gym.action_space.n

    def restart(self):
        self.gym.reset()
        self.obs = None
        self.terminal = None

    def act(self, action):
        # Gym returns the screen and terminal flag with every step,
        # so store them as fields and return only the reward
        self.obs, reward, self.terminal, _ = self.gym.step(action)
        return reward

    def getScreen(self):
        assert self.obs is not None
        return self.obs

    def isTerminal(self):
        assert self.terminal is not None
        return self.terminal

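For illustration, the training code drives this wrapper roughly as follows (a sketch only: a random action stands in for the agent's choice, and args stands for Simple-DQN's parsed command-line options):

import random

env = GymEnvironment("Breakout-v0", args)  # args: Simple-DQN's parsed command-line options
env.restart()
for step in range(1000):
    # a random action stands in for the agent's choice
    action = random.randrange(env.numActions())
    reward = env.act(action)
    screen = env.getScreen()    # raw RGB frame from Gym
    if env.isTerminal():
        env.restart()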
 

Training

To train with OpenAI Gym instead of ALE, we just specify the environment (OpenAI Gym or ALE) and the game. OpenAI Gym returns the full 210×160 RGB screen, which we then convert to grayscale and resize to 84×84; a sketch of this preprocessing follows the training command below.

./train.sh Breakout-v0 --environment gym

This will train a model using the OpenAI Gym environment and save model snapshots every epoch.
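
The grayscale conversion and resizing mentioned above might look like the following sketch (assuming OpenCV's cv2 bindings; the function name preprocess_screen is illustrative, not the exact Simple-DQN implementation):

import cv2

def preprocess_screen(observation):
    # observation is the raw (210, 160, 3) RGB frame returned by Gym
    gray = cv2.cvtColor(observation, cv2.COLOR_RGB2GRAY)
    # downsample to the 84x84 grayscale input the Q-network expects
    return cv2.resize(gray, (84, 84))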

Testing

To test a trained model on OpenAI Gym, we will first create a GymAgent that:

  • Stores the last four screen observations in memory
  • Uses the trained model to choose the action with the highest Q-value given those four observations

import random

import numpy as np

class GymAgent():
    def __init__(self, env, net, memory, args):
        self.env = env
        self.net = net
        self.memory = memory
        self.history_length = args.history_length
        self.exploration_rate_test = args.exploration_rate_test

    def add(self, observation):
        # shift the history window left and append the newest screen
        self.memory[0, :-1] = self.memory[0, 1:]
        self.memory[0, -1] = np.array(observation)

    def get_action(self, t, observation):
        self.add(observation)
        # act randomly until the history is full, and occasionally thereafter
        if t < self.history_length or random.random() < self.exploration_rate_test:
            action = self.env.action_space.sample()
        else:
            # otherwise pick the action with the highest predicted Q-value
            qvalues = self.net.predict(self.memory)
            action = np.argmax(qvalues[0])
        return action

Then we can simply instantiate the agent with the environment and the saved model, and call get_action in the test loop below to choose the action to play at each time step.

agent = GymAgent(env, net, memory, args)
# the Gym monitor records results and videos to the output folder
env.monitor.start(args.output_folder, force=True)
num_episodes = 10
for i_episode in xrange(num_episodes):
    observation = env.reset()
    for t in xrange(10000):
        action = agent.get_action(t, observation)
        observation, reward, done, info = env.step(action)
        if done:
            break
env.monitor.close()

All of this testing code is in this script, which can be run with:

python src/test_gym.py Breakout-v0 <output_folder> --load_weights <saved_model_pkl>

This will log the testing results and record videos to the specified output_folder which we can then upload to OpenAI Gym for evaluation. It is also recommended to upload a gist describing how to reproduce your results.
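
At the time of writing, the recorded results could be uploaded with something like the following sketch (YOUR_API_KEY is a placeholder for the API key from your OpenAI Gym account):

import gym

# upload the monitor logs and videos from the output folder for evaluation
gym.upload("<output_folder>", api_key="YOUR_API_KEY")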

Figure 2. Evaluation Results on OpenAI Gym

An example video of an agent playing several episodes:

 

Using Nervana Cloud

To train a model on Nervana Cloud, first install and configure ncloud, a command-line client for using and managing Nervana’s deep learning cloud.

Assuming the necessary dependencies are installed, we can run training with:

ncloud train src/main.py --args "Breakout-v0 --environment gym" --custom_code_url https://github.com/tambetm/simple_dqn

and testing with:

ncloud train src/test_gym.py --args "Breakout-v0 --load_weights <saved_model_pkl>" --custom_code_url https://github.com/tambetm/simple_dqn

To find out more about Nervana Cloud, visit Nervana’s Products page.

Conclusion

OpenAI Gym provides a nice toolkit for training and testing reinforcement learning algorithms. Extending Simple-DQN to work with OpenAI Gym was relatively straightforward, and hopefully others can easily build on this work to develop better learning algorithms.