About RecoGym

A quick intro to RecoGym

RecoGym is an OpenAI Gym environment that simulates a recommender system in a reinforcement learning test framework. The repository can be found here; it includes multiple notebooks that demonstrate various aspects of the environment. A description and motivation can be found here. Doing well in the challenge will require mastering several aspects of recommender systems.


RecoGym is an environment that simulates user behaviour. It has the following calls:

env.reset() - this tells the simulator that you want to move to the next user.


Before you can act you need to learn something about the user. In order to do that you make a first call to the step() function with action=None.

observation, reward, done, info = env.step(action)


This call returns, in the observation variable, the products (as item ids) that the user viewed. It also returns done, which indicates whether the session is over (in which case a call to reset() is needed).

At this point you can make your first recommendation to the user. To do so, call step() again with action set to the item id you would like to recommend. step() then reports in reward whether the recommendation was successful (that is, whether it was clicked). Please note that click-through rates are low and RecoGym respects this: expect click-through rates of only a few percent, even for very good agents. The observation variable is populated with further (organic) product views, if there are any, and done indicates whether the session is over.
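The reset()/step() cycle described above can be sketched as follows. ToyEnv is a hypothetical stand-in written only for this illustration, not the RecoGym simulator itself; its parameters (num_products, click_prob, stop_prob), the None reward on the first step, and the recommend_last_viewed policy are all assumptions. Only the call pattern mirrors the text.

```python
import random


class ToyEnv:
    """Hypothetical stand-in that mimics the reset()/step() calls described above."""

    def __init__(self, num_products=5, click_prob=0.02, stop_prob=0.2, seed=0):
        self.num_products = num_products
        self.click_prob = click_prob  # low on purpose: clicks are rare
        self.stop_prob = stop_prob    # chance that the session ends after a step
        self.rng = random.Random(seed)
        self.done = True

    def reset(self):
        # Move on to the next user.
        self.done = False

    def step(self, action):
        assert not self.done, "session is over, call reset() first"
        if action is None:
            reward = None  # nothing was recommended yet (toy choice)
        else:
            reward = 1 if self.rng.random() < self.click_prob else 0
        # Organic product views, returned as a list of item ids.
        num_views = self.rng.randint(1, 3)
        observation = [self.rng.randrange(self.num_products) for _ in range(num_views)]
        self.done = self.rng.random() < self.stop_prob
        return observation, reward, self.done, {}


def recommend_last_viewed(observation):
    # Hypothetical policy: recommend the most recently viewed product.
    return observation[-1] if observation else 0


def run_session(env, policy):
    env.reset()
    # The first call uses action=None to learn something about the user.
    observation, reward, done, info = env.step(None)
    clicks = recommendations = 0
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        clicks += reward
        recommendations += 1
    return clicks, recommendations


env = ToyEnv(seed=0)
total_clicks = total_recs = 0
for _ in range(1000):
    clicks, recs = run_session(env, recommend_last_viewed)
    total_clicks += clicks
    total_recs += recs

ctr = total_clicks / total_recs
print(f"click-through rate over {total_recs} recommendations: {ctr:.3f}")
```

Note that a session may already be over after the first step(None) call, in which case no recommendation is made for that user.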


A RecoGym agent is a class that inherits from Agent and implements an act() method that takes observation, reward and done, and produces a recommended action (and a propensity score - to be discussed later).
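As a rough illustration of that interface, here is a minimal sketch of an agent that recommends uniformly at random. The Agent base class below is a hypothetical placeholder defined so the example is self-contained (RecoGym ships its own), and the shape of the return value (a dict with "action" and "propensity" keys) is an assumption made for readability, not necessarily the format RecoGym expects.

```python
import random


class Agent:
    """Hypothetical minimal base class, defined here only for the example."""

    def act(self, observation, reward, done):
        raise NotImplementedError


class UniformRandomAgent(Agent):
    """Recommends every product with equal probability."""

    def __init__(self, num_products, seed=0):
        self.num_products = num_products
        self.rng = random.Random(seed)

    def act(self, observation, reward, done):
        action = self.rng.randrange(self.num_products)
        # Propensity: the probability with which this policy chose the action.
        propensity = 1.0 / self.num_products
        return {"action": action, "propensity": propensity}


agent = UniformRandomAgent(num_products=10)
result = agent.act(observation=[3, 7], reward=None, done=False)
print(result)
```

For a uniform policy the propensity is the same for every action; an agent that favours some products would report the probability it actually assigned to the chosen one.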

RecoGym agents also have a train() method (here RecoGym differs from OpenAI Gym and standard reinforcement learning). The reason for the presence of train() is that agents will often need to learn not from their own actions but rather from the offline logs of another agent (e.g. the production recommendation policy). While this differs from standard reinforcement learning, it very much respects recommender systems practice.

Reinforcement learning tightly couples acting and learning into a single step. In contrast, most current RecoGym agents start by learning from offline logs and then act (usually without any further learning).
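The learn-first, act-later pattern can be sketched as below. Everything here is an assumption made for illustration: the ClickRateAgent class, the log format (a list of (action, clicked) pairs recorded by some other policy), and the dict returned by act() are hypothetical and do not reflect RecoGym's actual log schema or agent API.

```python
from collections import defaultdict


class ClickRateAgent:
    """Hypothetical agent: train() once on offline logs, then act() without further learning."""

    def __init__(self, default_action=0):
        self.best_action = default_action

    def train(self, logs):
        # logs: iterable of (action, clicked) pairs produced by another policy
        # (assumed format, chosen for this sketch).
        shown = defaultdict(int)
        clicked = defaultdict(int)
        for action, click in logs:
            shown[action] += 1
            clicked[action] += click
        if shown:
            # Keep the action with the highest empirical click-through rate.
            self.best_action = max(shown, key=lambda a: clicked[a] / shown[a])

    def act(self, observation, reward, done):
        # Acting does not update the agent; the choice is deterministic,
        # so the chosen action has propensity 1.0.
        return {"action": self.best_action, "propensity": 1.0}


# Toy logs: product 0 shown twice (no clicks), product 1 three times (one click),
# product 2 three times (two clicks).
offline_logs = [(0, 0), (0, 0), (1, 1), (1, 0), (2, 0), (1, 0), (2, 1), (2, 1)]

agent = ClickRateAgent()
agent.train(offline_logs)
recommendation = agent.act(observation=[], reward=None, done=False)
print(recommendation)
```

This sketch deliberately ignores both the user's organic views and how often the logging policy chose each action; it is meant only to show the separation between train() and act().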


To learn more, visit our GitHub page and check out some of our Notebook tutorials!