The RecoGym Challenge

Design a Recommendation Agent that can collect the largest reward in the RecoGym environment!

The prize: 3000 euros for the winning teams!

The starting date: Oct 1st / The deadline: Nov 30th

Within the challenge, there are two tasks:

The easy task, aka RecoGym Challenge 100: Learning to recommend in small action spaces. In this task, the competitors are asked to design recommendation agents over a relatively small set of actions (e.g. 100) - Status: READY TO GO!

The hard task, aka RecoGym Challenge 10.000: Learning to recommend in large action spaces. In this task, the space of possible actions is much larger (e.g. 10k) - Status: IN PREPARATION

Challenge Rules:

#1. There will be a winner for each of the two tasks. The prize money is 2000 euros for winning the hard challenge, which involves 10 000 actions/recommendable items, and 1000 euros for winning the easier challenge, which involves 100 actions. Of course, if the same team wins both tasks, all of the prize money will go to that team.

#2. We will evaluate the agents by their resulting Click-Through Rate (CTR) over a range of RecoGym configurations that are unknown to the participants. What you should know is that we are interested in generalisation from small samples, so we will be testing in regimes with relatively small numbers of users (fewer than 1000). We expect the winning entry to need to make sophisticated choices with respect to a) creating a representation of the user context, b) combining organic and bandit signals, and c) handling the bandit signal.

#3. Any attempt to upload malicious code will result in disqualification.

#4. Criteo employees and their families can participate but will not be eligible to win the prize.

#5. We will make our best efforts to run all code, but we cannot guarantee that code tested in an environment different from the one specified on the GitHub page will run. Feedback about code that did not run will be provided if possible, as long as the code was submitted sufficiently early.

#6. The run time for training on 1000 users must be less than 5 hours on an AWS t2.2xlarge machine.

#7. A leaderboard of early entries will be maintained with periodic updates.

#8. If the judges deem it necessary, the winner will be determined based on the average performance over several A/B tests. On this basis, it is not necessarily the case that the leader on the leaderboard will be declared the winner.

#9. Solutions that attempt to simply reverse-engineer the reward generating process are strongly discouraged.
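
To make rules #2 and #8 more concrete, here is a minimal sketch of how a CTR could be averaged over several runs. It is not the official evaluation code: the function names, the unweighted averaging, and the click and impression counts are all assumptions made for illustration.

```python
from statistics import mean


def click_through_rate(clicks: int, impressions: int) -> float:
    """Fraction of recommendations shown that were clicked."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def average_ctr(runs):
    """Unweighted mean CTR over several (clicks, impressions) runs.

    Assumption: every run (configuration or A/B test) counts equally.
    The judges may aggregate differently.
    """
    return mean(click_through_rate(c, n) for c, n in runs)


# Hypothetical (clicks, impressions) results from three runs of one agent.
runs = [(12, 1000), (9, 1000), (15, 1000)]
print(f"Average CTR: {average_ctr(runs):.4f}")
```

Because the average is taken over several runs, an agent that does very well on one configuration but poorly on the others can still rank below a more consistent agent.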

READY TO PARTICIPATE?

1. HERE IS HOW TO GET STARTED

2. THEN READ THE QUICK PRIMER ON RECOGYM

3. AND FOR ADVICE ON SOLUTIONS, CHECK OUR TIPS&TRICKS PAGE!

4. ONCE YOU ARE READY, SUBMIT YOUR SOLUTION TO THE CODALAB COMPETITION SITE and don't forget to follow us on Twitter @RecoGym to get the latest updates!

References:

RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising: https://arxiv.org/abs/1808.00720
On the Value of Bandit Feedback for Offline Recommender System Evaluation: https://arxiv.org/abs/1907.12384
Learning from Bandit Feedback: An Overview of the State-of-the-art: https://arxiv.org/abs/1909.08471

For more RecoGym-related materials, check our Notebooks located in the reco-gym project:

https://github.com/criteo-research/reco-gym
