CityLearn is an OpenAI Gym environment for the easy implementation of reinforcement learning agents in a multi-agent demand response setting to reshape the aggregated curve of electrical demand by controlling the storage of energy by diverse types of buildings. Its main objective is to facilitate and standardize the evaluation of RL agents such that it enables easy comparison of different algorithms.

CityLearn allows the control of the storage of domestic hot water (DHW) and chilled water. It also includes energy models of air-to-water heat pumps and electric heaters, and the pre-computed energy loads of the buildings, which include space cooling, dehumidification, appliances, DHW, and solar generation.


Periods of high demand for electricity raise electricity prices and the overall cost of power distribution networks. Flattening, smoothing, and reducing the curve of electrical demand helps reduce operational and capital costs of electricity generation, transmission, and distribution. Demand response is the coordination of electricity consuming agents (i.e. buildings) in order to reshape the overall curve of electrical demand.

Reinforcement learning (RL) has gained popularity in the research community as a model-free and adaptive controller for the built environment. RL has the potential to become an inexpensive plug-and-play controller that can be easily implemented in any building regardless of its model (unlike MPC), and coordinate multiple buildings for demand response and load shaping. Despite its potential, there are still open questions regarding its plug-and-play capabilities, performance, safety of operation, and learning speed. Moreover, a lack of standardization in previous research has made it difficult to compare different RL algorithms with each other, as different publications aimed at solving different problems. It is also unclear how much effort was required to tune each RL agent for each specific problem, or how well an RL agent would perform in a different building or under different weather conditions.

In an attempt to tackle these problems, we have organized this challenge using CityLearn, an OpenAI Gym Environment for the implementation of RL agents for demand response at the urban level. The environment allows the implementation of single-agent (as a centralized agent) and multi-agent decentralized RL controllers.

Reinforcement Learning Building Portfolio Co-ordination Challenge Diagram

Objective of the Challenge

The objective of the challenge is to explore the potential of reinforcement learning as a control approach for building energy coordination and demand response. In particular, participants will design, tune, and pre-train one central, or multiple decentralized, RL agents that minimize a multi-objective cost function of 5 equally weighted metrics in an entire district of buildings:

  • Peak demand (for the entire simulated period)
  • Average daily peak demand (daily peak demand of the district averaged over a year)
  • Ramping
  • 1 - Load factor (which will tend to 0 as the load factor approaches 1)
  • Net electricity consumption

This multi-objective cost function is normalized by a baseline cost obtained from the performance of a hand-tuned rule-based controller (RBC). Therefore, RL_cost < 1 means that the RL agent performs better than a simple RBC.
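The normalized cost described above can be sketched as follows. This is a minimal illustration, not the environment's actual implementation: the function names are hypothetical, and the exact definitions of each metric are assumptions (ramping as the sum of absolute hour-to-hour changes, load factor as average over peak demand for the whole period, and net electricity consumption as the sum of positive net demand). Each metric is divided by the RBC baseline value and the five ratios are averaged with equal weight.

```python
from statistics import mean

HOURS_PER_DAY = 24


def district_metrics(net_demand):
    """Return the five metrics for an hourly district net-demand series.

    Metric definitions are assumptions made for illustration only.
    """
    days = [net_demand[i:i + HOURS_PER_DAY]
            for i in range(0, len(net_demand), HOURS_PER_DAY)]
    peak = max(net_demand)
    return {
        "peak_demand": peak,
        "average_daily_peak": mean(max(day) for day in days),
        # Sum of absolute changes between consecutive hours.
        "ramping": sum(abs(b - a) for a, b in zip(net_demand, net_demand[1:])),
        # Tends to 0 as the load factor (average / peak) approaches 1.
        "one_minus_load_factor": 1.0 - mean(net_demand) / peak,
        # Only positive net demand is counted as consumption (assumption).
        "net_electricity_consumption": sum(max(x, 0.0) for x in net_demand),
    }


def normalized_cost(agent_demand, baseline_demand):
    """Equally weighted mean of agent metrics divided by baseline metrics.

    Assumes every baseline metric is non-zero; a perfectly flat baseline
    (zero ramping) would need special handling.
    """
    agent = district_metrics(agent_demand)
    baseline = district_metrics(baseline_demand)
    return mean(agent[name] / baseline[name] for name in agent)


# Toy two-day example: the baseline oscillates, the agent is perfectly flat.
baseline = [1, 3] * 24
agent = [2] * 48
cost = normalized_cost(agent, baseline)  # below 1: better than the baseline
```

In this toy case the flat profile uses the same total energy as the baseline but scores below 1 because it lowers the peaks and removes ramping entirely.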

To analyze the plug-and-play and adaptive potential of RL, the controllers will be evaluated on a different dataset than the one that will be shared for the design, tuning, and pre-training of the controllers.

Team Members

Each team can consist of a maximum of three members. The sign-up link is provided at the top left of this web page.

Submission Deadlines

Please see the timeline below for details of the three stages of the challenge.

Rules and Instructions of the Challenge

Participants are provided with a design dataset comprising four sets of data from nine buildings each. Each set has been simulated in one of four anonymized climate zones in the US. The dataset contains year-long hourly information about the cooling and DHW demand of each building, electricity consumed by appliances, and solar power generation, as well as weather data and other variables. The design dataset will be available in the CityLearn GitHub repository after January 15th; teams can sign up anytime before or after that date.

Participants will use the design dataset to design, tune, and pre-train their RL controller(s) with the objective of shaping the load in the district and minimizing the multi-objective cost function of the environment. Participants can select the states and actions the RL agents will use in each building in the file buildings_state_action_space.json, and can define their own reward function by modifying the file Communication among buildings is allowed and must be coded within the file Both centralized and distributed controllers are allowed, and agents can take decisions either sequentially or simultaneously as long as it is all coded within the file The file can call another file, to be made by the participants, which can contain the parameters of the pre-trained RL controllers. In the GitHub repository we provide a sample RL agent under the class RL_Agents, which has not been tuned or pre-trained and is provided only as an example.

Participants will submit their files,, buildings_state_action_space.json, and any file with the parameters of the pre-trained agents for evaluation on an evaluation dataset, which will consist of different buildings in the same climate zones but in different cities. Participants will receive a score and the leaderboard will be updated.

At the challenge stage, participants will submit their agents and reward function for the final run on the challenge dataset, which is different from the design and evaluation datasets.

In the evaluation and challenge stages we will paste the files submitted (,, buildings_state_action_space.json, and file with pre-trained policies, weights, or other parameters) to the CityLearn folder, and run the file as it is. Therefore, it is important that any RL agents be coded within the class RL_Agents in the file.
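As a rough illustration of how agents coded within a class named RL_Agents could interact with a Gym-style environment, the sketch below runs one episode with an untrained random policy for each building. Everything except the class name RL_Agents is an assumption made for this example: the method name select_action, the action range of [-1, 1], the stub environment (which only mimics the classic Gym reset()/step() interface and is not CityLearn), and the toy state and reward.

```python
import random


class RL_Agents:
    """Untrained placeholder: an independent random policy for each building."""

    def __init__(self, action_dims, seed=0):
        self.action_dims = action_dims  # number of storage actions per building
        self.rng = random.Random(seed)

    def select_action(self, states):
        # Hypothetical method name. Returns one action vector per building,
        # each entry assumed to lie in [-1, 1] (discharge .. charge).
        return [[self.rng.uniform(-1.0, 1.0) for _ in range(dim)]
                for dim in self.action_dims]


class StubDistrictEnv:
    """Stand-in with classic Gym-style reset()/step(); not the real CityLearn."""

    def __init__(self, n_buildings, episode_hours):
        self.n_buildings = n_buildings
        self.episode_hours = episode_hours
        self.hour = 0

    def _states(self):
        # One toy state per building: just the hour of the day.
        return [[self.hour % 24] for _ in range(self.n_buildings)]

    def reset(self):
        self.hour = 0
        return self._states()

    def step(self, actions):
        self.hour += 1
        # Toy reward: penalize the magnitude of each building's actions.
        rewards = [-sum(abs(a) for a in building) for building in actions]
        done = self.hour >= self.episode_hours
        return self._states(), rewards, done, {}


def run_episode(env, agents):
    """Run one episode; return (steps taken, total reward over all buildings)."""
    states = env.reset()
    done, steps, total_reward = False, 0, 0.0
    while not done:
        actions = agents.select_action(states)
        states, rewards, done, _ = env.step(actions)
        total_reward += sum(rewards)
        steps += 1
    return steps, total_reward


# Nine buildings with two storage actions each, simulated for two days.
env = StubDistrictEnv(n_buildings=9, episode_hours=48)
agents = RL_Agents(action_dims=[2] * 9)
steps, total_reward = run_episode(env, agents)
```

Keeping all decision logic behind a single class makes it straightforward to swap the random policy for a centralized or decentralized learned one without changing the episode loop.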

How to submit?

The RL agents must be written in Python 3 and can use PyTorch or TensorFlow, as well as any other library that is already used in our GitHub repository. They must be able to run on both Windows and Linux, either on a GPU (not required) or on a CPU (if a GPU is not used or is not available). Files will be submitted by email to under the subject "Submission StageOfChallenge Team_name", where StageOfChallenge can be "Evaluation Stage" or "Challenge Stage".

At the evaluation and challenge stages, the agents will be simulated on a single one-year episode for buildings in four different climates, and the obtained costs are averaged to provide the final cost and update the leaderboard. Therefore, participants are encouraged to submit agents that have been pre-trained enough to perform well at the exploration phase but that are still able to learn from and adapt to the new buildings and weather conditions.

Some basic information about the characteristics of the buildings is provided to the agents in the file using the CityLearn method get_building_information(). This method provides information about the type of building, climate zone, solar power capacity, total DHW, cooling, and non-shiftable energy consumption, and about the correlations of the demand profiles with the rest of the buildings. The agent(s) in the file are not allowed to read any of the files in the folder "data".

Timeline of the Competition

Stages of the Challenge

The challenge consists of three stages:

01) Design Stage

The participants will receive four sets of building data and models in 4 anonymized climate zones. Each set will contain data from 9 different buildings. The participants will design, tune and train RL agents at their convenience and modify the files:, buildings_state_action_space.json, and A third optional file can be created and submitted with weights and policies to be read by the file.

02) Evaluation Stage

The participants submit their trained agents, which are run by the organizers on the evaluation set. The evaluation set consists of four sets of building data and models in 4 anonymized climate zones. Each set will contain data from 9 different buildings. The participants' agents are tested on this evaluation set and the leaderboard is updated within a week of each agent's submission.

03) Challenge Stage

This is the final stage of the competition, in which the participants submit their final agent(s). The agents are tested on the challenge set, which consists of four sets of building data and models in 4 anonymized climate zones. Each set will contain data from 9 different buildings. The participants receive scores and the leaderboard is updated for the final time, revealing the top scorers in the challenge.


After the leaderboard has been updated, the winners of the challenge will be featured in a special issue of the Journal of Building Performance Simulation (JBPS). However, we encourage anyone who has obtained great results to submit their research to the special issue as well.



José R. Vázquez-Canteli

PhD Candidate

The University of Texas at Austin, Department of Civil, Architectural, and Environmental Engineering. Intelligent Environments Laboratory (IEL).

Sourav Dey

PhD Student

University of Colorado Boulder, Department of Civil, Environmental and Architectural Engineering

Dr. Zoltan Nagy

Assistant Professor

The University of Texas at Austin, Department of Civil, Architectural, and Environmental Engineering. Intelligent Environments Laboratory (IEL).

Dr. Gregor Henze


University of Colorado Boulder, Department of Civil, Environmental and Architectural Engineering


The MIT License (MIT)

Copyright (c) 2019, José Ramón Vázquez-Canteli

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.