Carlos Florensa

PhD student in Computer Science at UC Berkeley


I am a PhD candidate in the Robotics Learning Lab at UC Berkeley, under the supervision of Prof. Pieter Abbeel. My main interest is solving robotics tasks in variable environments with minimal supervision, which yields challenging sparse-reward problems for Reinforcement Learning. I believe policy hierarchy, few-shot learning, and automatic curriculum generation are key to solving these tasks, scaling up to real-world scenarios, and empowering robotic systems.

In 2015 I obtained a double degree in Mathematics and Industrial Engineering with Honors at the Center for High Interdisciplinary Training (CFIS) of the Polytechnic University of Catalonia. During my undergraduate studies I did research at several international institutions: Argonne National Lab (2013, Illinois, USA) under the supervision of Prof. Victor Zavala; the École Polytechnique Fédérale de Lausanne (2014, Switzerland) under the supervision of Prof. Rachid Cherkaoui; and Carnegie Mellon University (2015, Pittsburgh, USA) under the supervision of Prof. Ignacio Grossmann. I also held several positions at the Institute of Photonic Sciences (ICFO, Barcelona, Spain).

During my PhD I have been supported first by a La Caixa Fellowship and then by a Berkeley Deep Drive Fellowship. In the summer of 2018 I interned at DeepMind in London.

Contact me:


Self-supervised Learning of Image Embedding for Continuous Control

Operating directly from raw, high-dimensional sensory inputs such as images is still a challenge for robotic control. Recently, Reinforcement Learning methods have been proposed to solve specific tasks end-to-end, from pixels to torques. However, these approaches still require access to a specified reward function, which may call for specialized instrumentation of the environment. Furthermore, the resulting policies and representations tend to be task-specific and may not transfer well. In this work we investigate fully unsupervised learning of a general image embedding and control primitives, based on finding the shortest time to reach any state. We also introduce a new structure for the state-action value function that builds a connection between model-free and model-based methods and improves the performance of the learning algorithm. We demonstrate these findings experimentally in three simulated robotic tasks.

Cite as: Carlos Florensa, Jonas Degrave, Nicolas Heess, Jost Tobias Springenberg, Martin Riedmiller. Self-supervised Learning of Image Embedding for Continuous Control. Contributed Talk in I2C workshop at Advances in Neural Information Processing Systems (NIPS) 2018.

Automatic Goal Generation for Reinforcement Learning Agents

Reinforcement learning (RL) is a powerful technique to train an agent to perform a task; however, an agent that is trained using RL is only capable of achieving the single task that is specified via its reward function. Such an approach does not scale well to settings in which an agent needs to perform a diverse set of tasks, such as navigating to varying positions in a room or moving objects to varying locations. Instead, we propose a method that allows an agent to automatically discover the range of tasks that it is capable of performing in its environment. We use a generator network to propose tasks for the agent to try to accomplish, each task being specified as reaching a certain parametrized subset of the state-space. The generator network is optimized using adversarial training to produce tasks that are always at the appropriate level of difficulty for the agent, thus automatically producing a curriculum. We show that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment, even when only sparse rewards are available.
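The curriculum mechanism can be sketched in a few lines. This is a minimal, hypothetical illustration: the success-rate oracle and the resampling "generator" below are stand-ins introduced for clarity, whereas the paper estimates success rates from policy rollouts and trains an actual GAN on the difficulty labels.

```python
import random

# Hypothetical success-rate oracle for illustration: in practice this
# would be estimated from the agent's rollouts toward each goal.
def estimated_success(goal):
    return max(0.0, 1.0 - abs(goal))  # goals near 0.0 are easier

def label_goid(goals, success, r_min=0.1, r_max=0.9):
    """Label Goals of Intermediate Difficulty (GOID): sometimes
    achievable by the current policy, but not yet mastered."""
    return [(g, r_min <= success(g) <= r_max) for g in goals]

def propose_goals(labeled, n=50, noise=0.2):
    """Stand-in for the generator network: resample near positively
    labeled goals. The paper trains a GAN on these labels instead."""
    positives = [g for g, ok in labeled if ok] or [0.0]
    return [random.choice(positives) + random.gauss(0, noise)
            for _ in range(n)]

random.seed(0)
goals = [random.uniform(-2, 2) for _ in range(50)]
for _ in range(3):
    labeled = label_goid(goals, estimated_success)
    goals = propose_goals(labeled)
```

As the policy improves, the success rates shift, so the labeled set and hence the proposed goals move outward automatically, which is the curriculum effect described above.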

Cite as: Carlos Florensa*, David Held*, Xinyang Geng*, Pieter Abbeel. Automatic Goal Generation for Reinforcement Learning Agents. In International Conference on Machine Learning (ICML) 2018.

Supplementary material and videos available.

Reverse Curriculum Generation for Reinforcement Learning

Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These tasks present considerable difficulties for reinforcement learning approaches, since the natural reward function for such goal-oriented tasks is sparse and prohibitive amounts of exploration are required to reach the goal and receive a learning signal. Past approaches tackle these problems by manually designing a task-specific reward shaping function to help guide the learning. Instead, we propose a method to learn these tasks without requiring any prior task knowledge other than obtaining a single state in which the task is achieved. The robot is trained in "reverse", gradually learning to reach the goal from a set of starting positions increasingly far from the goal. Our method automatically generates a curriculum of starting positions that adapts to the agent's performance, leading to efficient training on such tasks. We demonstrate our approach on difficult simulated fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.
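In pseudocode, the reverse-curriculum loop amounts to perturbing known-good start states and keeping those of intermediate difficulty. The toy 1-D task and success-rate function below are hypothetical stand-ins: in the paper, nearby starts are generated by short random rollouts from the goal, and success rates come from the agent's own training episodes.

```python
import random

def sample_nearby_starts(starts, step=0.3, n_new=20):
    # Perturb existing good starts to drift gradually farther from the goal.
    return [random.choice(starts) + random.uniform(-step, step)
            for _ in range(n_new)]

def filter_good_starts(starts, success_rate, r_min=0.1, r_max=0.9):
    # Keep starts of intermediate difficulty: sometimes solved, not mastered.
    return [s for s in starts if r_min < success_rate(s) < r_max]

# Toy 1-D task: the goal is at 0.0 and success probability decays
# with distance from it (a stand-in for real rollout statistics).
goal = 0.0
def toy_success_rate(s):
    return max(0.0, 1.0 - abs(s - goal))

random.seed(0)
starts = [goal]
for _ in range(5):
    candidates = starts + sample_nearby_starts(starts)
    starts = filter_good_starts(candidates, toy_success_rate) or [goal]
```

Each iteration the surviving start states sit farther from the goal, so the policy is always trained from positions where it receives a learning signal.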

Cite as: Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, Pieter Abbeel. Reverse Curriculum Generation for Reinforcement Learning. In Conference on Robot Learning (CoRL) 2017.

Supplementary material and videos available.

Stochastic Neural Networks for Hierarchical Reinforcement Learning

Deep reinforcement learning has achieved many impressive results in recent years. However, tasks with sparse rewards or long horizons continue to pose significant challenges. To tackle these important problems, we propose a general framework that first learns useful skills in a pre-training environment, and then leverages the acquired skills for learning faster in downstream tasks. Our approach brings together some of the strengths of intrinsic motivation and hierarchical methods: the learning of useful skills is guided by a single proxy reward, the design of which requires very minimal domain knowledge about the downstream tasks. A high-level policy is then trained on top of these skills, significantly improving exploration and allowing sparse rewards in the downstream tasks to be tackled. To efficiently pre-train a large span of skills, we use Stochastic Neural Networks combined with an information-theoretic regularizer. Our experiments show that this combination is effective in learning a wide span of interpretable skills in a sample-efficient way, and can significantly boost the learning performance uniformly across a wide range of downstream tasks.
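The information-theoretic regularizer can be approximated with simple visitation counts. The cell discretization and the exact estimator below are simplifications of mine; they only illustrate the idea of rewarding each skill for visiting states that identify it, not the precise bonus used in the paper.

```python
import math
from collections import defaultdict

# Count-based sketch of an information-theoretic bonus: each skill z is
# rewarded for visiting state cells that are distinctive for z, which
# encourages the set of skills to be diverse and distinguishable.
visits = defaultdict(int)        # (cell, z) -> visit count
cell_totals = defaultdict(int)   # cell -> total visit count

def mi_bonus(cell, z):
    visits[(cell, z)] += 1
    cell_totals[cell] += 1
    # Empirical p(z | cell); its log is high (zero) when z is the
    # only skill visiting this cell, and negative when cells are shared.
    p_z_given_cell = visits[(cell, z)] / cell_totals[cell]
    return math.log(p_z_given_cell)
```

A skill that is the sole visitor of a cell receives a bonus of log 1 = 0, while skills crowding into the same cells are penalized, pushing them to cover different parts of the state space.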

Cite as: Carlos Florensa, Yan Duan, Pieter Abbeel. Stochastic Neural Networks for Hierarchical Reinforcement Learning. In International Conference on Learning Representations (ICLR) 2017.

Code and videos available.

Capacity planning with competitive decision-makers: Trilevel MILP formulation, degeneracy, and solution approaches

Capacity planning addresses the decision problem of an industrial producer investing in infrastructure to satisfy future demand with the highest profit. Traditional models neglect the rational behavior of some external decision-makers by assuming either static competition or captive markets. We propose a mathematical programming formulation with three levels of decision-makers to capture the dynamics of duopolistic markets. The trilevel model is transformed into a bilevel optimization problem with mixed-integer variables in both levels by replacing the third-level linear program with its optimality conditions. We introduce new definitions required for the analysis of degeneracy in multilevel models, and develop two novel algorithms to solve these challenging problems. Each algorithm is shown to converge to a different type of degenerate solution. The computational experiments for capacity expansion in industrial gas markets show that no algorithm is strictly superior in terms of performance.
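The key reformulation step can be written generically (this is an illustrative linear program, not the paper's exact model): a third-level problem min_z { d^T z : Wz >= h - Tx - Qy, z >= 0 }, embedded under upper-level decisions x and y, is replaced by its optimality conditions:

```latex
\begin{aligned}
& Wz \ge h - Tx - Qy, \quad z \ge 0 && \text{(primal feasibility)} \\
& W^{\top}\lambda \le d, \quad \lambda \ge 0 && \text{(dual feasibility)} \\
& \lambda^{\top}\left(Wz - h + Tx + Qy\right) = 0, \quad
  z^{\top}\left(d - W^{\top}\lambda\right) = 0 && \text{(complementary slackness)}
\end{aligned}
```

The complementarity products are the nonlinear part; they are typically linearized with big-M or SOS1 constraints, which introduces the mixed-integer variables in the resulting bilevel problem.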

Cite as: Carlos Florensa, Pablo Garcia-Herreros, Pratik Misra, Erdem Arslan, Sanjay Mehta, Ignacio E. Grossmann. Capacity planning with competitive decision-makers: Trilevel MILP formulation, degeneracy, and solution approaches. European Journal of Operational Research, 2017.

“The magic of light!” - An entertaining optics and photonics awareness program

Illusionism provides a surprising and unforgettable way of explaining photonics to a wide audience. Imagine grabbing with your own hand an egg-sized photon with the same incredible properties as in a quantum computer! And what about touching the light beam which detects and removes diseased cells like in cutting edge medical prototypes? The art of magic allows promoting photonics, exploring advanced subjects in an understandable and palpable fashion that strongly inspires all ages.

Cite as: Carlos Florensa, Miriam Martí, S. Chaitanya Kumar, Silvia Carrasco. “The magic of light!” - An entertaining optics and photonics awareness program. Education and Training in Optics and Photonics, 2013.

Curriculum Vitae