PhD student in Computer Science at UC Berkeley
I am a PhD candidate in the Robotics Learning Lab at UC Berkeley, under the supervision of Prof. Pieter Abbeel. My main interest is solving robotics tasks in variable environments with minimal supervision. This yields challenging sparse-reward problems for Reinforcement Learning. I believe policy hierarchy, few-shot learning, and automatic curriculum generation are key to solving these tasks, scaling up to real-world scenarios, and empowering robotic systems.
In 2015 I obtained a double degree in Mathematics and Industrial Engineering with Honors at the Center for High Interdisciplinary Training (CFIS) of the Polytechnic University of Catalonia. During my undergraduate studies I did research at several international institutions: Argonne National Lab (2013, Illinois, USA) under the supervision of Prof. Victor Zavala, the Ecole Polytechnique Federale de Lausanne (2014, Switzerland) under the supervision of Prof. Rachid Cherkaoui, and Carnegie Mellon University (2015, Pittsburgh, USA) under the supervision of Prof. Ignacio Grossmann. I also held several positions at the Institute of Photonic Sciences (ICFO, Barcelona, Spain).
During my PhD I have been supported first by a La Caixa Fellowship and then by a Berkeley DeepDrive Fellowship. In the summer of 2018 I interned at DeepMind in London, and in the summer of 2019 at NVIDIA's Seattle Robotics Lab. I am now interning at Covariant!
Contact me: email@example.com
GUAPO: Guided Uncertainty-Aware Policy Optimization
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, learning-based approaches, such as Reinforcement Learning (RL), can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general approach that overcomes inaccuracies in the robotics perception/actuation pipeline while requiring minimal interaction with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable and regions where it may be flawed or not well defined. In these uncertain regions, we show that a local RL policy can be learned directly from raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing tight-fitting peg insertion.
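The core switching rule is simple to state. Below is a minimal sketch of the idea, assuming a pose estimator that returns a mean and covariance; the uncertainty proxy, threshold, and function names are illustrative, not the paper's implementation.

```python
import numpy as np

UNCERTAINTY_THRESHOLD = 0.05  # hypothetical calibration constant

def select_action(obs, pose_mean, pose_cov, model_based_policy, rl_policy):
    """Route control by perception uncertainty: use the model-based policy
    where the pose estimate is reliable, the learned policy elsewhere."""
    uncertainty = float(np.sqrt(np.trace(pose_cov)))  # scalar uncertainty proxy
    if uncertainty < UNCERTAINTY_THRESHOLD:
        # Confident region: the model-based controller moves toward the goal.
        return model_based_policy(pose_mean)
    # Uncertain region: hand over to the RL policy, which consumes raw inputs.
    return rl_policy(obs)
```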
Cite as: Michelle A. Lee*, Carlos Florensa*, Jonathan Tremblay, Nathan Ratliff, Animesh Garg, Fabio Ramos, Dieter Fox. Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning. International Conference on Robotics and Automation (ICRA) 2020.
Best paper award talk at the Robot Learning workshop, NeurIPS 2019.
Goal-conditioned Imitation Learning
Designing rewards for Reinforcement Learning (RL) is challenging because a reward needs to convey the desired task, be efficient to optimize, and be easy to compute. The latter is particularly problematic when applying RL to robotics, where detecting whether the desired configuration is reached might require considerable supervision and instrumentation. Furthermore, we are often interested in being able to reach a wide range of configurations, so setting up a different reward every time might be impractical. Methods like Hindsight Experience Replay (HER) have recently shown promise for learning policies able to reach many goals without the need for a reward. Unfortunately, without tricks like resetting to points along the trajectory, HER might take a very long time to discover how to reach certain areas of the state-space. In this work we investigate different approaches to incorporate demonstrations to drastically speed up convergence to a policy able to reach any goal, also surpassing the performance of an agent trained with other Imitation Learning algorithms. Furthermore, our method can be used when only trajectories without expert actions are available, allowing it to leverage kinesthetic or third-person demonstrations.
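The hindsight relabeling mechanism the work builds on fits in a few lines. This is a minimal sketch assuming transitions stored as (state, action, next_state) tuples with hashable states; the parameter k and the tuple layout are assumptions for exposition.

```python
import random

def hindsight_relabel(trajectory, k=4):
    """Relabel each transition with goals actually achieved later in the
    same rollout, so every trajectory yields successful goal-reaching data
    without requiring an external reward function."""
    relabeled = []
    for t, (s, a, s_next) in enumerate(trajectory):
        for _ in range(k):
            g = random.choice(trajectory[t:])[2]  # a future achieved state
            r = 1.0 if s_next == g else 0.0       # sparse goal-reaching reward
            relabeled.append((s, a, s_next, g, r))
    return relabeled
```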
Cite as: Yiming Ding*, Carlos Florensa*, Mariano Phielipp, Pieter Abbeel. Goal-Conditioned Imitation Learning. Advances in Neural Information Processing Systems (NeurIPS) 2019.
Sub-policy Adaptation for Hierarchical Reinforcement Learning
Hierarchical Reinforcement Learning is a promising approach to long-horizon decision-making problems with sparse rewards. Unfortunately, most methods still decouple the lower-level skill acquisition process from the training of a higher level that controls the skills in a new task. Treating the skills as fixed can lead to significant sub-optimality in the transfer setting. In this work, we propose a novel algorithm to discover a set of skills, and continuously adapt them along with the higher level even when training on a new task. Our main contributions are two-fold. First, we derive a new hierarchical policy gradient, as well as an unbiased latent-dependent baseline, and introduce Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method to efficiently train all levels of the hierarchy simultaneously. Second, we propose a method of training time-abstractions that improves the robustness of the obtained skills to environment changes.
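For concreteness, here is an illustrative sketch of the kind of two-level policy such a method trains: a manager re-selects a discrete latent skill every few steps, and a skill policy conditions on that latent. The network shapes are assumptions, and the fixed period is a simplification standing in for the paper's time-abstraction training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, n_skills, period=10):
        super().__init__()
        self.period, self.n_skills = period, n_skills
        # Manager: picks a discrete latent skill from the observation.
        self.manager = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                     nn.Linear(64, n_skills))
        # Skill policy: conditions on the observation and the one-hot latent.
        self.skill = nn.Sequential(nn.Linear(obs_dim + n_skills, 64), nn.Tanh(),
                                   nn.Linear(64, act_dim))

    def act(self, obs, t, latent=None):
        if t % self.period == 0 or latent is None:
            # Re-select the skill; both levels are trained jointly.
            logits = self.manager(obs)
            latent = torch.distributions.Categorical(logits=logits).sample()
        one_hot = F.one_hot(latent, self.n_skills).float()
        action = self.skill(torch.cat([obs, one_hot], dim=-1))
        return action, latent
```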
Cite as: Alexander C. Li*, Carlos Florensa*, Ignasi Clavera, Pieter Abbeel. Sub-policy Adaptation for Hierarchical Reinforcement Learning. International Conference on Learning Representations (ICLR) 2020.
Self-supervised Learning of Image Embedding for Continuous Control
Operating directly from raw high-dimensional sensory inputs like images is still a challenge for robotic control. Recently, Reinforcement Learning methods have been proposed to solve specific tasks end-to-end, from pixels to torques. However, these approaches still require access to a specified reward, which may require specialized instrumentation of the environment. Furthermore, the obtained policy and representations tend to be task-specific and may not transfer well. In this work we investigate completely unsupervised learning of a general image embedding and control primitives, based on finding the shortest time to reach any state. We also introduce a new structure for the state-action value function that builds a connection between model-free and model-based methods, and improves the performance of the learning algorithm. We experimentally demonstrate these findings in three simulated robotic tasks.
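One generic way to phrase a shortest-time-to-reach objective is as a goal-conditioned TD target where every step costs -1 until the goal is reached. The sketch below illustrates that standard construction, not necessarily the paper's exact formulation.

```python
def reach_time_target(reached_goal, max_q_next, gamma=0.99):
    """TD target for a goal-conditioned Q(s, a, g) under a shortest-time
    objective: each step costs -1, so -Q approximates the steps-to-goal."""
    return 0.0 if reached_goal else -1.0 + gamma * max_q_next
```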
Cite as: Carlos Florensa, Jonas Degrave, Nicolas Heess, Jost Tobias Springenberg, Martin Riedmiller. Self-supervised Learning of Image Embedding for Continuous Control. Contributed Talk in I2C workshop at Advances in Neural Information Processing Systems (NIPS) 2018.
Adaptive Variance for Changing Sparse-Reward Environments
Robots that are trained to perform a task in a fixed environment often fail when facing unexpected changes to the environment, due to a lack of exploration. We propose a principled way to adapt the policy for better exploration in changing sparse-reward environments. Unlike previous works, which explicitly model environmental changes, we analyze the relationship between the value function and the optimal exploration for a Gaussian-parameterized policy, and show that our theory leads to an effective strategy for adjusting the variance of the policy, enabling fast adaptation to changes in a variety of sparse-reward environments.
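In spirit, a drop in the value estimate signals that the environment has changed and that the policy's exploration noise should widen. The update rule below is a hedged illustration of that coupling, not the paper's derived formula.

```python
def adapt_log_std(log_std, value_estimate, value_baseline,
                  gain=0.5, lo=-2.0, hi=1.0):
    """Widen a Gaussian policy's std when the value function drops below
    what was expected, i.e. when the environment has likely changed."""
    drop = max(0.0, value_baseline - value_estimate)
    return min(hi, max(lo, log_std + gain * drop))
```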
Cite as: Xingyu Lin, Pengsheng Guo, Carlos Florensa, David Held. Adaptive Variance for Changing Sparse-Reward Environments. International Conference on Robotics and Automation (ICRA) 2019.
Automatic Goal Generation for Reinforcement Learning Agents
Reinforcement learning (RL) is a powerful technique to train an agent to perform a task; however, an agent that is trained using RL is only capable of achieving the single task that is specified via its reward function. Such an approach does not scale well to settings in which an agent needs to perform a diverse set of tasks, such as navigating to varying positions in a room or moving objects to varying locations. Instead, we propose a method that allows an agent to automatically discover the range of tasks that it is capable of performing in its environment. We use a generator network to propose tasks for the agent to try to accomplish, each task being specified as reaching a certain parametrized subset of the state-space. The generator network is optimized using adversarial training to produce tasks that are always at the appropriate level of difficulty for the agent, thus automatically producing a curriculum. We show that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment, even when only sparse rewards are available.
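The curriculum signal driving the generator can be summarized as a labeling rule: a goal is useful if the agent's current success rate on it is intermediate, neither hopeless nor already mastered. A minimal sketch, with threshold values that are assumptions here:

```python
R_MIN, R_MAX = 0.1, 0.9  # difficulty band; exact values are assumptions

def label_goals(success_rates):
    """Positive labels for goals of intermediate difficulty. These labels
    supervise the adversarial training of the goal generator network."""
    return {g: float(R_MIN <= r <= R_MAX) for g, r in success_rates.items()}
```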
Cite as: Carlos Florensa*, David Held*, Xinyang Geng*, Pieter Abbeel. Automatic Goal Generation for Reinforcement Learning Agents. In International Conference on Machine Learning (ICML) 2018.
Reverse Curriculum Generation for Reinforcement Learning
Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These tasks present considerable difficulties for reinforcement learning approaches, since the natural reward function for such goal-oriented tasks is sparse and prohibitive amounts of exploration are required to reach the goal and receive a learning signal. Past approaches tackle these problems by manually designing a task-specific reward shaping function to help guide the learning. Instead, we propose a method to learn these tasks without requiring any prior task knowledge other than obtaining a single state in which the task is achieved. The robot is trained in "reverse", gradually learning to reach the goal from a set of starting positions increasingly far from the goal. Our method automatically generates a curriculum of starting positions that adapts to the agent's performance, leading to efficient training on such tasks. We demonstrate our approach on difficult simulated fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.
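A sketch of the start-state expansion loop, assuming a resettable simulator with hypothetical env.reset_to and env.get_state helpers; the rollout horizon and filtering thresholds are likewise illustrative.

```python
import random

def expand_starts(env, starts, n_new=100, horizon=5):
    """Propose slightly harder start states by taking short random-action
    rollouts outward from starts the agent can already handle."""
    new_starts = []
    for _ in range(n_new):
        env.reset_to(random.choice(starts))   # assumed simulator interface
        for _ in range(horizon):
            env.step(env.action_space.sample())
        new_starts.append(env.get_state())    # assumed state getter
    return new_starts

def filter_starts(starts, success_rate, r_min=0.1, r_max=0.9):
    # Keep starts where the agent sometimes, but not always, reaches the goal.
    return [s for s in starts if r_min <= success_rate[s] <= r_max]
```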
Cite as: Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, Pieter Abbeel. Reverse Curriculum Generation for Reinforcement Learning. In Conference on Robot Learning (CoRL) 2017.
Stochastic Neural Networks for Hierarchical Reinforcement Learning
Deep reinforcement learning has achieved many impressive results in recent years. However, tasks with sparse rewards or long horizons continue to pose significant challenges. To tackle these important problems, we propose a general framework that first learns useful skills in a pre-training environment, and then leverages the acquired skills for learning faster in downstream tasks. Our approach brings together some of the strengths of intrinsic motivation and hierarchical methods: the learning of useful skills is guided by a single proxy reward, the design of which requires minimal domain knowledge about the downstream tasks. A high-level policy is then trained on top of these skills, significantly improving exploration and allowing sparse rewards in the downstream tasks to be tackled. To efficiently pre-train a large span of skills, we use Stochastic Neural Networks combined with an information-theoretic regularizer. Our experiments show that this combination is effective in learning a wide span of interpretable skills in a sample-efficient way, and can significantly boost learning performance uniformly across a wide range of downstream tasks.
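One simple way to realize such an information-theoretic regularizer is a count-based estimate of how well the visited region identifies the active skill. The discretization into cells and the counts table below are assumptions for illustration.

```python
import math
from collections import defaultdict

visits = defaultdict(lambda: defaultdict(int))  # visits[cell][skill]

def mi_bonus(cell, skill):
    """Reward bonus ~ log p(skill | visited cell), estimated from counts:
    a skill earns more by visiting regions that distinguish it from the
    other skills, which pushes the learned skills apart."""
    visits[cell][skill] += 1
    total = sum(visits[cell].values())
    return math.log(visits[cell][skill] / total)
```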
Cite as: Carlos Florensa, Yan Duan, Pieter Abbeel. Stochastic Neural Networks for Hierarchical Reinforcement Learning. In International Conference on Learning Representations (ICLR) 2017.
Capacity planning with competitive decision-makers: Trilevel MILP formulation, degeneracy, and solution approaches
Capacity planning addresses the decision problem of an industrial producer investing in infrastructure to satisfy future demand with the highest profit. Traditional models neglect the rational behavior of some external decision-makers by assuming either static competition or captive markets. We propose a mathematical programming formulation with three levels of decision-makers to capture the dynamics of duopolistic markets. The trilevel model is transformed into a bilevel optimization problem with mixed-integer variables in both levels by replacing the third-level linear program with its optimality conditions. We introduce new definitions required for the analysis of degeneracy in multilevel models, and develop two novel algorithms to solve these challenging problems. Each algorithm is shown to converge to a different type of degenerate solution. The computational experiments for capacity expansion in industrial gas markets show that no algorithm is strictly superior in terms of performance.
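Schematically, the nesting looks as follows, with notation that is illustrative rather than the paper's exact model: the leader chooses investments x, a competitor reacts with y, and the market clears through the inner linear program in z. Replacing that inner LP with its optimality conditions collapses the three levels into the bilevel problem described above.

```latex
\begin{aligned}
\max_{x \in X}\quad & \pi_L\!\left(x, y^{*}, z^{*}\right) \\
\text{s.t.}\quad & y^{*} \in \arg\max_{y \in Y(x)} \; \pi_C\!\left(x, y, z^{*}(x,y)\right) \\
& z^{*}(x,y) \in \arg\max_{z \ge 0} \left\{\, c^{\top} z \;:\; A z \le b(x,y) \,\right\}
\end{aligned}
```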
Cite as: Carlos Florensa, Pablo Garcia-Herreros, Pratik Misra, Erdem Arslan, Sanjay Mehta, Ignacio E. Grossmann. Capacity planning with competitive decision-makers: Trilevel MILP formulation, degeneracy, and solution approaches. European Journal of Operational Research 2017.
“The magic of light!” - An entertaining optics and photonics awareness program
Illusionism provides a surprising and unforgettable way of explaining photonics to a wide audience. Imagine grabbing with your own hand an egg-sized photon with the same incredible properties as in a quantum computer! And what about touching the light beam that detects and removes diseased cells, as in cutting-edge medical prototypes? The art of magic makes it possible to promote photonics and explore advanced subjects in an understandable, palpable fashion that strongly inspires all ages.
Cite as: Carlos Florensa, Miriam Martí, S. Chaitanya Kumar, Silvia Carrasco. “The magic of light!” - An entertaining optics and photonics awareness program. Education and Training in Optics and Photonics, 2013.