Research seminars

Master in Robotics, Graphics and Computer Vision - Universidad de Zaragoza

Previous seminars from 2020-21

Capturing the First Image of a Black Hole & Designing the Future of Black Hole Imaging

Katie Bouman - California Institute of Technology, US

5th OCTOBER 18.00h

@Youtube Live Stream: https://youtu.be/fd59MdDfmZI

Abstract: This talk will present the methods and procedures used to produce the first image of a black hole from the Event Horizon Telescope, as well as discuss future developments for black hole imaging. It had been theorized for decades that a black hole would leave a "shadow" on a background of hot gas. Taking a picture of this black hole shadow would help to address a number of important scientific questions, both on the nature of black holes and the validity of general relativity. Unfortunately, due to its small size, traditional imaging approaches require an Earth-sized radio telescope. In this talk, I discuss techniques the Event Horizon Telescope Collaboration has developed to photograph a black hole using the Event Horizon Telescope, a network of telescopes scattered across the globe. Imaging a black hole’s structure with this computational telescope required us to reconstruct images from sparse measurements, heavily corrupted by atmospheric error. The talk will also discuss future developments, including new imaging techniques and how we are developing machine learning methods to help design future telescope arrays.
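
The core computational problem mentioned above, recovering an image from far fewer measurements than pixels, can be illustrated with a toy regularized least-squares reconstruction in Python. This is only a minimal sketch of the general idea, not the EHT pipeline (which relies on far more sophisticated regularized maximum-likelihood and Bayesian methods); the measurement operator, signal, and smoothness regularizer below are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse-measurement reconstruction: 64 "pixels" but only 20 noisy
# linear measurements, so the inversion is ill-posed without a prior.
n = 64
x_true = np.zeros(n)
x_true[24:40] = 1.0                          # a simple bright "source"
A = rng.normal(size=(20, n))                 # stand-in measurement operator
y = A @ x_true + 0.01 * rng.normal(size=20)  # noisy observations

# Penalize differences between neighboring pixels (a smoothness prior) and
# solve min_x ||A x - y||^2 + lam * ||D x||^2 in closed form.
lam = 1.0
D = np.eye(n) - np.eye(n, k=1)               # finite-difference operator
x_hat = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)
print(np.round(x_hat, 2))
```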

Bio: Katherine L. (Katie) Bouman is a Rosenberg Scholar and an assistant professor in the Computing and Mathematical Sciences and Electrical Engineering Department at the California Institute of Technology. Before joining Caltech, she was a postdoctoral fellow at the Harvard-Smithsonian Center for Astrophysics. She received her Ph.D. in EECS from MIT, where she was a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). Before coming to MIT, she received her bachelor's degree in Electrical Engineering from the University of Michigan. The focus of her research is on using emerging computational methods to push the boundaries of interdisciplinary imaging.

An Introduction to Physically Based Differentiable Rendering

Wenzel Jakob - École Polytechnique Fédérale de Lausanne (EPFL), CH

21st OCTOBER 13.00h

@Youtube Live Stream: https://youtu.be/EKavxrX5nWE

Abstract: Progress on differentiable rendering over the last two years has been remarkable, making these methods a serious contender for solving truly hard inverse problems in computer graphics, computer vision, robotics, and beyond. In this talk, I will give an overview of physically based differentiable rendering and its fascinating applications, as well as future challenges in this rapidly evolving field.

Bio: Wenzel Jakob is an assistant professor at EPFL's School of Computer and Communication Sciences, where he leads the Realistic Graphics Lab (https://rgl.epfl.ch/). His research interests revolve around inverse graphics, material appearance modeling, and physically based rendering algorithms. Wenzel is the recipient of the ACM SIGGRAPH Significant New Researcher Award, the Eurographics Young Researcher Award, and an ERC Starting Grant. He is also the lead developer of the Mitsuba renderer, a research-oriented rendering system, and one of the authors of the third edition of "Physically Based Rendering: From Theory To Implementation" (http://pbrt.org/).

Autonomous, Agile Micro Drones: Perception, Learning, and Control

Davide Scaramuzza - Professor, University of Zurich

28th OCTOBER @ 13.00h

@Youtube Live Stream: https://youtu.be/c2gLzI_bu1I

Abstract: Autonomous quadrotors will soon play a major role in search-and-rescue, delivery, and inspection missions, where a fast response is crucial. However, their speed and maneuverability are still far from those of birds and human pilots. High speed is particularly important: since drone battery life is usually limited to 20-30 minutes, drones need to fly faster to cover longer distances. However, to do so, they need faster sensors and algorithms. Human pilots take years to learn the skills to navigate drones. What does it take to make drones navigate as well as, or even better than, human pilots? Autonomous, agile navigation through unknown, GPS-denied environments poses several challenges for robotics research in terms of perception, planning, learning, and control. In this talk, I will show how the combination of model-based and machine learning methods, united with the power of new, low-latency sensors such as event cameras, can allow drones to achieve unprecedented speed and robustness by relying solely on onboard computing.

Bio: Davide Scaramuzza (Italian) is Professor of Robotics and Perception at both the Department of Informatics (University of Zurich) and the Department of Neuroinformatics (University of Zurich and ETH Zurich), where he directs the Robotics and Perception Group. His research lies at the intersection of robotics, computer vision, and machine learning, using both standard cameras and event cameras, and is aimed at enabling autonomous, agile navigation of micro drones in search and rescue applications. For his research contributions to autonomous, vision-based drone navigation and event cameras, he has won prestigious awards, such as an ERC Consolidator Grant, the IEEE Robotics and Automation Society Early Career Award, an SNSF-ERC Starting Grant, a Google Research Award, the KUKA Innovation Award, two Qualcomm Innovation Fellowships, the European Young Research Award, and the Misha Mahowald Neuromorphic Engineering Award.

DIVE: Detecting visual pathologies with artificial intelligence

Marta Ortín - DIVE Medical

10th NOVEMBER @ 13.00h

@Youtube Live Stream: https://youtu.be/llI58KnBy3k

Abstract: There are 2.2 billion people in the world with some visual pathology, including an estimated 810 million children. Many of them, especially in developing countries, will never be diagnosed. And here is a striking fact: 70% of the cases could have been prevented or treated if detected in time. However, current visual exploration techniques for non-collaborative patients, such as babies or people with neurocognitive disorders, are very subjective and inaccurate, requiring a lot of time and experience from the ophthalmologist. To help solve that problem, we have developed a novel technology based on eye tracking and deep learning that is able to provide a detailed, efficient and objective exploration of the visual function to enable early diagnosis. Our deep learning algorithms analyze gaze patterns to reveal the possible presence of visual pathologies, even in babies as young as six months. After seven years of multidisciplinary research by engineers, ophthalmologists, opticians, and psychologists, this has led to the creation of DIVE Medical, a rising startup based in Zaragoza. Partnering with Huawei, we have already brought our technology to five countries in three continents so far, successfully diagnosing more than 3,000 patients.

Bio: Marta Ortín is co-founder of the startup DIVE Medical, and a postdoctoral researcher in the Graphics and Imaging Lab at the Universidad de Zaragoza. She completed her PhD at the Universidad de Zaragoza, and has performed internships at Intel Mobile Communications (Munich, Germany), the Korea Advanced Institute of Science and Technology (Daejeon, South Korea), and the University of Ferrara (Ferrara, Italy). Her research focuses on the application of eye tracking and artificial intelligence to the detection of visual pathologies.

PhD Defense - Visual SLAM in Dynamic Environments

Berta Bescós Torcal - Universidad de Zaragoza

18th NOVEMBER @ 13.00h

This event will be streamed via the General Link of the Master program.

Abstract: This seminar will be a live streaming of Berta's PhD defense. Her thesis is entitled Visual SLAM in Dynamic Environments. The world surrounding us is complex and ever changing, which makes robots struggle to have a good understanding of the scene. This PhD thesis presents solutions to deal with dynamic objects in Simultaneous Localization and Mapping problems, for a better scene understanding with both geometry and semantics.

Bio: Berta Bescós is finishing her PhD in Computer Vision and Robotics at the University of Zaragoza, advised by Dr. José Neira. During her PhD she spent 5 months working with Cesar Cadena at the ASL at ETH Zurich. Right after graduating, she will join Facebook at their Zurich office. Her main research interests lie at the intersection of perception and learning for robotics.

SEDDI: Fabrics Digitization for the Fashion Industry

Elena Garcés - SEDDI, Madrid

24th NOVEMBER @ 13.00h

Youtube Livestream: https://youtu.be/hC4ycI243GA

Abstract: The Fashion Industry is nowadays one of the most inefficient and polluting businesses. Many items of clothing are simply trashed, either during the design process, because they are never sold, or because they are returned by unsatisfied customers. Fully virtual workflows during design and shoppers' try-on experiences are required to change this situation. At SEDDI, we work towards this goal at all stages of the garment creation pipeline. In this talk, I will focus on the challenges of digitizing textile materials, including methods and devices to capture the physical properties of the fabrics, as well as the models used to simulate and render the fabrics in the virtual world.

Bio: Elena Garcés is a Senior Researcher and Tech Manager at SEDDI (Madrid) where she leads the Optical Capture and Rendering teams. Before that, she was a Juan de la Cierva Fellow at MSLab of the Universidad Rey Juan Carlos (Madrid), and a Postdoctoral Researcher at Technicolor R&D (France). She completed her Ph.D. in 2016 in the Graphics and Imaging Lab of the Universidad de Zaragoza, advised by Prof. Diego Gutierrez. Her research interests span the fields of computer graphics, computer vision, and applied machine learning. She is currently working on digitizing fabric materials for the Fashion Industry, as well as machine learning-based simulations of avatars for Virtual-Try-On experiences.

Computer Vision at Facebook Reality Labs

Mariano Jaimez - Facebook Reality Labs (FRL), Zurich

25th NOVEMBER @ 13.00h

Abstract: Facebook Reality Labs (FRL) brings together a team of researchers, developers, and engineers to build the future of connection within virtual and augmented reality. That requires the development of technologies to perceive and interpret the world around us either to localize ourselves in it, to interact with it or to enhance it. Computer vision is a key component of those technologies, and in this talk I am going to introduce our work on SLAM, 3D reconstruction and face / hand tracking and show how it shapes the present and the future of AR & VR products.

Bio: Mariano Jaimez is a computer vision engineer at Facebook Reality Labs in Zurich. Previously, he obtained his Ph.D. degree with the Machine Perception and Intelligent Robotics group, University of Málaga, and with the Computer Vision group, Technical University of Munich, Germany. His work focused on visual odometry, scene flow estimation, and 3-D reconstruction, and his research interests include the potential applications of range-sensing technologies in the fields of robotics, computer vision, virtual/augmented reality, and autonomous driving.

Certifiable Perception for Robots and Autonomous Vehicles: From Robust Algorithms to Robust Systems

Luca Carlone - MIT, USA

1st DECEMBER @ 18.00h

Youtube Livestream: https://youtu.be/KXJis4iWaG4

Abstract: Spatial perception —the robot’s ability to sense and understand the surrounding environment— is a key enabler for autonomous systems operating in complex environments, including self-driving cars and unmanned aerial vehicles. Recent advances in perception algorithms and systems have enabled robots to detect objects and create large-scale maps of an unknown environment, which are crucial capabilities for navigation, manipulation, and human-robot interaction. Despite these advances, researchers and practitioners are well aware of the brittleness of existing perception systems, and a large gap still separates robot and human perception.

This talk discusses two efforts targeted at bridging this gap. The first focuses on robustness. I present recent advances in the design of certifiable perception algorithms that are robust to extreme amounts of noise and outliers and afford performance guarantees. I present fast certifiable algorithms for object pose estimation: our algorithms are “hard to break” (e.g., they are robust to 99% outliers) and succeed in localizing objects where an average human would fail. Moreover, they come with a “contract” that guarantees their input-output performance. The second effort targets high-level understanding. While humans are able to quickly grasp geometric, semantic, and physical aspects of a scene, high-level scene understanding remains a challenge for robotics. I present our work on real-time metric-semantic understanding and 3D Dynamic Scene Graphs. I introduce the first generation of Spatial Perception Engines, which extend the traditional notions of mapping and SLAM, and allow a robot to build a “mental model” of the environment, including spatial concepts (e.g., humans, objects, rooms, buildings) and their relations at multiple levels of abstraction. Certifiable algorithms and real-time high-level understanding are key enablers for the next generation of autonomous systems, which are trustworthy, understand and execute high-level human instructions, and operate in large dynamic environments and over an extended period of time.
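
To give a flavor of the outlier problem the abstract refers to, the toy sketch below recovers a 3D translation from correspondences of which 90% are wrong. It deliberately uses a simple RANSAC-style hypothesize-and-test loop rather than the certifiable solvers discussed in the talk (those are based on convex relaxations that additionally return optimality certificates); all numbers and thresholds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic correspondences: 200 point pairs related by a translation,
# of which 180 are replaced by gross outliers.
t_true = np.array([1.0, -2.0, 0.5])
src = rng.normal(size=(200, 3))
dst = src + t_true + 0.01 * rng.normal(size=(200, 3))
bad = rng.choice(200, size=180, replace=False)
dst[bad] = rng.uniform(-10.0, 10.0, size=(180, 3))

# Each single correspondence proposes a translation hypothesis; keep the
# hypothesis that explains the most correspondences, then refit on its inliers.
best_inliers = np.zeros(200, dtype=bool)
for i in rng.choice(200, size=100, replace=False):
    t = dst[i] - src[i]
    inliers = np.linalg.norm(dst - src - t, axis=1) < 0.05
    if inliers.sum() > best_inliers.sum():
        best_inliers = inliers
t_hat = (dst[best_inliers] - src[best_inliers]).mean(axis=0)

print("estimated:", t_hat, "ground truth:", t_true)
```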

Bio: Luca Carlone is the Leonardo Career Development Assistant Professor in the Department of Aeronautics and Astronautics at the Massachusetts Institute of Technology, a Principal Investigator in the MIT Laboratory for Information & Decision Systems (LIDS), and the director of the MIT SPARK Lab. His research interests include nonlinear estimation, numerical and distributed optimization, and probabilistic inference, applied to sensing, perception, and decision-making in single and multi-robot systems. He is the recipient of several prestigious awards, such as the Best Paper Award in Robot Vision at ICRA’20, the 2017 Transactions on Robotics King-Sun Fu Memorial Best Paper Award, and the RSS Early Career Award (2020).

PhD Defense - Indoor Scene Understanding using Non-Conventional Cameras

Clara Fernández Labrador - Universidad de Zaragoza

3rd DECEMBER @ 16.00h

This event will be run on a Teams videoconference. The link will be provided to you by email/Moodle.

YouTube Livestream: https://youtu.be/gGWTeZU_B7o

Abstract: Clara will describe her PhD research, on topics related to 3D scene understanding and unsupervised learning. Specifically, she has been working on 3D layout recovery and object detection in 360 images, also exploring new methods to deal with the strong distortions these images entail. She has also worked towards the automatic discovery of meaningful 3D keypoints from a collection of objects of the same category. How powerful can all of this get with unsupervised learning?

Bio: Clara Fernández is finishing her PhD in Computer Vision, supervised by Prof. Josechu Guerrero in the Robotics lab at the University of Zaragoza (Spain) and Prof. Cédric Demonceaux in the ImViA lab at the University of Burgundy (France). During her PhD, she also spent 7 months in the Computer Vision Laboratory at ETH Zurich and 3 months at Disney Research Studios. Right after graduating, she will join the Media Technology Center at ETH Zurich as a postdoc. Her research interests lie in 3D scene understanding and unsupervised learning using non-conventional sensors, such as 360 cameras or 3D data.

Why Neural Rendering is getting more amazing every day!

Matthias Nießner - TU Munich, DE

9th DECEMBER @ 13.00h

Youtube Livestream: https://youtu.be/92dDNtlCzvc

Abstract: In this talk, I will present my research vision of how to create photo-realistic digital replicas of the real world, and how to make holograms become a reality. Eventually, I would like to see photos and videos evolve to become interactive, holographic content indistinguishable from the real world. Imagine taking such 3D photos to share with friends, family, or social media; the ability to fully record historical moments for future generations; or to provide content for upcoming augmented and virtual reality applications. AI-based approaches, such as generative neural networks, are becoming more and more popular in this context since they have the potential to transform existing image synthesis pipelines. I will specifically talk about an avenue towards neural rendering where we can retain the full control of a traditional graphics pipeline but at the same time exploit modern capabilities of deep learning, such as handling the imperfections of content from commodity 3D scans.

While the capture and photo-realistic synthesis of imagery opens up unbelievable possibilities for applications ranging from the entertainment to the communication industries, there are also important ethical considerations that must be kept in mind. Specifically, in the context of fabricated news (e.g., fake news), it is critical to highlight and understand digitally-manipulated content. I believe that media forensics plays an important role in this area, both from an academic standpoint, to better understand image and video manipulation, and even more importantly from a societal standpoint, to create and raise awareness of the possibilities and, moreover, to highlight potential avenues and solutions regarding trust in digital content.

Bio: Matthias Nießner is a Professor at the Technical University of Munich where he leads the Visual Computing Lab. Before, he was a Visiting Assistant Professor at Stanford University. Prof. Nießner’s research lies at the intersection of computer vision, graphics, and machine learning, where he is particularly interested in cutting-edge techniques for 3D reconstruction, semantic 3D scene understanding, video editing, and AI-driven video synthesis. In total, he has published over 70 academic publications, including 22 papers at the prestigious ACM Transactions on Graphics (SIGGRAPH / SIGGRAPH Asia) journal and 26 works at the leading vision conferences (CVPR, ECCV, ICCV); several of these works won best paper awards, including at SIGCHI’14, HPG’15, SPG’18, and the SIGGRAPH’16 Emerging Technologies Award for the best Live Demo.

Prof. Nießner’s work enjoys wide media coverage, with many articles featured in mainstream media including the New York Times, Wall Street Journal, Spiegel, MIT Technology Review, and many more, and his work has led to several TV appearances, such as on Jimmy Kimmel Live, where Prof. Nießner demonstrated the popular Face2Face technique; Prof. Nießner’s academic YouTube channel currently has over 5 million views. For his work, Prof. Nießner has received several awards: he is a TUM-IAS Rudolph Moessbauer Fellow (2017 – ongoing), he won the Google Faculty Award for Machine Perception (2017), the Nvidia Professor Partnership Award (2018), as well as the prestigious ERC Starting Grant 2018, which comes with 1.5 million euros in research funding; in 2019, he received the Eurographics Young Researcher Award honoring the best upcoming graphics researcher in Europe. In addition to his academic impact, Prof. Nießner is a co-founder and director of Synthesia Inc., a brand-new startup backed by Mark Cuban, whose aim is to empower storytellers with cutting-edge AI-driven video synthesis.

Deep Learning computing

Jorge Albericio - Cerebras Systems - Los Altos, California, USA

10th DECEMBER @ 18.00h

Abstract: Deep learning is being used today to achieve state-of-the-art results in very different domains, from machine translation to autonomous driving. New neural models that improve task performance usually carry an increase in computational needs which, with the end of traditional fabrication scaling, hardware can only meet through specialized architectures.

In this talk, Jorge will 1) describe the computational primitives involved in deep learning, 2) expose some of the characteristics present in the data processed by neural models which hardware can exploit, and 3) present some examples of specialized hardware architectures targeting modern neural models.

Bio: Jorge Albericio works on sparsity performance and next-gen architecture at Cerebras Systems. Prior to that, at NVIDIA, he participated in the conception and development of the sparsity support in the Ampere architecture. He has a PhD in systems engineering and computing from the University of Zaragoza. He was a postdoctoral fellow at the University of Toronto from 2013 to 2016, where he worked on branch prediction, approximate computing, and hardware accelerators for machine learning.

Spatial AI for mobile robots

Stefan Leutenegger - Imperial College London, UK

15th DECEMBER @ 13.00h

Abstract: Despite huge advances in Spatial AI regarding localisation, dense mapping and scene understanding, fuelled by the advent of Deep Learning and powerful processors, robots still have a robustness problem: real-world applicability is limited to restricted tasks and restricted environments. Different paradigms have emerged as to how much the perception-action cycle of a mobile robot should remain somewhat hand-engineered and modular or, at the other extreme, end-to-end learned with rather black-box models, e.g. using Deep Reinforcement Learning from pixels to torques. In my talk, I will go through a couple of examples that sit in the middle. They leverage Deep Learning for sub-tasks in an otherwise modular and more classic approach. We explicitly estimate robot states, in the form of e.g. position and orientation, as well as the environment, reconstructed both to geometric accuracy and decomposed into semantically meaningful entities, such as 3D objects that may even move. Importantly, the spatial representations need to be chosen for task-specific robust robotic interaction with the environment. In this context, I will present some application examples in drone navigation and control, with an emphasis on accuracy, robustness, failure identification and recovery.

Bio: Stefan Leutenegger is a Senior Lecturer (Associate Professor) in Robotics in the Department of Computing at Imperial College London, where he leads the Smart Robotics Lab and furthermore supervises research undertaken by the Dyson Robotics Lab. He has also co-founded SLAMcore, a spin-out company aiming at commercialisation of localisation and mapping solutions for robots and drones. Stefan has received a BSc and MSc in Mechanical Engineering with a focus on Robotics and Aerospace Engineering from ETH Zurich, as well as a PhD on “Unmanned solar airplanes: design and algorithms for efficient and robust autonomous operation”, completed in 2014.

Decoding brain signals and improving human performance

Luis Montesano - BitBrain Technologies

17th FEBRUARY (12.50 TO 13.40H)

@Youtube Live Stream: https://youtu.be/CbRa6q517B4

Bio: Luis Montesano is an Associate Professor in Computer Science (on leave) and currently leads the R&D activities at Bitbrain Technologies, where he works to bring neurotechnology to your daily life and turn you into a cyborg ;-) !
He joined Bitbrain in July 2015 and is responsible for the R&D activities of the company, the management of R&D projects, and technology forecasting. His research is in the areas of robotics, machine learning, neuroscience, neural engineering, brain-machine technology, human-computer interaction, and cognitive and motor neurorehabilitation.

Abstract: Luis will discuss the work he has been leading at Bitbrain on decoding brain signals, and how they have proposed to use it to improve human performance in several tasks. He will also show impressive results from several of their projects!

NVIDIA-Merlin, an open beta framework for building large-scale deep learning recommender systems

Alberto Álvarez Aldea - NVIDIA

22nd FEBRUARY (12.50 TO 13.40H)

Connect at the usual course Meet link.

Bio: Alberto Álvarez is a Software Engineer at Nvidia, working on GPU-accelerated AI. He obtained his Master's degree in Computer Science from the University of Illinois at Urbana-Champaign in 2019, and his Bachelor's degree in Computer Engineering from the University of Zaragoza in 2017.

Abstract: Alberto will talk about his participation in the development of the Merlin framework at Nvidia. Merlin empowers data scientists, machine learning engineers, and researchers to build high-performing recommenders at scale. Merlin includes tools that democratize building deep learning recommenders by addressing common ETL, training, and inference challenges. Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, all accessible through easy-to-use APIs.

Exploring dark and dangerous environments with legged robots

Maurice Fallon - Oxford

1st MARCH (12.50 TO 13.40H)

@Youtube Live Stream: https://youtu.be/hvHjkcgmza4

Bio: Dr. Maurice Fallon is a Royal Society University Research Fellow (~Assistant Professor in the UK system). He leads the Dynamic Robot Systems Group within the Oxford Robotics Institute. His research is focused on probabilistic methods for localization and mapping, with particular application to legged robots and sensor fusion. Maurice studied undergraduate electronic engineering at University College Dublin (Ireland) and received his PhD from the University of Cambridge in 2008. From 2008 to 2012 he was a postdoc in the Marine Robotics Group of Prof. John Leonard before becoming the perception lead of MIT's DARPA Robotics Challenge team (2012-2015). His group's research is part of several cross-UK projects (ORCA, RAIN) as well as two EU H2020 projects (MEMMO, THING).

Abstract: Legged robots have great potential to access human-centric environments and to carry out inspection and exploration missions. I'll explain some of the challenges of estimating the state of a legged robot (at up to 400Hz) and perceiving its environment under challenging conditions, including uncertain contact sensing, the demands of low latency, and impoverished external sensing. Motivated by the DARPA Subterranean Challenge, we will present experimental results from various trials in underground mines and caves showing how inertial, kinematic, visual and LIDAR sensing are leveraged in modern quadruped robots such as ANYmal and Spot to enable exploration. I will also discuss some of our group's work on motion planning and control for legged systems, which uses trajectory optimization to generate dynamic locomotion.
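
As a toy illustration of the sensor-fusion idea behind legged-robot state estimation (this is a generic 1D Kalman filter with made-up noise values, not the estimator running on ANYmal or Spot), one can fuse a high-rate velocity signal, e.g. from IMU and leg kinematics, with occasional position fixes, e.g. from LIDAR or vision:

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 1.0 / 400.0                  # high-rate prediction step, e.g. 400 Hz
x, P = 0.0, 1.0                   # position estimate and its variance
q, r = 1e-4, 0.05                 # process / measurement noise variances

def predict(x, P, v_meas):
    # Propagate the state with a measured velocity; uncertainty grows.
    return x + v_meas * dt, P + q

def update(x, P, z):
    # Correct with a position measurement; uncertainty shrinks.
    K = P / (P + r)               # Kalman gain
    return x + K * (z - x), (1.0 - K) * P

for k in range(400):              # one second of data
    x, P = predict(x, P, v_meas=1.0 + 0.1 * rng.normal())
    if k % 40 == 0:               # a position fix arrives at 10 Hz
        x, P = update(x, P, z=k * dt + 0.05 * rng.normal())

print("estimated position:", x, "variance:", P)
```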

The Transforming Power of Virtual Embodiment

Mel Slater - Universitat de Barcelona

8th MARCH (12.50 TO 13.40H)

@Youtube Live Stream: https://youtu.be/uNbQ7E2P1tE

Bio: Mel Slater is a Distinguished Investigator at the University of Barcelona, and co-Director of the Event Lab (Experimental Virtual Environments for Neuroscience and Technology). He was previously Professor of Virtual Environments at University College London in the Department of Computer Science. He has been involved in research in virtual reality since the early 1990s, and has been first supervisor of 40 PhDs in graphics and virtual reality since 1989. He held a European Research Council Advanced Grant, TRAVERSE (2009-2015), and now holds a second Advanced Grant, MoTIVE (2018-2023). He is Field Editor of Frontiers in Virtual Reality, and Chief Editor of the Human Behaviour in Virtual Reality section. He is one of the founders of the company Virtual Bodyworks S.L. His publications can be seen at http://publicationslist.org/melslater.

Abstract: In VR, if it has been so programmed, you will see a life-sized virtual body replacing your own when you look down towards yourself or into a virtual mirror. You are likely then to have the perceptual illusion that the virtual body is yours, even though you know for sure that it is not. This is referred to as a body ownership illusion. Here I will describe this illusion and give examples of how it can be used for self-transformation. I will also describe current developments in our project MoTIVE (‘Moments in Time in Immersive Virtual Environments’).

RAIN in Manchester?

Mikel Luján - University of Manchester

12th MARCH (12.50 TO 13.40H)

@Meet: https://meet.google.com/hcs-icnb-npz

Bio: Mikel Luján has held the ARM/Royal Academy of Engineering Research Chair in Computer Systems at the University of Manchester since 2019. He is also the director of the ARM Centre of Excellence at the University of Manchester and held a Royal Society Research Fellowship from 2008 until 2017. He is the Chief Scientific Advisor of the spin-off company Amanieu Systems, which is commercialising research on Dynamic Binary Translation for modern ARM processors. Mikel has authored more than 150 refereed papers on a range of topics from parallel programming to many-core architectures, including machine learning and FPGAs. In the last five years he has published papers in ACM TOPLAS, IEEE TOC, ICRA, HPCA, PLDI (distinguished paper award), FCCM, VEE, and ISPASS (best paper award). In other words, he is investigating low-power systems addressing the full stack.

Abstract: In the last decade computer systems have experienced a major change in how they are designed, as they have encountered fundamental thermal and power consumption constraints. Moving forward, these constraints are forcing computer architectures (System-on-Chips) to include more power-efficient, but less general, hardware accelerators.

It is already common to find heterogeneous SoCs (DSPs, GPUs) on battery operated devices, where heterogeneity provides a means to optimise specific tasks for energy and performance. Current “hot” examples of accelerators are dedicated Neural Processing Units for Deep Neural Network Inference and Training, and FPGAs in data centers.

However, moving beyond these well-funded and established accelerators, what do we need to make usable ubiquitous accelerators? In this talk we will share our lessons at the intersection of Robotics, Machine Learning and the design of future heterogeneous SoCs harnessing FPGAs. The talk will mainly cover research financed by the PAMELA (http://apt.cs.manchester.ac.uk/projects/PAMELA/) and RAIN Hub (https://rainhub.org.uk/) projects.

Photorealistic Reconstruction of Landmarks and People using Neural Scene Representations

Ricardo Martín-Brualla - Google

16th March (17.00 TO 18.30h)

Bio: Ricardo Martin-Brualla is a researcher at Google Seattle working on the future of communication. He completed his PhD in 2016 at the University of Washington advised by Prof. Steve Seitz. Before that, he earned dual degrees in Computer Engineering and Mathematics at the Universitat Politècnica de Catalunya. His research interests lie at the intersection of 3D computer vision and graphics.

Abstract: Reconstructing scenes to synthesize novel views is a long-standing problem in Computer Vision and Graphics. Recently, neural scene representations have shown novel view synthesis results of unprecedented quality, like those of Neural Radiance Fields (NeRF), which use the weights of a multi-layer perceptron to model the volumetric density and color of a scene. In this talk, I will present two works that extend such representations to handle real-world data of scenes and deformable objects that one may capture with a smartphone. First, I will introduce a method to reconstruct landmarks using uncontrolled internet photo collections. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena, such as variable illumination or transient occluders. Our model - NeRF-W - is a series of extensions to NeRF to address these issues, thereby allowing for accurate reconstructions from in-the-wild photo collections. Next, I will present a method capable of photorealistically reconstructing a non-rigidly deforming scene using photos/videos captured casually from mobile phones. Our approach augments NeRF by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical template NeRF. We observe that these NeRF-like deformation fields are prone to local minima, and propose coarse-to-fine and elastic regularizations that allow for more robust optimization. Our model - D-NeRF - can turn casually captured selfie photos/videos into deformable NeRF models that allow for photorealistic renderings of the subject from arbitrary viewpoints, which we dub "nerfies".
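
As a rough sketch of the representation behind these methods (not the code of NeRF-W or the deformable model above; the network size and positional encoding below are simplified), a NeRF-style model is an MLP that maps an encoded 3D point to a volume density and a color, which are then composited along camera rays:

```python
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=6):
    # Map each coordinate to sin/cos features at increasing frequencies,
    # so the MLP can represent high-frequency scene detail.
    freqs = 2.0 ** torch.arange(num_freqs)
    angles = x[..., None] * freqs                        # (..., 3, num_freqs)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1).flatten(-2)

class TinyNeRF(nn.Module):
    # A toy radiance field: 3D point -> (density, RGB). The real model is
    # deeper and also conditions color on the viewing direction.
    def __init__(self, num_freqs=6, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * 2 * num_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                        # 1 density + 3 color
        )

    def forward(self, xyz):
        out = self.mlp(positional_encoding(xyz))
        density = torch.relu(out[..., :1])               # non-negative density
        rgb = torch.sigmoid(out[..., 1:])                # colors in [0, 1]
        return density, rgb

# Query the field at a batch of sample points along camera rays.
density, rgb = TinyNeRF()(torch.rand(1024, 3))
```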

The Moon Camera and some research spin-offs from it

Bill Freeman - MIT & Google

22nd March (15.00 TO 16.30h)

@Youtube Live Stream: https://youtu.be/MLsXT20Sr64

Bio: William T. Freeman is the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science (EECS) at MIT, and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) there. Since 2015, he has also been a research manager in Google Research in Cambridge, MA. He received outstanding paper awards at computer vision or machine learning conferences in 1997, 2006, 2009, 2012 and 2019, and test-of-time awards for papers from 1990, 1995 and 2005. He shared the 2020 Breakthrough Prize in Physics for a consulting role with the Event Horizon Telescope collaboration, which reconstructed the first image of a black hole. In 2019, he received the PAMI Distinguished Researcher Award, the highest award in computer vision, and was elected to the U.S. National Academy of Engineering in 2021.

Abstract: The goal is to take a picture of the Earth from space, using ground-based observations of the Moon from a backyard telescope. I'll describe the project, why it's hard, and why it would be a useful thing to do. I'll explain why each of three different approaches (exploiting diffuse reflection, cast shadows, and sunlight phasor fields) hasn't worked, and will describe the computational imaging spin-offs that have resulted from each approach. We haven't given up, and I'll describe our plans for the next half-year.

Color geometry and applications

Jose I. Echevarría - Adobe Research

7th April (17.00 TO 18.30h)

Abstract: Since Sir Isaac Newton connected the colors of both ends of the visible color spectrum to form a circle back in 1704, intriguing geometric relationships between colors became apparent. Over the following three centuries, different color spaces with corresponding geometries have been proposed, driven by different combinations of physical and/or perceptual properties. In this talk we’ll cover recent research leveraging the specifics of some of those spaces to develop novel intuitive creative tools for image editing, graphic design, accessibility, and digital painting.

Bio: Jose Echevarria is a Senior Research Scientist at Adobe Research working at the intersection of computer graphics, computer vision and HCI. He focuses on developing novel creative tools for professional and novice users alike, turning artistic knowledge and human perception into practical solutions that people can use to explore their creativity, while developing their visual literacy. Some of his work has shipped in Adobe Photoshop, Illustrator, Character Animator, and Color. His research has been presented at international venues including SIGGRAPH, CVPR, EUROGRAPHICS, ECCV, CHI, and ACL; and he has filed 35+ U.S. and international patents. Jose received his PhD, MSc and BSc from Universidad de Zaragoza (Spain) in the Graphics and Imaging Lab.

Why learn something you already know?

Jaakko Lehtinen - Aalto University & Nvidia

12th April (13.00 TO 14.30h)

@Youtube Live Stream: https://youtu.be/4nQdmilBmKg


Abstract: While computer graphics has many faces, a central one is the fact that it enables creation of photorealistic pictures by simulating light propagation, motion, shape, appearance, and so on. In this talk, I’ll argue that this ability puts graphics research in a unique position to make fundamental contributions to machine learning and AI, while solving its own longstanding problems. The majority of modern high-performing machine learning models are not particularly interpretable; you cannot, say, interrogate an image-generating Generative Adversarial Network (GAN) to truly tease apart shape, appearance, lighting, and motion, or directly instruct an image classifier to pay attention to shape instead of texture. Yet, reasoning in such terms is the bread and butter of graphics algorithms! I argue that tightly combining the power of modern machine learning models with sophisticated graphics simulators will enable us to push the learning beyond pixels, into the physically meaningful, interpretable constituents of the world that are all tied together by the fact they come together under well-understood physical processes to form pictures. Of course, such “simulator-based inference” or “analysis by synthesis” is seeing an increasing interest in the research community, but I’ll try to convince you that what we’re seeing at the moment is just a small sample of things to come.

Bio: Jaakko Lehtinen is a tenured Associate Professor at Aalto University, and a principal research scientist at NVIDIA Research. He works on computer graphics, computer vision, and machine learning, with particular interests in generative modelling, realistic image synthesis, and appearance acquisition and reproduction. Prior to his current positions, he was a postdoctoral researcher at MIT advised by Frédo Durand. Before his research career, he was a graphics programmer at the game developer Remedy Entertainment, and contributed significantly to the graphics technology behind the worldwide blockbuster hit games Max Payne (2001), Max Payne 2 (2003), and Alan Wake (2010). He is the recipient of an ERC Consolidator Grant.

Open challenges in Spatial Computing: when the user enters the content

Mar Gonzalez-Franco - Microsoft Research, Redmond, USA.

30th APR (18.00 TO 19.30h)

This event will be streamed in the General course link (shown at the Master program calendar)

Abstract: We are moving from having the content inside the screen to having the user inside the content. This paradigm shift, together with the increasing amount of connected digital content and AI, opens new avenues of interaction: new needs for devices, better auditory affordances, new forms of representation (on occasion via avatars), and new ways to move around the digital content.

Bio: Dr. Mar Gonzalez-Franco is a Principal Researcher in the EPIC (Extended Perception Interaction and Cognition) team at Microsoft Research. In her work, Mar focuses on exploring human behavior and perception to help build better technologies in the wild. Apart from her scientific output, her work has also been transferred to products used on a daily basis by many around the world, such as Together mode in Microsoft Teams and Microsoft Soundscape. Mar holds a BSc in Computer Science (URL, Barcelona) and an MSc in Biomedical Engineering (Universitat de Barcelona and Tsinghua University). She earned her Ph.D. in Immersive Virtual Reality and Clinical Psychology under the supervision of Prof. Mel Slater at the EVENT-Lab, affiliated as a visiting student at the Massachusetts Institute of Technology MediaLab. She completed her postdoctoral studies at University College London.

Taking Notes (or why I do not use your favorite text editor)

Xavier Llora - Principal Engineer, Google, Mountain View, USA.

3rd MAY (17.00 TO 18.00h)

This event will be streamed in the General course link (shown at the Master program calendar)

Abstract: Xavier’s talk will explore the importance of note taking. Notes are critical to any endeavor involving converting large volumes of information into actionable knowledge. The concepts covered in this talk are applicable from graduate research work to complex system design endeavors. We will cover the importance of idea identification and linking, and how those are recurrent concepts across a variety of disciplines. We will pay special attention to computer science, artificial intelligence and machine learning processes and techniques that can help effectively maximize knowledge and insights. The talk will conclude by reviewing some tools that may help you explore the concepts and workflows discussed.

Bio: Dr. Xavier Llora is currently a principal engineer at Google where he works on fighting spam and abuse across Google properties.
Xavier got his computer science degree and his PhD on Genetics-Based Machine Learning at University Ramon Llull, Barcelona. He spent 2 years as a postdoctoral researcher at the University of Illinois at Urbana-Champaign (UIUC), later becoming a research assistant professor at the National Center for Supercomputing Applications, also at UIUC. His research covers machine learning, evolutionary computation and large-scale distributed systems. He has published over 100 papers, articles and book chapters on data and text mining, social network analysis, bioinformatics and large-scale supercomputing, among others, and holds several patents on related topics.

Learning image representation with self-supervised learning: From pretext tasks to contrastive methods and BYOL

Florian Strub - Research Scientist @ DeepMind (Google), Paris

7th MAY (12.50 TO 14.00h)

@Youtube Live Stream: https://youtu.be/Qasn3ippZWc

Abstract: In this presentation, I will introduce self-supervised methods and explain why they are a promising research direction in large-scale machine learning.

I will first give a quick overview of the motivation and past approaches before focusing on a new, simple but surprisingly effective method developed at DeepMind: Bootstrap Your Own Latent (BYOL). There, we will highlight the classic self-supervised learning evaluation protocol and give some intuitions while debunking a few ideas.

Finally, I will briefly explain (1) how BYOL can be extended to new modalities such as video or graphs, and (2) (if time allows) how BYOL was originally an RL method and emerged through the collaboration of multiple ML communities.
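
For readers unfamiliar with BYOL, the sketch below shows its basic mechanics under simplifying assumptions (tiny fully-connected networks in place of a ResNet, random tensors in place of augmented images); it is not DeepMind's implementation. An online encoder plus a predictor head regresses the projection produced by a slowly moving target network on another view of the same batch, and the target is updated as an exponential moving average of the online weights, with no negative pairs involved.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                        nn.Linear(256, 128))            # stand-in for a ResNet
predictor = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
target = copy.deepcopy(encoder)                          # EMA copy, no gradients
for p in target.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(list(encoder.parameters()) + list(predictor.parameters()), lr=0.05)

def byol_loss(online_pred, target_proj):
    # Negative cosine similarity; the target branch is never back-propagated.
    return -F.cosine_similarity(online_pred, target_proj.detach(), dim=-1).mean()

view1 = torch.rand(16, 3, 32, 32)    # two augmented "views" of the same batch
view2 = torch.rand(16, 3, 32, 32)
loss = byol_loss(predictor(encoder(view1)), target(view2)) + \
       byol_loss(predictor(encoder(view2)), target(view1))
opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                # exponential moving average target update
    for p_t, p_o in zip(target.parameters(), encoder.parameters()):
        p_t.mul_(0.99).add_(0.01 * p_o)
```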

Bio: Dr. Florian Strub is a research scientist at DeepMind. He did his Ph.D. at the University of Lille, in the Inria SequeL team, advised by Prof. Olivier Pietquin and Jérémie Mary in collaboration with Mila. He tries his best to interleave ideas from computer vision, natural language, and reinforcement learning to design new research settings and algorithms.

In his research patchwork, he co-created GuessWhat?!, a visual dialogue dataset, explored multimodal and self-supervised training procedures (FiLM, BYOL), collaborated on various RL projects, and is currently (co-)working on language games.

Automating (active) machine learning

Gustavo Malkomes - SigOpt/Intel and Washington University in St. Louis, USA.

17th MAY (17.00 TO 18.15h)

@Youtube Live Stream: https://youtu.be/r1YcN7sM5NI

Abstract: In many problems in science, technology, and engineering, unlabeled data is abundant but acquiring labeled observations is expensive -- it requires a human annotator, a costly laboratory experiment, or a time-consuming computer simulation. Active learning is a machine learning paradigm designed to minimize the cost of obtaining labeled data by carefully selecting which new data should be gathered next. However, considerable machine learning expertise is often required to apply these techniques effectively in their current form. In this talk, I'll show solutions that further automate active learning. Specifically, we focus on novel automated model selection and Bayesian optimization techniques. Our contributions are Bayesian active learning algorithms that can be applied to automated audiometry tests, drug discovery, and material science.
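
A minimal example of the active-learning loop described above (generic uncertainty sampling with a logistic-regression model on synthetic data; this is an illustration of the paradigm, not the Bayesian active model selection or Bayesian optimization methods from the talk):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pool-based setup: a large unlabeled pool, a tiny labeled seed set,
# and a budget of label queries to spend wisely.
X_pool = rng.normal(size=(1000, 2))
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)    # hidden oracle labels
labeled = list(np.flatnonzero(y_pool == 1)[:5]) + list(np.flatnonzero(y_pool == 0)[:5])

for step in range(20):
    model = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    # Uncertainty sampling: query the unlabeled point whose predicted class
    # probability is closest to 0.5, i.e. where the model is least confident.
    proba = model.predict_proba(X_pool)[:, 1]
    candidates = [i for i in range(len(X_pool)) if i not in labeled]
    query = min(candidates, key=lambda i: abs(proba[i] - 0.5))
    labeled.append(query)            # "pay" the oracle for this label, retrain

print("labels used:", len(labeled))
```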

Bio: Gustavo Malkomes is a Research Engineer at SigOpt/Intel and an adjunct instructor at Washington University. He works on Bayesian optimization, automated machine learning, and decision-making under uncertainty. Before his current positions, Gustavo completed a Ph.D. in Computer Science at Washington University in St. Louis under Professor Roman Garnett's supervision. His work was given the Turner Dissertation Award for the best Computer Science & Engineering doctoral dissertation. Gustavo also received an MSc and a BSc in Computer Science from the Federal University of Ceará in Brazil.

A Journey through Autonomy

Anna Petrovskaya - Amazon, USA.

19th MAY (18.00 TO 19.15h)

This event will be streamed in the General course link

Abstract: Autonomy has developed by leaps and bounds during the last two decades. The speaker will share her journey through these times, the technologies she built, the teams she met, and the companies she got to work for. This talk will cover highly detailed city-scale mapping, sub-cm accurate localization, as well as shape-aware detection and tracking of dynamic obstacles. Further, it will consider methods for dramatically increasing the safety of perception in self-driving applications. The talk will also cover a few examples where these technologies can be extended beyond the scope of self-driving to other sensors and applications. Given the speaker’s background, the talk will also include some insights and practical advice on building startups, exiting, and working for big tech companies.

Bio: Dr. Anna Petrovskaya is currently a principal scientist at Amazon, where she is building autonomous delivery robots. She is a scientist and entrepreneur with decades of experience in Autonomy, Robotics, and AI. Prior to Amazon, Anna built a 3D mapping startup, which was acquired by Mobileye/Intel. In 2011, she received her Doctorate degree in Computer Science from Stanford University, where her research focused on Bayesian methods for Artificial Perception in robotic and consumer applications. She has developed new efficient algorithms for autonomous vehicles, mobile manipulation, and tactile object localization. Anna was part of the core team that built the Stanford autonomous car Junior, which was a precursor to the Google/Waymo autonomous car. She has served as an Associate Editor for the International Conference on Robotics and Automation (ICRA) since 2011. Based on her expertise, Anna has been invited to co-author chapters for the Handbook of Intelligent Vehicles and the 2nd edition of the Handbook of Robotics. In 2012, Anna was named among the winners of the IEEE ITSS Best PhD Thesis Award.

Teaching Robots to See -- Challenges and developments in Robotic Vision

Margarita Chli - ETH Zurich

24th MAY (12.50 TO 13.40H)

@Youtube Live Stream: https://youtu.be/J_RCaptGb7E

Abstract: As vision plays a key role in how we interpret a situation, developing vision-based perception for robots promises to be a big step towards robotic intelligence. This talk will briefly discuss some of the biggest challenges we are faced with, all the way from robust localization and mapping, to dense scene representation for path planning, and collaborative perception. With effective robot collaboration featuring as a key scientific challenge in the field, the talk will focus on this topic, describing our recent progress in this area at the Vision for Robotics Lab of ETH Zurich (http://www.v4rl.ethz.ch).

Bio: Margarita Chli is an Assistant Professor at ETH Zurich, leading the Vision for Robotics Lab (V4RL). She is also the Vice Director of the Institute of Robotics and Intelligent Systems (IRIS) of ETH Zurich and an Honorary Fellow of the University of Edinburgh in the UK. Her research interest is in Computer Vision for Robotics, focusing on real-time perception for small aircraft, as some of the most challenging platforms for robotic perception. Some highlights of her career include participation in the first vision-based autonomous flight of a small helicopter and the award of the biannual Zonta Prize in 2017 on the basis of her high-impact contributions to the development of robotic vision. V4RL was featured in Reuters, and she was a speaker at the World Economic Forum in Davos in 2017 as part of ETH Zurich's 3-strong delegation of professors.