Research seminars

Master in Robotics, Graphics and Computer Vision - Universidad de Zaragoza

PAST seminars from 2021-22

Learning 3D Representations of the World from Images and Video

Lourdes Agapito - University College London (UCL). 

19th October - 13h - Online

Google Meet: https://meet.google.com/baz-jfno-dzm
Youtube Live: https://youtu.be/fcSC9eIitwk

Abstract: As humans we take the ability to perceive the dynamic world around us in three dimensions for granted. From an early age we can grasp an object by adapting our fingers to its 3D shape; understand our mother’s feelings by interpreting her facial expressions; or effortlessly navigate through a busy street. All these tasks require some internal 3D representation of shape, deformations, and motion. Building algorithms that can emulate this level of human 3D perception, using as input single images or video sequences taken with a consumer camera, has proved to be an extremely hard task. Machine learning solutions have faced the challenge of the scarcity of 3D annotations, encouraging important advances in weak and self-supervision. In this talk I will describe progress from early optimization-based solutions that captured sequence-specific 3D models with primitive representations of deformation, towards recent and more powerful 3D-aware neural representations that can learn the variation of shapes and textures across a category and be trained from 2D image supervision only. There has been very successful recent commercial uptake of this technology and I will show exciting applications to AI-driven video synthesis.

Bio: Lourdes Agapito holds the position of Professor of 3D Vision at the Department of Computer Science, University College London (UCL). Her research in computer vision has consistently focused on the inference of 3D information from single images or videos acquired from a moving camera. She heads the Vision and Imaging Science Group, is a founding member of the AI centre and co-director of the Centre for Doctoral Training in Foundational AI. In 2017 she co-founded Synthesia, the London-based synthetic media startup responsible for the AI technology behind the Malaria no More video campaign that saw David Beckham speak 9 different languages to call on world leaders to take action to defeat Malaria.

Why are there colors in the ocean?

Derya Akkaynak - Harbor Branch Oceanographic Institute, Florida Atlantic University

October 27 - 18:00h - Online
Google meet: https://meet.google.com/pph-igmb-ghx
Youtube livestream: https://youtu.be/DIfyPRGoodY


Abstract: The color of ocean water provides tremendous insight into the properties of the particles in it. For example, using satellites that sense ocean color, we are able to monitor the worldwide concentration of phytoplankton—tiny organisms in the water column that produce food for everything else in the ocean to eat. That ocean water has color, however, is precisely what is holding us back from unveiling the colors of everything else, i.e., the colors of the ocean flora, fauna, and the unique habitats that host them. Why does anything in the ocean have color, if that color is to be masked by the color of the water? What would we learn if we could survey the true colors of everything in the ocean? We don’t know the answer to that yet, but in this talk I will tell you what’s holding us back from surveying the true colors of everything in the ocean.
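For readers new to the topic, the sketch below illustrates the standard underwater image-formation model behind the "masking" problem the abstract describes: an object's color is attenuated exponentially with distance while the water's own color (backscatter) builds up, so distant objects converge to the color of the water. This is only a hedged illustration with made-up coefficients, not the speaker's revised model.

```python
import numpy as np

# Simple per-channel underwater image-formation model (illustrative only):
#   observed = true_color * exp(-beta * range) + veiling * (1 - exp(-beta * range))
# The first term is the object's color attenuated by the water; the second is the
# water's own color (backscatter) accumulating with distance.
def underwater_observe(true_rgb, dist_m, beta, veiling):
    transmission = np.exp(-beta * dist_m)
    return true_rgb * transmission + veiling * (1.0 - transmission)

def restore(observed_rgb, dist_m, beta, veiling):
    # Invert the model when range and coefficients are known (exact in this noiseless toy).
    transmission = np.exp(-beta * dist_m)
    return (observed_rgb - veiling * (1.0 - transmission)) / transmission

beta = np.array([0.60, 0.15, 0.08])     # made-up coefficients: red attenuates fastest
veiling = np.array([0.05, 0.35, 0.45])  # "color of the water" seen at infinite range
true_rgb = np.array([0.80, 0.40, 0.20]) # an orange object

for dist_m in (1.0, 5.0, 15.0):
    obs = underwater_observe(true_rgb, dist_m, beta, veiling)
    print(dist_m, obs.round(3), restore(obs, dist_m, beta, veiling).round(3))
# With increasing range the observation converges to the veiling color, which is why
# everything underwater eventually looks blue-green unless the model can be inverted.
```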

Bio: Derya Akkaynak is an engineer and oceanographer (PhD MIT & Woods Hole Oceanographic Institution ‘14) whose research focuses on problems of imaging and vision underwater. She has professional, technical, and scientific diving certifications and has conducted underwater fieldwork in the Bering Sea, Red Sea, Antarctica, Caribbean, Northern and Southern Pacific and Atlantic, and her native Aegean. Akkaynak was a finalist for the 2019 Blavatnik Awards for Young Scientists for resolving a fundamental problem in underwater computer vision—a new equation specific to the reconstruction of lost colors and contrast in underwater photographs. She is starting her research lab in Eilat, Israel.

How do we develop a Machine Learning-powered system for our smart appliances at BSH?

Blanca Grilló and Jorge Escribano
AI Team @BSH Home appliances

9th November - 13:00h @ Salón de Actos.
Ada Byron building, EINA (not online!)

Abstract: Can you imagine checking from your mobile what is inside your fridge and automatically adding the missing items to your shopping list? At BSH, we are developing the home appliances of the future using machine learning techniques.

This process involves several distinct and complex steps that are fundamental to the success of the project and go well beyond implementing an ML algorithm.

In this talk, we will explain the lifecycle of a machine-learning project in production and delve into the different stages and roles needed to succeed.

Bio: Blanca Grilló is a mathematician working for BSH Home Appliances. She has more than 20 years of experience building software products for different companies, first as a developer and designer, and for the last 14 years as head of software engineering teams. For the past 2 years, she has led an AI team at BSH focused on the development of smart appliances using machine-learning-based solutions.
Jorge Escribano holds a PhD in Computational Simulation and has 6 years of research experience developing computational models to reproduce and predict physical and biological phenomena. He then joined BSH Home Appliances, where he currently works on building AI solutions for smart appliances.

Learning, Leveraging, and Optimizing Human Experience in Metaverse

Qi Sun - New York University

November 24 - 6pm - Online
Google Meet:  https://meet.google.com/zvp-rnok-qtp
Youtube livestream: https://youtu.be/whnZX2VxEjo

Abstract: The world is becoming unprecedentedly connected thanks to emerging media and cloud-based technologies. The holy grail of the metaverse requires recreating a remotely shared world as a digital twin of the physical planet. In this world, the human is probably the most complex mechanical, physical, and biological system. Unlike computers, it is remarkably challenging to model and engineer how humans perceive and react in a virtual environment. By leveraging computational advancements such as machine learning and biometric sensors, this talk will share some recent research on altering and optimizing human visual and behavioral perception toward creating the ultimate metaverse.

Bio: Qi Sun is an assistant professor at New York University, Tandon School of Engineering (joint with Dept. of Computer Science and Engineering and Center for Urban Science and Progress). Before joining NYU, he was a research scientist at Adobe Research and a research intern at NVIDIA Research. He received his Ph.D. at Stony Brook University. His research interests lie in computer graphics, VR/AR, vision science, machine learning, and human-computer interaction. He is a recipient of the IEEE Virtual Reality Best Dissertation Award. 

PhD Defense - Monocular SLAM for Deformable Scenarios

José Lamarca Peiró - University of Zaragoza

December 3rd - 10am - Online
Google Meet:  TBA

Abstract: In recent years, Simultaneous Localization and Mapping (SLAM) has become a mature technology for reconstructing day-to-day scenarios and locating a sensor in them. For the sake of simplicity, most state-of-the-art systems assume rigidity in all stages of the algorithm, i.e. the tracking and the mapping, since this is a valid assumption for most human-made scenarios. However, in-body sequences are dominantly deformable, and as a consequence state-of-the-art monocular SLAM methods are inapplicable to them. Doing SLAM in deformable scenarios becomes even more challenging when using monocular cameras as input sensors, due to the lack of depth perception. In this thesis, we created a deformable tracking stage able to recover the pose of the camera and the deformation of the map from a single view in real time. We also created a deformable mapping stage able to recover the scene structure from monocular sequences with deformation. Assembling both stages, we conceived the first monocular SLAM systems able to recover the structure of deformable scenes, both in laboratory experiments and in in-body sequences. This is one of the first steps towards a new generation of monocular SLAM systems for changing scenarios that do not assume rigidity.

Bio: Jose Lamarca is finishing his PhD in Computer Vision and Robotics at the University of Zaragoza, advised by Prof. José M. M. Montiel. Before his PhD, he spent 9 months working with Kirill Safronov and Sarah Gillet at KUKA Robotics in Augsburg, Germany, where he developed his Master's thesis on active SLAM. His thesis topic is monocular SLAM in deformable scenarios. During his PhD, he published and presented his work at top computer vision and robotics conferences and journals such as ECCV, ICRA and IEEE Transactions on Robotics. He spent 4 months in one of the top laboratories in monocular non-rigid reconstruction, working with Prof. Adrien Bartoli at the Université Clermont Auvergne, and carried out a 4-month internship at Facebook Reality Labs in Redmond under the supervision of Dr. Jakob Engel and Dr. Jing Dong. After graduating, José will join Apple at their Munich office, where he will continue his career in computer vision and SLAM.

Perceptual Display and Fabrication

Piotr Didyk - University of Lugano 

December 15 - 1pm - Online
Google Meet:  meet.google.com/fho-ikty-fij
Youtube livestream: https://youtu.be/sVr0bXDsmNA

Abstract: Novel display devices and fabrication techniques enable highly tangible ways for creating, experiencing, and interacting with digital content. The capabilities offered by new output devices, such as virtual and augmented reality head-mounted displays and new multi-material 3D printers, make them real game-changers in many fields. At the same time, the new possibilities offered by these devices impose many challenges for content creation techniques regarding quality and computational efficiency. This talk will demonstrate how studying and modeling human perception enable new computation techniques that address these problems. To this end, I will present our work on new methods for generating and optimizing visual content for novel display devices, which strive for a perfect balance between visual quality and computational efficiency. I will also present our efforts in designing new computational fabrication techniques for fabricating objects that look and feel like the real ones. Finally, I will discuss possibilities of combining new perceptual insights with hardware advancements to improve human task performance beyond the human capabilities in the natural world.

Bio: Piotr Didyk is an assistant professor at the University of Lugano (USI), Switzerland, leading the Perception, Display, and Fabrication Group. His research combines expertise in perception, computation, and hardware to create new display and fabrication techniques. Before joining USI, he led a research group at Saarland University and Max Planck Institute for Informatics in Germany. He has also spent two years as a postdoctoral researcher at the Massachusetts Institute of Technology. He obtained his Ph.D. degree from Saarland University and Max Planck Institute. In 2018, he received an ERC Starting Grant, and in 2019 was elected Junior Fellow of the European Association for Computer Graphics. 

PhD Defense - Learning Visual Appearance: Perception, Modeling and Editing

Manuel Lagunas - Universidad de Zaragoza 

December 17 - 12pm - Online
Youtube livestream: https://youtu.be/KR2BuqX4nNM

Abstract: Visual appearance determines our understanding of an object or image, and as such it is a fundamental aspect in digital content creation. It is a general term, embracing others like material appearance, which can be defined as the visual impression we have about a material, and involves the physical interaction between light and matter, and how our visual system perceives it. However, computationally modeling the behavior of our visual system is a complex task, partially because no definite, unified theory of perception exists. Moreover, although we have developed algorithms that are able to faithfully model the interaction between light and matter, there is a disconnection between the physical parameters that those algorithms use and the perceptual parameters that the human visual system understands. This, in turn, makes manipulating such physical parameters and their interactions a cumbersome and time-consuming task, even for expert users. This thesis aims at furthering our understanding of material appearance perception, and leveraging it to improve existing algorithms for visual content generation. This is done by establishing connections between the physical parameters governing the interaction between light and matter, and high-level, intuitive parameters or attributes understood by humans. Specifically, the thesis makes contributions in three areas: proposing new computational models for measuring appearance similarity; investigating the interaction between illumination and geometry, and their effect on material appearance; and developing applications for intuitive appearance manipulation, in particular, human relighting and material appearance editing.

Bio: Manuel Lagunas is a PhD student in the Graphics and Imaging Lab of Universidad de Zaragoza, Spain, co-supervised by Diego Gutierrez and Belen Masia. He obtained his Bachelor's (2015) and Master's (2016) degrees at Universidad de Zaragoza, majoring in Computer Science and Applied Maths, respectively. He was a Research Intern at Adobe Research (San Jose, CA) during the summers of 2019 and 2020. His main research topics lie at the interface between machine learning, computer graphics, and human perception. After graduating, Manuel will join Amazon's Madrid office as an Applied Scientist.

An introduction to light transport analysis

Matthew O'Toole - Carnegie Mellon University 

December 21 - 1pm - Online
Google Meet:  meet.google.com/byv-fiej-uxt
Youtube livestream: https://youtu.be/t2Q8ZI8jk70

Abstract: Active illumination refers to optical techniques that use controllable lights (e.g., projectors) and cameras to analyze the way light propagates through the world. In this talk, I will provide a gentle introduction to how we can use such techniques to capture the 3D shape and appearance of objects, and synthesize photorealistic images of a scene under novel illumination conditions.
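As background for the relighting application mentioned above, here is a minimal sketch of the classic light transport idea: photograph the scene once per light source, stack the images as columns of a transport matrix, and then any novel lighting condition is just a weighted sum of those columns. This is an illustrative example with random placeholder data, not the speaker's specific technique.

```python
import numpy as np

# Image-based relighting via a light transport matrix (illustrative sketch).
# Each column of T holds one photo of the scene captured with a single light
# source on (one-light-at-a-time capture), flattened into a vector.
num_pixels, num_lights = 64 * 64, 8
rng = np.random.default_rng(0)
olat_images = rng.random((num_pixels, num_lights))  # stand-in for captured photos
T = olat_images                                     # transport matrix (pixels x lights)

# By linearity of light transport, an image under a novel illumination is
# image = T @ light_weights, i.e. a weighted sum of the captured OLAT images.
light_weights = np.array([0.2, 0.0, 0.5, 0.0, 0.1, 0.0, 0.0, 0.2])
relit = T @ light_weights
relit_image = relit.reshape(64, 64)
print(relit_image.shape, relit_image.mean())
```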

Bio: Matthew O'Toole is an Assistant Professor with the Robotics Institute and Computer Science Department in the School of Computer Science at Carnegie Mellon University. His research interests span many topics across computer graphics and computer vision, with a focus on computational imaging. Prior to joining CMU, he was a Banting Postdoctoral Fellow with Gordon Wetzstein’s Computational Imaging group at Stanford University. He received his Ph.D. in Computer Science at the University of Toronto under the supervision of Kyros Kutulakos in 2016, along with an M.Sc. in 2009. He took a break from Toronto in 2011 to be a visiting student with the MIT Media Lab’s Camera Culture group led by Ramesh Raskar. His work received runner-up best paper awards at ICCV 2007 and CVPR 2014, best demo awards at CVPR 2015 and ICCP 2015, and a runner-up dissertation award at SIGGRAPH 2017. He was also a co-organizer of the CVPR workshop on Computational Cameras and Displays in 2016 and 2017, and taught a course on the subject at SIGGRAPH 2014.

On Continuous Models for Sensor Integration, Localisation, Mapping and Planning

Teresa Vidal - University of Technology Sydney

6th April - ** 9AM ** - Online 

Meet: https://meet.google.com/zmk-qahf-bep

Youtube: https://youtu.be/dFMitPavDu4

Abstract: In the first part of my talk, I will present our recent work on faithful Euclidean distance field estimation for localisation, mapping and planning using continuous and probabilistic implicit surfaces (Log-GPIS). In the second part, I will discuss our work on the analytical preintegration of continuous inertial measurements using linear operators on Gaussian Process kernels, which is at the core of some of our localisation and mapping frameworks, such as LiDAR/inertial (IN2LAAMA) and event-camera/inertial (IDOL). I will show the performance of all these approaches in different simulated and real scenarios.
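As rough intuition for the Log-GPIS idea named in the abstract, the toy sketch below uses a Gaussian Process to regress a field that equals 1 on the surface and decays away from it, then recovers an approximate Euclidean distance through a log transform. It is only a simplified illustration (exponential kernel, arbitrary rate, no uncertainty handling); the actual method relies on specific Matérn/Whittle kernels and a far more careful formulation.

```python
import numpy as np

# Toy GP implicit surface with a log transform to recover a distance field.
def exp_kernel(a, b, lam):
    """Exponential (Matern 1/2) kernel between point sets a (n x d) and b (m x d)."""
    r = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-lam * r)

lam = 10.0  # rate of the assumed exponential transform f(x) ~ exp(-lam * d(x))

# Surface samples: points on the unit circle, each observed with value exp(-lam * 0) = 1.
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
surface = np.stack([np.cos(theta), np.sin(theta)], axis=1)
y = np.ones(len(surface))

# Standard GP regression (posterior mean only), with a small jitter for stability.
K = exp_kernel(surface, surface, lam) + 1e-6 * np.eye(len(surface))
alpha = np.linalg.solve(K, y)

queries = np.array([[0.7, 0.7], [1.5, 0.0], [0.0, 0.0]])
f_mean = exp_kernel(queries, surface, lam) @ alpha

# Log transform: approximate unsigned distance to the surface.
dist = -np.log(np.clip(f_mean, 1e-12, None)) / lam
print(dist)
# True distances to the unit circle are roughly 0.01, 0.5 and 1.0; the recovered
# values increase accordingly, though this toy setup underestimates far distances.
```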

Bio: Prof. Teresa Vidal Calleja is a core researcher at the UTS Robotics Institute and Deputy Head of School (Research) at the School of Mechanical and Mechatronics Engineering of the University of Technology Sydney. Teresa received her BSc in Mechanical Engineering from the National Autonomous University of Mexico, her MSc in Electrical Engineering from CINVESTAV-IPN (Mexico) and her PhD in Automatic Control, Computer Vision and Robotics from the Polytechnic University of Catalonia (Spain). She was a postdoctoral fellow at LAAS-CNRS (France) and at the Australian Centre for Field Robotics at the University of Sydney (Australia). She joined the former UTS Centre for Autonomous Systems in 2012, where she has progressed her academic career from UTS Chancellor's Research Fellow to Associate Professor. Teresa has ongoing collaborations with world-leading robotics research institutions such as the German Aerospace Center - DLR (Germany) and the Autonomous Systems Lab - ETHZ (Switzerland). Her current industry collaborations are with the manufacturing, meat and livestock, mining and medical devices sectors. Her core research interest is robotic perception, including multisensory data fusion, active perception, self-localisation and 3D mapping, air/ground robotic cooperation, and autonomous navigation and manipulation.


Our Eyes Beneath the Sea - Advanced Optical Imaging for Marine Research

Tali Treibitz - Professor, University of Haifa. Founder & CTO of SeaErra-Vision.

3rd May - 13h - Online

Meet: https://meet.google.com/niy-axhc-peh

YouTube stream: https://youtu.be/W4D7UBpLQrs

Abstract: The ocean covers 70% of the Earth's surface and influences almost every aspect of our lives, such as climate, fuel, security, and food. The ocean is a complex, vast, foreign environment that is hard to explore, and therefore much about it is still unknown. Interestingly, only 5% of the ocean floor has been seen so far, and there are still many open marine science questions. All over the world, including in Israel, depleting resources on land are encouraging increased human activity in the ocean, for example gas drilling, desalination plants, port construction, aquaculture, fish farming, producing bio-fuel, and more. As human access to most of the ocean is very limited, novel imaging systems and computer vision methods have the potential to reveal new information about the ocean that is currently unknown. Thus, the future calls for substantial related research in computer vision. The challenge stems from the fact that the ocean poses numerous difficulties, such as handling optics through a medium, movement, limited resources, communications, power management, and autonomous decisions, while operating in a large-scale environment. In the talk I will give an overview of the challenges in this field and present the novel algorithms and systems we have developed.

Bio: Tali Treibitz is the head of the Marine Imaging Lab in the Department for Marine Technologies, Charney School of Marine Sciences, University of Haifa. She holds a Ph.D. in Electrical Engineering from the Technion. Her research involves underwater imaging systems and computer vision. She is also the founder and CTO of SeaErra-Vision, which develops intelligent vision solutions for the underwater world based on the developments in her lab.

From space-borne sensors to the public eye: Data as a visual message

Helen-Nicole Kostis - Science Visualizer at NASA Goddard Space Flight Center

6th May - 13h @ IN PERSON: Room A.7, Ed. Ada Byron - EINA 

Abstract: Data are becoming increasingly complex and voluminous, a trend that will continue to grow as sensor technology, information storage, computational capability and scientific research evolve. However, unless we are up to the challenge of turning data into meaningful content that begets knowledge, we have not completed our quest for discovery. NASA’s Scientific Visualization Studio aims to promote a greater understanding of NASA science through visualization. In our efforts to explain to the public how complex phenomena work and to convey NASA’s latest research results and engineering accomplishments, we need to present the relationships that permeate the observed data and translate technical concepts to accessible messages that have simultaneously scientific integrity and a high visual engagement factor. 

Bio: Helen-Nicole (Eleni) Kostis is a science visualizer at NASA Goddard Space Flight Center. She holds a degree in Mathematics from the University of Ioannina (Greece), a Master’s in Computer Science and a Master’s in Fine Arts in Electronic Visualization from the University of Illinois at Chicago. During the last fifteen years, she has been exploring the field of data-driven visualization as a storytelling medium to communicate NASA’s research findings to the public and to engage new audiences. In 2010 she led and brought to life the NASA Visualization Explorer (2010-2017), the first-of-its-kind storytelling conduit that released bi-weekly data-driven visualized scientific results. Eleni is the recipient of NASA’s Exceptional Achievement Award for Outreach and is a co-author of the book Foundations of Data Visualization (Springer).

Brain activity in the control loop: Signal decoding and applications in human skills improvement

Pablo Urcola - Software Engineer - BitBrain 

9th May @13h - IN PERSON: Room A07, Ed. Ada Byron - EINA 

Abstract: Brain activity naturally generates electrical signals that can be used as a control signal. The complexity of these signals makes it very hard to decode the information they carry. Over recent years, improvements in sensors, devices and AI have allowed the extraction of more meaningful markers that can be applied in more complex scenarios.
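To make the idea of a "marker" concrete, the toy example below extracts one of the simplest EEG features, the power in the alpha band (8-12 Hz), from a synthetic signal. The sampling rate, band and signal are assumptions for illustration only; real brain-computer interface pipelines combine many such features with learned decoders.

```python
import numpy as np
from scipy.signal import welch

# Toy extraction of a single EEG-like "marker": alpha-band (8-12 Hz) power.
fs = 250.0                                  # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
# Synthetic signal: a 10 Hz alpha rhythm buried in noise.
eeg = 2.0 * np.sin(2 * np.pi * 10 * t) + rng.normal(0.0, 1.0, t.shape)

# Power spectral density via Welch's method, then integrate over the alpha band.
freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))
alpha_band = (freqs >= 8) & (freqs <= 12)
alpha_power = np.trapz(psd[alpha_band], freqs[alpha_band])
print(f"alpha-band power: {alpha_power:.3f}")
```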

Bio: Pablo Urcola received his Ph.D. in Computer Science from the University of Zaragoza in 2015, with a thesis on mobile robotics. He held a research position at the University of Zaragoza from 2010 to 2015, and spent 6 months as a research assistant at the University of Lincoln, UK. Since 2016, he has been a software engineer at BitBrain, developing cutting-edge neurotechnology products.

Perceptually Guided Computer-Generated Holography

Kaan Akşit - Associate Professor at University College London

23rd May - 13h - Online

Meet: https://meet.google.com/nhd-fxjc-rik

YouTube stream: https://youtu.be/WBDYHmEoFHw

Abstract: Inventing immersive displays that can attain realism in visuals is a long-standing quest in the optics, graphics and perception fields. As holographic displays can simultaneously address various depth levels, experts from industry and academia often pitch these holographic displays as the next-generation display technology that could lead to such realism in visuals. However, holographic displays demand high computational complexity in image generation pipelines and suffer from visual quality-related issues. Hence, this talk will describe our research efforts to combine visual perception related findings with Computer-Generated Holography (CGH) to achieve realism in visuals and derive CGH pipelines that can run at interactive rates (above 30 Hz). Specifically, I will explain how holographic displays could effectively generate three-dimensional images with good image quality and how these images could be generated to match the needs of human visual perception in resolution and statistics. Furthermore, I will demonstrate our CGH methods running at interactive rates with the help of learning strategies. As a result, we provide a glimpse into a potential future where CGH helps replace two-dimensional images generated on today's displays with authentic three-dimensional visuals that are perceptually realistic.
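For readers unfamiliar with Computer-Generated Holography, the sketch below shows the textbook Gerchberg-Saxton iteration for computing a phase-only far-field hologram of a target image. It illustrates why CGH is computationally demanding (repeated Fourier transforms per frame); it is not the perceptually guided, learned pipeline presented in the talk.

```python
import numpy as np

def gerchberg_saxton(target_amplitude, iterations=50, seed=0):
    """Compute a phase-only far-field hologram via Gerchberg-Saxton iterations."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, target_amplitude.shape)
    for _ in range(iterations):
        # Hologram plane: unit amplitude (phase-only modulator), current phase estimate.
        hologram = np.exp(1j * phase)
        # Propagate to the image plane (far field approximated by a Fourier transform).
        image = np.fft.fftshift(np.fft.fft2(hologram))
        # Enforce the target amplitude while keeping the propagated phase.
        image = target_amplitude * np.exp(1j * np.angle(image))
        # Propagate back and keep only the phase for the next iteration.
        phase = np.angle(np.fft.ifft2(np.fft.ifftshift(image)))
    return phase

# Toy target: a bright square on a dark background.
target_intensity = np.zeros((256, 256))
target_intensity[96:160, 96:160] = 1.0
hologram_phase = gerchberg_saxton(np.sqrt(target_intensity))
reconstruction = np.abs(np.fft.fftshift(np.fft.fft2(np.exp(1j * hologram_phase)))) ** 2
print(reconstruction.shape)
```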

Bio: Kaan Akşit is an Associate Professor in the computer science department at University College London, where he leads the Computational Light Laboratory. Kaan researches the intersection of light and computation, including computational approaches in imaging, graphics, fabrication and displays. His research is widely known in the optics and graphics communities for his contributions to display technologies dedicated to virtual reality, augmented reality, and three-dimensional displays with and without glasses. He was a research scientist at NVIDIA, USA, between 2014 and 2020. He is the recipient of Emerging Technologies Best in Show awards at SIGGRAPH 2019 and SIGGRAPH 2018, the DCEXPO special prize at SIGGRAPH 2017, best paper awards at ISMAR 2018 and IEEE VR 2017, and best paper nominations at IEEE VR 2019 and 2021.

Fireside chat: Machine Learning inference at Google - how does it work?

David Lopez - Engineering Manager for ML at Google

3rd June - 13h - Online

Meet: https://meet.google.com/xjg-wwqx-vei

Abstract: This last seminar of the year will have a special format, a fireside chat, where we will have the chance to talk informally with David about ML inference at Google and the work his team does to make it possible to run ML at scale. He will give a brief introduction and description, and then we will have a longer period than usual for discussion, with a guided question and answer session. David leads a team responsible for the robustness, scalability and sustainability of the systems where Machine Learning products at Google run (including the well-known and diverse applications and the ML models running on each of them, such as Translate, Maps, YouTube recommendations, Search, etc.). Are you curious how all these ML models are run at Google and what engineers need to do to make it possible? Join the chat and ask David ;-)

Bio: David Lopez is an engineering manager in Machine Learning at Google. He studied electrical engineering at the Universidad de Valladolid and obtained his master's in Software and Systems Security at the University of Oxford. He has been working at Google since 2011. He also has expertise in security and anti-abuse products, and he currently leads a Machine Learning Inference team that provides and maintains the infrastructure that allows Google's numerous ML products to run at scale.