Talks are ordered by speaking time.
Title: Seeing Together: Collaborative Perception in Multi-Robot Systems
Abstract: In today's rapidly evolving technological landscape, multi-robot systems are becoming increasingly essential in applications ranging from autonomous driving to search and rescue. A critical capability enabling these systems is collaborative perception, the ability of robots to share and fuse their sensory data to build a shared understanding of their environment and of each other. However, achieving effective collaborative perception is challenging due to the dynamic, uncertain nature of real-world environments and the complexity of teammate interactions. In this talk, I will present our research addressing these challenges. I will begin with our work on correspondence identification, which focuses on matching the same objects across observations from different teammates. This is a key step in ensuring consistent object references among teammates. Next, I will discuss multi-robot sensor fusion, which integrates distributed observations to enhance each robot’s situational awareness of both its surroundings and the team.
Bio: Dr. Hao Zhang is an Associate Professor of Computer Science and Robotics at the University of Massachusetts Amherst, where he leads the Human-Centered Robotics Laboratory. His research focuses on lifelong collaborative autonomy, with an emphasis on multi-robot collaboration, human-robot teaming, and robot learning and adaptation. He is a recipient of the National Science Foundation (NSF) CAREER Award, the DARPA Young Faculty Award (YFA), and the DARPA Director's Fellowship. Prof. Zhang has authored over 50 papers in top robotics and AI venues, including RSS, ICRA, CoRL, and CVPR, and has received multiple best paper awards and nominations for his work.
Title: Towards Open-world Perception, Modeling, and Editable Generation
Abstract: In this talk, we will present several innovative projects from our research lab that advance spatial reasoning and intelligence across three key areas: open-world scene perception, large-scale scene modeling, and editable scene generation. Firstly, we will delve into open-world perception, showcasing our work on open-vocabulary 3D perception that utilizes underlying 3DGS scene representations and advanced geometry-aware large foundation models for simultaneous modeling and spatial semantic alignment. Secondly, we will discuss our advancements in large-scale scene modeling, highlighting NeRF-based and Gaussian-based frameworks. These frameworks facilitate simultaneous end-to-end decomposition and modeling of very large-scale scenes, such as city-level environments. Lastly, we will present our research on 3D scene editing and inpainting, demonstrating techniques for achieving high-quality, view-controlled generation of edited scenes. This work paves the way for more dynamic and interactive scene manipulation and visualization.
Bio: Dr. Dan Xu is currently an Assistant Professor in the CSE Department at HKUST. He was a Postdoctoral Research Fellow in the Visual Geometry Group (VGG) at the University of Oxford, under the supervision of Prof. Andrea Vedaldi and Prof. Andrew Zisserman. He received his Ph.D. from the University of Trento under the supervision of Prof. Nicu Sebe. He was also a visiting PhD student in CUHK MMLab under the supervision of Prof. Xiaogang Wang. Dr. Xu's research interests focus on multi-modal and multi-task learning, with applications in 2D/3D scene perception, understanding, and generation. He has been recognized with a Best Paper Award at the International Conference on Pattern Recognition (ICPR) and a Best Paper Nominee Award at the ACM Multimedia Conference (ACM MM). Additionally, he has served as an Area Chair for several top-tier conferences, including NeurIPS, ICML, ICLR, CVPR, ICCV, and ECCV.
Title: TBD
Abstract: TBD
Bio: Dr. Aniket Bera is an Associate Professor at the Department of Computer Science at Purdue University. He is also an Adjunct Professor at the University of Maryland at College Park. Prior to this, he was a Research Assistant Professor at the University of North Carolina at Chapel Hill. He received his Ph.D. in 2017 from the University of North Carolina at Chapel Hill. He is also the founder of Project Dost. He is currently serving as the Senior Editor for IEEE Robotics and Automation Letters (RA-L) in the area of "Planning and Simulation" and the Conference Chair for the ACM SIGGRAPH Conference on Motion, Interaction and Games (MIG 2022).
Title: "Event Cameras and Emerging Marker-based Real-World Localization Applications"
Abstract: Event cameras capture asynchronous intensity changes and offer the advantages of high temporal resolution, power efficiency, data efficiency, and high dynamic range. Studies over the past decade have shown that they can be applied to a wide variety of tasks, especially motion-related ones, since changes in pixel intensity are inherently coupled with motion. In this presentation, we will first briefly introduce event cameras and some of the latest motion-estimation research. We will then introduce their application to marker-based localization in real-world settings, combining event cameras with modulated LEDs.
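As a rough illustration of the marker-based idea (a simplified sketch under assumed names and conventions, not the speaker's system), the snippet below estimates the blink frequency of a modulated LED from the events of a single pixel by measuring the spacing between positive-polarity events; real pipelines aggregate events over the marker region and account for the sensor's contrast threshold and refractory period.

```python
# Illustrative sketch only: recover a modulated LED's blink frequency from the
# event stream of one pixel. Inputs are assumed to be NumPy arrays.
import numpy as np

def led_frequency(timestamps_us, polarities):
    """timestamps_us: event timestamps (microseconds) at one pixel.
    polarities: +1 for a brightness increase, -1 for a decrease."""
    on_times = timestamps_us[polarities > 0]   # events triggered by the LED turning on
    if len(on_times) < 2:
        return None                            # not enough events to estimate a period
    periods_us = np.diff(np.sort(on_times))
    period_us = np.median(periods_us)          # median is robust to missed events
    return 1e6 / period_us                     # blink frequency in Hz
```

Distinct modulation frequencies can then identify individual markers, and the markers' image positions can feed a standard pose solver for localization.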
Bio: Dr. Shintaro Shiba is a Research Scientist in the Vision AI group at Woven by Toyota. He received a Ph.D. in Engineering from Keio University, Japan, jointly with Technische Universität Berlin, Germany. From 2021 to 2023 he was a visiting researcher in the Robotics Interactive Perception group (Prof. Guillermo Gallego) at TU Berlin. In 2024, he received the 14th Ikushi Prize from the Japan Society for the Promotion of Science. His computer vision publications span a broad range of areas, including motion estimation (e.g., optical flow and ego-motion), 3D vision (e.g., mono/stereo depth), denoising, and imaging applications, based on both learning-based and model-based (optimization) approaches.
Title: Mobility Independence with Robotics Intelligence
Abstract: As need increases, access decreases. It is a paradox that as human motor impairments become more severe, and increasing needs pair with decreasing motor ability, the very machines created to provide assistance become less and less accessible to operate with independence. My lab addresses this paradox by incorporating robotics autonomy and intelligence into machines that enable mobility and manipulation: leveraging robotics autonomy to advance human autonomy. In this talk, I will overview two threads of research within my lab, highlighting within each the evolution of how robot perception is leveraged.
Bio: Dr. Brenna Argall is a professor of Mechanical Engineering, Computer Science, and Physical Medicine & Rehabilitation at Northwestern University. She is founder and director of the assistive & rehabilitation robotics laboratory (argallab) at the Shirley Ryan AbilityLab (formerly the Rehabilitation Institute of Chicago), the #1-ranked rehabilitation hospital in the United States. The mission of the argallab is to advance human ability by leveraging robotics autonomy. Argall is a Fellow of the American Institute for Medical and Biological Engineering (2023), a recipient of the NSF CAREER award (2016), and one of Crain’s Chicago Business's 40 under 40 (2016). She received her Ph.D. in Robotics (2009) from the Robotics Institute at Carnegie Mellon University, as well as her B.S. in Mathematics (2002) from the same university. Prior to joining Northwestern and RIC, she was a postdoctoral fellow (2009-2011) at the École Polytechnique Fédérale de Lausanne (EPFL), and prior to graduate school she held a Computational Biology position at the National Institutes of Health (NIH). More recently (2019), she was a visiting Research Fellow at the Wyss Center for Bio and Neuroengineering in Geneva, Switzerland.
Title: Active Localization, Identification, and Mapping
Abstract: In recent decades, we have witnessed a paradigm shift by which autonomous vehicles and robots are being deployed to support sensing objectives, such as coverage, surveillance, tracking, and target recognition. Significant progress has been made in developing planning and control algorithms that maximize information gain, visibility, and coverage objectives, rather than merely supporting traditional vehicle guidance goals, such as avoiding obstacles and following a pre-defined trajectory. Despite the many methods developed to date for autonomous information-driven planning and sensing, there remain many important research frontiers. This talk presents new topics in the area of sensor planning that challenge the current state-of-the-art, namely, active multiview planning for simultaneous localization, target identification, and mapping. The first part of the talk describes new problems associated with planning the motion of a sensor, such as a camera or sonar, that must obtain many looks prior to being able to properly characterize the target of interest. The second part of the talk presents new ideas from event-based sensing and perception-in-the-control loop that allow imaging sensors to help vehicles localize, map, and identify targets in unknown and contested environments.
Bio: Dr. Silvia Ferrari is the John Brancaccio Professor of Mechanical and Aerospace Engineering and Associate Dean for Cross-campus Engineering Research at Cornell University and Cornell Tech. Prior to that, she was a Professor of Engineering and Computer Science at Duke University, and the Founder and Director of the NSF Integrative Graduate Education and Research Traineeship (IGERT) and Fellowship program on Wireless Intelligent Sensor Networks (WISeNet). Currently, she is the Director of the Laboratory for Intelligent Systems and Controls (LISC) at Cornell University and the co-Director of the Věho Institute for Vehicle Intelligence at Cornell Tech. Her principal research interests include active perception, robust adaptive control, learning and approximate dynamic programming, and control of multiscale dynamical systems. She is the author of the book “Information-driven Path Planning and Control” (MIT Press, 2021) and of the TED talk “Do robots dream of electric sheep?”. She received the B.S. degree from Embry–Riddle Aeronautical University and the M.A. and Ph.D. degrees from Princeton University. She is a senior member of the IEEE, and a member of ASME, SPIE, and AIAA. She is the recipient of the ONR Young Investigator Award (2004), the NSF CAREER Award (2005), and the Presidential Early Career Award for Scientists and Engineers (PECASE) (2006).
Title: Embodied Pedestrians
Abstract: A number of disciplines are revisiting the topic of embodiment, or how one immerses oneself in the often mundane and everyday encounters that meld bodies and environments into experiences. For example, in artificial intelligence, unveiling the mysteries of embodiment could assist in bridging the gap between Good Old-Fashioned AI (GOFAI) and emerging concepts of Autonomous Machine Intelligence. In retailing, embodiment is a key and actionable component of the customer journey and the touchpoints that can be manipulated to drive sales. In social psychology, embodiment is entwined with issues of affect and affordance and with explaining their shifting reliance on context. In behavioral geography, embodiment is a key premise of Non-Representational Theory, or the idea that symbolic geographies are actually living products of how people’s senses engage the things that they happen upon, which collectively drive up-scale geographies such as places, landmarks, and crowds. Much of this work is theoretical, which is advantageous in opening up opportunities for convergence across disciplines. A next step in this convergence is to develop supporting empiricism. In this talk, I will examine embodiment through the lens of pedestrian motion in urban environments, a topic that applies across many disparate fields. Pedestrian embodiment, I will argue, can render many of the theoretical arguments for embodiment practical. Specifically, I will show that empirical insight on embodiment is available by studying and simulating pedestrians as they engage with city streetscapes. Getting this information involves some work to establish new parity between (1) computer vision that mimics human visual senses, (2) location-based analyses of pedestrian journeys to document real-world embodied encounters at the scale of the individual and the small bubbles of space and time that form encounters, and (3) reconsidering wearable brain-computer interfaces as possible indices for people’s experiences. I will demonstrate that pedestrian embodiment can feasibly be cataloged in real-time, real-world settings at encounter scale, and that the data from these catalogs can be passed to simulations as testbed cyberspaces that flexibly mix up real human user experiences with agent AI representations of pedestrians for further, broader, flexible experimentation.
Bio: Dr. Paul M. Torrens is a Professor in the Department of Computer Science and Engineering at the Tandon School of Engineering at New York University, where he is also faculty in the Center for Urban Science + Progress. Prior to joining NYU, Paul was the founding Director of the Center for Geospatial Information Science at the University of Maryland, College Park. From 2023 to 2025, Paul was a Program Director for Engineering Research Centers (ERC) at the National Science Foundation. He is a prior recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE) for his work in modeling human behavior. Paul received a Ph.D. from the Centre for Advanced Spatial Analysis at University College London in 2004.
Title: EBS-EKF: Accurate and High Frequency Event-based Star Tracking
Abstract: Star trackers perform high-precision attitude tracking of satellites using optical sensors, producing right ascension, declination, and roll measurements in support of spacecraft control. Event-based cameras, which capture temporal changes in local brightness, are a promising new technology for star tracking due to their low latency and low power consumption. However, their use remains largely unexplored, and all previous methods have only been tested in simulation and are not suited to handle the low-light scenarios present in real-life star tracking systems.
In this talk, we present a novel event-based star tracking algorithm, called EBS-EKF, which can handle the intricacies faced in real-life star tracking systems. EBS-EKF comprises several novel contributions. We first derive the expected event signals from stars based on physical sensor characteristics in low-light scenarios, and show that a star's signal characteristics change significantly with its brightness. Using this signal model, we develop a new star centroiding algorithm that estimates a star's position in camera pixel space more accurately than previous centroiding algorithms. We then design and implement a tracking method that uses a 3D extended Kalman filter to track the camera's attitude using very few events.
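To make the filter structure concrete, here is a minimal sketch of a multiplicative EKF attitude update driven by star-direction measurements. It is an illustration under assumed names, frames, and noise parameters, not the speakers' implementation: the low-light signal model, the event-driven centroider, and star identification are all omitted, and the covariance propagation is simplified.

```python
# Minimal sketch: 3D error-state (multiplicative) EKF for camera attitude,
# corrected by observed directions of catalog-matched stars.
import numpy as np
from scipy.spatial.transform import Rotation as R

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

class AttitudeEKF:
    def __init__(self, q0, P0, Q, Rmeas):
        self.q = R.from_quat(q0)   # camera-to-inertial attitude estimate ([x, y, z, w])
        self.P = P0                # 3x3 covariance of the small-angle error state
        self.Q = Q                 # process noise per unit time (assumed)
        self.Rm = Rmeas            # 3x3 measurement noise on unit-vector components

    def predict(self, omega, dt):
        # Propagate attitude with an angular-rate estimate; covariance growth simplified
        self.q = self.q * R.from_rotvec(omega * dt)
        self.P = self.P + self.Q * dt

    def update(self, star_dir_inertial, star_dir_measured):
        # Predicted star direction in the camera frame
        pred = self.q.inv().apply(star_dir_inertial)
        # Linearized measurement model: residual ~ skew(pred) @ delta_theta
        H = skew(pred)
        S = H @ self.P @ H.T + self.Rm
        K = self.P @ H.T @ np.linalg.inv(S)
        delta = K @ (star_dir_measured - pred)   # small-angle attitude correction
        self.q = self.q * R.from_rotvec(delta)
        self.P = (np.eye(3) - K @ H) @ self.P
```

In this simplified picture, each centroid of an identified star supplies one `update` call, so the attitude can be corrected as soon as a handful of events yields a reliable centroid, which is what enables high-frequency tracking.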
Bio: Dr. Scott McCloskey is the director of computational imaging at Kitware, a small open-source software R&D organization. He leads a group of researchers using non-traditional imaging techniques to enable improved performance in applications of computer vision. Dr. McCloskey holds a PhD from McGill University, a Master's degree from the Rochester Institute of Technology, and a Bachelor's degree from the University of Wisconsin - Madison, all in Computer Science.
Title: Visual Active Search
Abstract: I will present our recent work on Visual Active Search (VAS), a new problem in visual decision-making, where the goal is to find as many target objects as possible in a broad geospatial area, guided by visual information and reliable feedback collected during the search. This problem entails a complex tension between exploitation—using obtained information to decide where target objects are most likely located—and exploration, which improves future search reliability by acquiring the most informative data. Specifically, I will present our advances that leverage reinforcement learning techniques coupled with novel architectures that enable rapid adaptation to new domains and search targets, while also supporting the use of limited visual and multimodal information to both specify search goals and guide the search.
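For readers unfamiliar with the setting, the sketch below shows the basic search loop in a deliberately simplified form (all names and parameters are illustrative assumptions, and a hand-coded heuristic stands in for the learned reinforcement-learning policy described in the talk): score each grid cell by its predicted target probability plus an exploration bonus, query the highest-scoring cell, and fold the feedback back into the belief.

```python
# Illustrative sketch of a visual active search loop over a geospatial grid.
import numpy as np

def visual_active_search(prior, query_fn, budget, beta=0.3, corr=0.5):
    """prior: (H, W) target probabilities predicted from overhead imagery.
    query_fn: callable((i, j)) -> 1 if a target is found in cell (i, j), else 0.
    budget: number of cells that may be queried."""
    H, W = prior.shape
    belief = prior.astype(float).copy()
    queried = np.zeros((H, W), dtype=bool)
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    found = []
    for _ in range(budget):
        if queried.any():
            # Exploration bonus: prefer cells far from anything queried so far
            qi, qj = np.where(queried)
            d = np.min(np.abs(ii[..., None] - qi) + np.abs(jj[..., None] - qj), axis=-1)
            bonus = beta * d / (H + W)
        else:
            bonus = np.zeros((H, W))
        score = np.where(queried, -np.inf, belief + bonus)   # never re-query a cell
        cell = np.unravel_index(np.argmax(score), score.shape)
        reward = query_fn(cell)
        queried[cell] = True
        if reward:
            found.append(cell)
        # Crude spatial correlation: nudge nearby beliefs toward the observed outcome
        d_cell = np.abs(ii - cell[0]) + np.abs(jj - cell[1])
        weight = corr * np.exp(-d_cell)
        belief = (1 - weight) * belief + weight * float(reward)
    return found
```

In the work presented in the talk, the scoring policy itself is learned and adapts to new domains and search targets; the interaction loop above is only an analogy for how exploitation and exploration are traded off.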
Bio: Dr. Yevgeniy Vorobeychik is a Professor of Computer Science & Engineering at Washington University in Saint Louis. Previously, he was an Assistant Professor of Computer Science at Vanderbilt University. Between 2008 and 2010 he was a post-doctoral research associate in the Computer and Information Science department at the University of Pennsylvania. He received his Ph.D. (2008) and M.S.E. (2004) degrees in Computer Science and Engineering from the University of Michigan, and a B.S. degree in Computer Engineering from Northwestern University. His work focuses on trustworthy AI, alignment, and computational game theory and social choice. Dr. Vorobeychik received an NSF CAREER award in 2017 and was invited to give an IJCAI-16 early career spotlight talk. He has also received several Best Paper awards, including one of the 2017 Best Papers in Health Informatics. He was nominated for the 2008 ACM Doctoral Dissertation Award and received honorable mention for the 2008 IFAAMAS Distinguished Dissertation Award.
Title: On Track(ing): Always-on Mobile Augmented Reality
Abstract: Since my involvement in the first outdoor visual-overlay AR system, the Columbia Touring Machine, in the late 1990s, my work has often focused on topics related to wide-area augmented reality. We are closer to the feasibility of an "Always-on Mobile AR" vision than ever before, but while some stakeholders already proclaim that "Tracking is solved", many challenges remain. Is the current state of visual odometry (and scene recognition, for that matter) good enough for robust mobile AR applications? What critical design implications arise from unreliable tracking in AR applications? This talk presents my lab's pursuit of mobile AR technologies over the years. In order to determine the potential impact of tracking inaccuracies on user performance and cognition in controlled user studies, we have occasionally simulated tracking artifacts. Recent results from our user studies on navigational guidance tasks indicate that mobile AR users may still opt to utilize direct-overlay augmentations even when there is significant instability in the presented AR scene augmentations. Recent success stories in 3D scene understanding, and new user expectations to have AI expertise at their fingertips (or in front of their eyeballs), may have a profound impact on the acceleration of intelligent spatial computing technologies. Visual tracking is sure to be an integral part of this.
Bio: Dr. Tobias Höllerer is Professor of Computer Science at the University of California, Santa Barbara. He directs the “Four Eyes” Laboratory, conducting research in the four I's of Imaging, Interaction, and Innovative Interfaces. His research spans several areas of HCI, real-time computer vision, computer graphics, social and semantic computing, and visualization. He obtained a PhD in computer science from Columbia University in 2004. In 2008, he received the US National Science Foundation’s CAREER award for his work on “Anywhere Augmentation”. This work enabled seamless mobile augmented reality and demonstrated that even passive use of AR can improve the experience for subsequent users. He served as a principal investigator on the UCSB Allosphere project, designing and utilizing display and interaction technologies for a three-story surround-view immersive situation room. He co-authored a textbook on Augmented Reality and has (co-)authored over 300 peer-reviewed journal and conference publications in areas such as augmented and virtual reality, computer vision and machine learning, intelligent user interfaces, information visualization, 3D displays, mobile and wearable computing, and social and user-centered computing. Several of these publications received Best Paper or Honorable Mention awards at esteemed venues including IEEE ISMAR, IEEE VR, ACM VRST, ACM UIST, ACM MobileHCI, IEEE SocialCom, and IEEE CogSIMA. He was named an ACM Distinguished Scientist in 2013. He is a senior member of the IEEE and IEEE Computer Society and member of the IEEE VGTC Virtual Reality Academy.
Title: From Traffic-Informed Autonomous Driving to Collaborative Perception
Abstract: Rapid urbanization and increasing traffic have led to the digitalization of modern cities and the automation of transportation. As new technologies like VR systems and self-driving cars emerge, there is an increasing demand to incorporate realistic traffic flows into virtualized cities. In this talk, we first present a novel method to reconstruct city-scale traffic using statistical learning on GPS data and metamodel-based simulation optimization for dynamic data completion in areas with insufficient data coverage. We also propose a novel differentiable hybrid traffic simulator that combines macroscopic and microscopic models and can be directly integrated into a neural network for traffic control and flow optimization, making it the first differentiable hybrid traffic simulator for planning and control of autonomous systems. Next, we present a unifying framework for a learning-based, multi-level control policy for autonomous vehicles, trained on simulated accident scenarios. We further introduce a simple yet effective framework for improving the robustness of learning algorithms against image corruption in autonomous driving, caused by both internal factors (e.g., sensor noise and hardware abnormalities) and external factors (e.g., lighting, weather, visibility, and other environmental effects). Finally, we introduce novel representations for cooperative perception and shared decision-making in multi-agent systems for a future of mixed autonomy in which humans collaborate side-by-side with AI-empowered agents. We conclude by suggesting possible future directions.
Bio: Dr. Ming C. Lin is currently a Distinguished University Professor and the Barry Mersky and Capital One E-Nnovate Endowed Professor at the University of Maryland, College Park, where she formerly served as the Elizabeth Stevinson Iribe Chair of Computer Science. She is also the John R. & Louise S. Parker Distinguished Professor Emerita of Computer Science at the University of North Carolina (UNC), Chapel Hill, and an Amazon Scholar. She obtained her B.S., M.S., and Ph.D. in Electrical Engineering and Computer Science from the University of California, Berkeley. She has received several honors and awards, including the NSF Young Faculty Career Award, the UNC Hettleman Award for Scholarly Achievements, the Beverly W. Long Distinguished Professorship, the IEEE VGTC Virtual Reality Technical Achievement Award, the Washington Academy Distinguished Career Award, and many best paper awards at international conferences. She is a Fellow of the National Academy of Inventors, ACM, IEEE, Eurographics, the ACM SIGGRAPH Academy, and the IEEE VR Academy.
Her research interests include AI/ML, computational robotics, physically-based modeling, virtual reality, sound rendering, and geometric computing. She has (co-)authored more than 400 refereed publications in these areas and co-edited/authored four books. She has served on hundreds of program committees of leading conferences and co-chaired dozens of international conferences and workshops. She is currently an elected member of the Computing Research Association (CRA) and CRA-Widening Participation (CRA-WP) Boards of Directors. She is or has been the Chair of the IEEE Computer Society (CS) Fellows Selection Committee, the IEEE CS Harry Goode Memorial Award Committee, and the Computer Pioneer Award Committee, as well as the Founding Chair of the ACM SIGGRAPH Outstanding Doctoral Dissertation Award. She is a former member of the IEEE CS Board of Governors, a former Editor-in-Chief of IEEE Transactions on Visualization and Computer Graphics (2011-2014), a former Chair of the IEEE CS Transactions Operations Committee, and a member of several editorial boards.
Title: All-Day, All-Night, Always-On Visual-Inertial Navigation and Relocalization: Challenges in Deploying Drone Autonomy at Scale (Joint talk)
Abstract: At Skydio, we develop autonomous drones that rely on tightly fused multi-sensor inputs—including six fisheye navigation cameras, one high-resolution gimbal camera, one thermal camera, IMUs, GPS, and barometer—to enable robust, real-time visual-inertial odometry (VIO) and relocalization. Our VIO system runs continuously—day and night—across both visible and infrared spectra, supporting navigation in challenging environments such as GPS-denied and high-altitude settings. To enable repeatable missions, we deploy a Vision Positioning Service (VPS) that relocalizes the drone against previously built maps to ensure consistent flight paths across time.
This talk covers our sensor fusion architecture, VIO pipeline, relocalization system, and the infrastructure and tools we use to continuously evaluate algorithm performance. It also highlights the challenges of deploying scalable, always-on autonomy across a fleet of drones operating in diverse real-world conditions.
Bios: Xipeng Wang is a Senior Autonomy Engineer at Skydio, where he works on high-altitude visual-inertial odometry (VIO), the Visual Positioning System, and GNSS state estimation. He has also lectured at the University of Michigan, teaching Deep Learning and Advanced AI. Xipeng received his Ph.D. in Computer Science and Engineering from the University of Michigan, where his research focused on efficiency and reliability in large-scale localization and mapping problems.
Samuel Wang is a Senior Autonomy Engineer at Skydio, where he led the launch of the NightSense system, enabling visual navigation in no-light conditions, and worked on the Vision Positioning System for map-based relocalization in multi-battery and dock-based missions. His work spans visual-inertial odometry, sensor fusion, factory calibration, and obstacle avoidance. Prior to Skydio, he contributed to the drone and sensor suite systems at DJI and Near Earth Autonomy. Samuel holds an M.S. in Robotics from Carnegie Mellon University.