Invited Speakers

Dr. Yung-Hsiang Lu is a professor at the Elmore Family School of Electrical and Computer Engineering at Purdue University. From 2020 to 2022, he was the director of the John Martinson Engineering Entrepreneurial Center at Purdue University. He is an IEEE Fellow, an ACM Distinguished Scientist and Distinguished Speaker, and a Distinguished Visitor of the IEEE Computer Society. His research topics include efficient computer vision for embedded systems and cloud and mobile computing. He was the lead organizer of the IEEE Low-Power Computer Vision Challenge from 2015 to 2023.

Title: Efficient Computer Vision for Edge Devices

Abstract: Since deep learning became popular a decade ago, computer vision has been adopted by a wide range of applications. Many of these applications must run on edge devices with limited resources (energy, time, memory capacity, etc.). This talk will survey methods designed to improve the efficiency of computer vision, including quantization, architecture search, and trade-offs between accuracy and speed. A new architecture called the modular neural network is introduced. This architecture breaks a deep neural network into multiple shallower networks and can significantly reduce model size and execution time. A modular neural network is a tree-like structure that progressively analyzes different features in images and divides the images into groups based on visual similarity. Modular neural networks can be used for image classification, object counting, and re-identification. The talk will also explain how contextual information can be used to reduce the computation required for convolution.
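As a rough illustration of the tree-structured idea (not the speaker's implementation), the sketch below uses a shallow root module to route each image to one of several visually similar groups, and a shallow per-group module to produce the final scores, so only a short path of small networks runs per image. Module names and layer sizes are assumptions.

```python
# A minimal sketch of a tree-structured "modular" classifier, assuming small
# convolutional modules and a routing rule based on each module's own output.
import torch
import torch.nn as nn

class ShallowModule(nn.Module):
    """A small CNN that either routes an image to a child group or classifies it."""
    def __init__(self, num_outputs: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_outputs)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

class ModularTree(nn.Module):
    """Root module picks a visually similar group; one child module per group
    then produces the final class scores, so only a shallow path is executed."""
    def __init__(self, num_groups: int, classes_per_group: int):
        super().__init__()
        self.root = ShallowModule(num_groups)
        self.group_modules = nn.ModuleList(
            ShallowModule(classes_per_group) for _ in range(num_groups)
        )

    def forward(self, x):
        group = self.root(x).argmax(dim=1)            # coarse grouping decision
        logits = torch.stack([
            self.group_modules[g](xi.unsqueeze(0)).squeeze(0)
            for g, xi in zip(group.tolist(), x)
        ])
        return group, logits                          # group id + within-group scores

model = ModularTree(num_groups=4, classes_per_group=10)
group, logits = model(torch.randn(2, 3, 64, 64))      # two dummy RGB images
```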

Dr. Cang Ye is a Professor in the Department of Computer Science at Virginia Commonwealth University. He received his B.Eng. and M.Eng. degrees from the Department of Precision Machinery and Precision Instrument at the University of Science and Technology of China, Hefei, Anhui, P. R. China, and his Ph.D. degree in Electrical and Electronic Engineering from the University of Hong Kong. He was a Research Fellow at the Mobile Robotics Lab of the University of Michigan, Ann Arbor, MI, and became a research faculty member there in 2003. He joined the University of Arkansas at Little Rock (UALR) in 2005. Dr. Ye is a Fellow of the American Institute for Medical and Biological Engineering (AIMBE), a Senior Member of the IEEE, and a member of the IEEE SMC Technical Committee on Robotics and Intelligent Sensing.

Title: Visual-Inertial Odometry for Small-Sized Robots

Abstract: As today’s mobile phones become increasingly powerful in sensing and computing, they have become self-contained, cost-effective computer vision systems for various robotic applications. This talk focuses on a smartphone-based visual-inertial odometry (VIO) system, which uses the phone’s camera, LiDAR, and IMU as the sensors for SLAM and the phone itself for SLAM computation. The VIO tracks the varying camera intrinsic parameters in real time to make device pose estimation more accurate. A robotic cane for wayfinding by the visually impaired is used to demonstrate the use of the VIO in the real world. Several issues related to navigation in large-scale real-world environments will also be discussed.
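To illustrate the underlying idea of tracking intrinsics online, the sketch below writes the reprojection residual as a function of both the device translation and the camera intrinsics, so a filter or optimizer can update them jointly. The state layout, omission of rotation, and numerical Jacobian are simplifying assumptions, not the speaker's formulation.

```python
# A minimal sketch of estimating camera intrinsics online alongside pose:
# the reprojection residual depends on both, so one update refines both.
import numpy as np

def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a point already expressed in the camera frame."""
    X, Y, Z = point_cam
    return np.array([fx * X / Z + cx, fy * Y / Z + cy])

def reprojection_residual(state, point_world, observed_px):
    """state = [tx, ty, tz, fx, fy, cx, cy]; rotation omitted to keep the sketch short.
    Because intrinsics are part of the state, their estimates are refined whenever
    image observations disagree with the current projection."""
    t = state[:3]
    fx, fy, cx, cy = state[3:]
    point_cam = point_world - t                 # camera assumed axis-aligned here
    return observed_px - project(point_cam, fx, fy, cx, cy)

def numerical_jacobian(state, point_world, observed_px, eps=1e-6):
    """Jacobian w.r.t. the full state, usable in an EKF update or Gauss-Newton step."""
    J = np.zeros((2, len(state)))
    for i in range(len(state)):
        dp = np.zeros(len(state))
        dp[i] = eps
        J[:, i] = (reprojection_residual(state + dp, point_world, observed_px)
                   - reprojection_residual(state - dp, point_world, observed_px)) / (2 * eps)
    return J

state = np.array([0.0, 0.0, 0.0, 500.0, 500.0, 320.0, 240.0])
r = reprojection_residual(state, np.array([1.0, 0.5, 4.0]), np.array([440.0, 300.0]))
J = numerical_jacobian(state, np.array([1.0, 0.5, 4.0]), np.array([440.0, 300.0]))
```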

Nikos Papanikolopoulos (Fellow, IEEE) received the Diploma of Engineering in electrical and computer engineering from the National Technical University of Athens in 1987, and the M.S. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University in 1988 and 1992, respectively. His research interests include computer vision, robotics, sensors for transportation and precision agriculture applications, and control systems. He is the director of the Minnesota Robotics Institute and the McKnight Presidential Endowed Professor at the University of Minnesota. He has received numerous awards, including the 2016 IEEE RAS George Saridis Leadership Award in Robotics and Automation.

Title: Object Localization and 3D Reconstruction for Real-World Applications

Abstract: The talk will revolve around object localization and 3D reconstruction for real-world problems. We will start by looking at the area of Precision Agriculture (PA) where large amounts of data for object localization are available. In order to apply the recent successes of machine learning and computer vision on a large scale using robotics, efficient and general algorithms must be designed to intelligently split point clouds (associated with corn plants) into small, yet actionable, portions that can then be processed by more complex algorithms. In this work, we capitalize on a similarity between the current state-of-the-art for roughly segmenting corn plants and a commonly used density-based clustering algorithm, Quickshift. Exploiting this similarity, we propose a novel algorithm, Ground-Density Quickshift++, with the goal of producing a general and scalable field segmentation algorithm that segments individual plants and their stems. This algorithm produces quantitatively better results than the current state-of-the-art on both plant separation and stem segmentation while being less sensitive to input parameters and maintaining the same algorithmic time complexity. 
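For readers unfamiliar with the Quickshift family the abstract builds on, the sketch below implements plain Quickshift-style density clustering on a small 3D point cloud: each point links to its nearest higher-density neighbor, overly long links are cut, and each resulting tree root is a cluster. The kernel bandwidth and link threshold are illustrative; this is not Ground-Density Quickshift++ itself.

```python
# A minimal Quickshift-style density clustering sketch on 3D points.
import numpy as np

def quickshift_clusters(points, bandwidth=0.5, max_link=1.0):
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)  # pairwise distances
    density = np.exp(-(d / bandwidth) ** 2).sum(axis=1)                   # kernel density estimate

    parent = np.arange(n)
    for i in range(n):
        # link each point to its nearest neighbor of strictly higher density
        higher = np.where(density > density[i])[0]
        if len(higher) > 0:
            j = higher[np.argmin(d[i, higher])]
            if d[i, j] <= max_link:            # cut links that are too long (separate plants)
                parent[i] = j

    # follow parent links to the root of each tree; each root is a cluster (mode)
    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i
    return np.array([root(i) for i in range(n)])

pts = np.vstack([np.random.randn(50, 3) * 0.2, np.random.randn(50, 3) * 0.2 + 3.0])
labels = quickshift_clusters(pts)              # two well-separated clumps -> two roots
```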

The talk will then describe a region-wide deployment of a unique non-intrusive, multi-camera, truck parking space detection and availability information dissemination architecture. The architecture leverages multi-view Structure and Motion methods to reconstruct a three-dimensional representation of the environment for estimating unoccupied spatial extents to deduce truck parking availability. Unlike 2D camera sensor-based methodologies, the advantage is its immediate adaptability to a variety of parking facilities and scenarios without the need for any subsequent re-training. The approach also mitigates errors arising from occlusions, and many other environmental conditions that confound many 2D camera-based approaches. The region-wide deployment, operated by a state transportation agency (Kansas-DOT), has been online since early 2019, and has thus far substantiated the technical and day-to-day operational viability of the approach.
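As a toy illustration of deducing availability from reconstructed geometry (the deployed system is far richer), the sketch below marks a stall occupied when enough above-ground points of the 3D reconstruction fall inside its bounding volume. The stall box, height band, and point threshold are assumptions.

```python
# A minimal sketch: stall occupancy from above-ground points inside a stall volume.
import numpy as np

def stall_occupied(cloud, stall_min, stall_max, min_height=0.5, min_points=50):
    """cloud: (N, 3) points in a ground-aligned frame; stall_min/stall_max: (x, y) corners."""
    inside = (
        (cloud[:, 0] >= stall_min[0]) & (cloud[:, 0] <= stall_max[0]) &
        (cloud[:, 1] >= stall_min[1]) & (cloud[:, 1] <= stall_max[1]) &
        (cloud[:, 2] >= min_height)            # ignore the ground plane itself
    )
    return inside.sum() >= min_points

cloud = np.random.rand(5000, 3) * [30.0, 4.0, 3.0]          # dummy reconstruction
print(stall_occupied(cloud, stall_min=(0.0, 0.0), stall_max=(4.0, 4.0)))
```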

Jason Corso is a Professor of Robotics, Electrical Engineering, and Computer Science at the University of Michigan, and Co-Founder / Chief Scientist at the AI startup Voxel51. His research spans computer vision, robotics, and AI, with over 150 peer-reviewed publications.

Title: Towards Embodied AI with Depth from Standard Video Processing

Abstract: Embodied AI is critical to achieving collaborative agents in the physical world. Toward that end, this talk will cover a sequence of works that explore how signals generated from standard video processing, such as object detection and segmentation, can be combined with motion information of the agent to robustly infer depth. I'll cover the techniques, a relevant dataset and benchmark, and results that span from everyday mobile phone video to autonomous vehicles.
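As a pointer to the underlying geometry only (not the speaker's full method), the sketch below shows the textbook relation between a detected object's pixel shift and its depth when the agent translates sideways by a known baseline, exactly as in stereo.

```python
# A minimal sketch of depth from agent motion plus a tracked detection.
import numpy as np

def depth_from_motion(focal_px, baseline_m, u_first, u_second):
    """Depth of a tracked object from its horizontal image shift under a
    sideways camera translation of baseline_m meters."""
    disparity = abs(u_first - u_second)          # pixel shift of the object centroid
    if disparity < 1e-6:
        return np.inf                            # no parallax -> depth unobservable
    return focal_px * baseline_m / disparity

# Object centroid (e.g., from a segmentation mask) shifts 25 px while the
# phone/vehicle moves 0.3 m sideways with a 900 px focal length camera.
print(depth_from_motion(focal_px=900.0, baseline_m=0.3, u_first=640.0, u_second=615.0))
```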

Mark Campbell is the John A. Mellowes ‘60 Professor at Cornell University. His research interests are in theory and experimental validation for autonomous systems, including robotics, spacecraft, and UAVs. His expertise is in sensor fusion, perception, machine learning, and control optimization. He has led multiple collaborative research grants, including Cornell’s DARPA Urban Challenge self-driving car team, one of six finishers of the race. He has received several best paper awards and teaching awards, including a national teaching award from ASEE, and is a Fellow of the ASME, IEEE, and AIAA.

Title: Uncertainty Quantification in Deep Learning based Visual Localization

Abstract: A key challenge with many deep learning networks is that they produce a deterministic output, with little to no sense of errors or uncertainty. In this talk, we will explore the errors and uncertainties in deep learning based visual localization using features/keypoints in images. We will leverage the Ithaca365 dataset, which includes over 80 runs of the same 15 km route under varying lighting and weather conditions. This allows us to evaluate visual localization performance as the conditions of the training and test datasets vary. We propose capturing the errors using several types of uncertainty representations, including binned data and Gaussian mixtures, and demonstrate their performance within a formal online estimator.
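To make the Gaussian-mixture representation concrete, the sketch below fits a two-component mixture to synthetic localization errors; in practice the samples would come from comparing predicted and ground-truth poses across runs and conditions, and the component count would be a design choice.

```python
# A minimal sketch of modeling localization error with a Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2D position errors: a tight mode (good conditions) plus a wide mode
# (poor lighting/weather), mimicking a multi-modal error distribution.
rng = np.random.default_rng(0)
errors = np.vstack([
    rng.normal(0.0, 0.2, size=(400, 2)),
    rng.normal(0.0, 2.0, size=(100, 2)),
])

gmm = GaussianMixture(n_components=2, covariance_type="full").fit(errors)
print(gmm.weights_)        # mixture weights: how often each error regime occurs
print(gmm.covariances_)    # per-component covariances usable by a downstream estimator
```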

Cornelia Fermüller is a research scientist at the Institute for Advanced Computer Studies (UMIACS) at the University of Maryland, College Park. She holds a Ph.D. from the Technical University of Vienna, Austria, and an M.S. from the University of Technology, Graz, Austria, both in Applied Mathematics. She co-founded the Autonomy Robotics Cognition (ARC) Lab and co-leads the Perception and Robotics Group at UMD. She is the PI of an NSF-sponsored Network for Accelerating Research on Neuromorphic Engineering. Her research is in the areas of computer, human, and robot vision. She studies and develops biologically inspired computer vision solutions for systems that interact with their environment. In recent years, her work has focused on interpreting human activities, especially in the domain of instrumental performance, and on motion processing for fast active robots using bio-inspired event-based sensors as input.

Title: Direct Approaches to Visual Navigation

Abstract: Visual odometry, the process of estimating camera pose and motion from image sequences, is a mature technology with many real-world applications in autonomous driving, robotics, and augmented reality. Classically, motion in video is treated as an extension of static-image analysis by matching features across consecutive frames. In contrast, in biology we find systems with low computational power that do not compute correspondence but are very efficient in using visual motion. Inspired by biological vision, we develop algorithms in our lab that solve visual navigation tasks from image motion while computing only essential information. I will describe a bio-inspired pipeline for visual odometry and segmentation that uses spatiotemporal filter outputs and events from neuromorphic dynamic vision sensors as input, and I will show implementations of these algorithms in drone applications.
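As a small illustration of the "direct" flavor of motion processing (the textbook brightness-constancy relation, not the lab's pipeline), the sketch below computes normal flow, the motion component along the brightness gradient, from spatiotemporal image derivatives rather than from feature matches.

```python
# A minimal sketch of normal flow from spatiotemporal derivatives of two frames.
import numpy as np

def normal_flow(frame_prev, frame_next, eps=1e-6):
    Ix = np.gradient(frame_prev.astype(float), axis=1)          # spatial derivative in x
    Iy = np.gradient(frame_prev.astype(float), axis=0)          # spatial derivative in y
    It = frame_next.astype(float) - frame_prev.astype(float)    # temporal derivative
    grad_mag_sq = Ix**2 + Iy**2 + eps
    # brightness constancy: Ix*u + Iy*v + It = 0 -> motion component along the gradient
    un = -It * Ix / grad_mag_sq
    vn = -It * Iy / grad_mag_sq
    return un, vn

a = np.random.rand(64, 64)
b = np.roll(a, shift=1, axis=1)          # frame shifted by one pixel to the right
un, vn = normal_flow(a, b)               # un is ~1 where the gradient is mostly horizontal
```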

Dr. Michael Gleicher is a Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. Prof. Gleicher is founder of the department's Visual Computing Group and co-directs both the Visual Computing Laboratory and the Collaborative Robotics Laboratory. His research interests span the range of visual computing, including data visualization, robotics, and virtual/extended reality. His recent work includes exploring perceptual issues in visualization, the use of visual simulation for robotics, and geometric approaches to enhance robot perception and interaction. He has been Papers Chair for EuroVis and Area Chair for IEEE VIS. Prior to joining the University, Prof. Gleicher was a researcher at The Autodesk Vision Technology Center and in Apple Computer's Advanced Technology Group. He earned his Ph.D. in Computer Science from Carnegie Mellon University in 1994 and a B.S.E. in Electrical Engineering from Duke University in 1988. In 2013-2014, he was a visiting researcher at INRIA Rhone-Alpes. Prof. Gleicher is an ACM Distinguished Scientist. In 2023-2024, he holds a concurrent appointment as a Design Scholar at Amazon Robotics.

Title: Miniature Time of Flight Sensors for Robot Perception

Abstract: Commodity time-of-flight distance sensors are small, low cost, low power, and use limited bandwidth. This makes them attractive for robotics applications. However, through their standard interfaces they provide (seemingly) limited data, with low resolution and poor accuracy. In this talk, I will show how we can work around the limitations of these sensors to address the needs of robotics use cases. I will discuss how the commonly available single-photon avalanche diode (SPAD) sensors operate, providing an unusual representation of scene information. I will describe approaches to make use of these devices in demanding robotics applications. The central elements of our work are (1) to use the internal data generated by the sensors and (2) to apply careful modeling of the sensor and the geometry to enable more effective use of the data. I will show how we apply these ideas to precisely localize the sensor, to find small objects on surfaces, and to reconstruct 3D geometry from sparse samples. These demonstrations show the potential for miniature time-of-flight sensors in robotics applications.
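For background only, the sketch below shows the basic time-of-flight relation behind SPAD sensors: photon arrival times are binned into a histogram, and a peak at delay t corresponds to a surface at distance c*t/2. The bin width and synthetic histogram are assumptions; real devices expose richer internal data, which is precisely what the talk exploits.

```python
# A minimal sketch: distance from the peak of a SPAD photon-timing histogram.
import numpy as np

C = 299_792_458.0                      # speed of light, m/s
BIN_WIDTH_S = 250e-12                  # assumed 250 ps histogram bins

def distance_from_histogram(hist):
    """Estimate target distance from the peak of a photon-timing histogram."""
    peak_bin = int(np.argmax(hist))
    round_trip_time = peak_bin * BIN_WIDTH_S
    return C * round_trip_time / 2.0   # divide by 2: light travels out and back

hist = np.zeros(64)
hist[40] = 120                         # synthetic return: most photons land in bin 40
hist += np.random.poisson(2, size=64)  # ambient-light background counts
print(distance_from_histogram(hist))   # ~ 40 * 250 ps * c / 2 ~ 1.5 m
```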

Amit Roy-Chowdhury received his PhD from the University of Maryland, College Park (UMCP) in 2002 and joined the University of California, Riverside (UCR) in 2004, where he is a Professor and Bourns Family Faculty Fellow of Electrical and Computer Engineering, Cooperating Faculty in Computer Science and Engineering, and Director of the Center for Robotics and Intelligent Systems. He leads the Video Computing Group at UCR, working on foundational principles of computer vision, image processing, and machine learning, with applications in cyber-physical, autonomous, and intelligent systems. He has published over 200 papers in peer-reviewed journals and conferences, as well as two monographs: Camera Networks: The Acquisition and Analysis of Videos Over Wide Areas and Person Re-identification with Limited Supervision. He is on the editorial boards of major journals and the program committees of the main conferences in his area. He is a Fellow of the IEEE and IAPR, and has received the Doctoral Dissertation Advising/Mentoring Award from UCR and the ECE Distinguished Alumni Award from UMCP.

Title: Scene Understanding for Safe and Autonomous Navigation

Abstract: Autonomous navigation remains one of the most challenging problems in intelligent systems, largely because of the close integration of scene understanding and planning that needs to happen. Scene understanding requires analyzing objects and their collections across various scales, from individual people and their actions to wide-area analysis that could span the interactions of these people with many other objects in the scene. An integrated view that can span these scales is necessary for robust decision making. In this talk, we will consider a variety of scene understanding problems that need to be solved for autonomous navigation to be successful. At the level of individual people, we will show how to estimate the pose of each person under challenging real-life conditions such as significant occlusions. At the next scale, where small groups of individuals and objects interact, we will demonstrate the power of scene graphs to model the semantics of the scene. At a yet higher level, we will show how to track objects across non-overlapping cameras spread over large areas. Robustness to a variety of operational domains will be considered throughout these tasks. Even so, it is unlikely that perfect scene understanding will be achieved, and any autonomous agent will need to occasionally interact with human experts; we show how this can be achieved with natural language feedback, leveraging the power of recently developed vision-language models.
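To make the scene-graph idea concrete, the sketch below shows a scene graph purely as a data structure: nodes are detected entities and edges carry relationship labels that a planner can query. The objects and relations are made up for illustration and are not from the talk.

```python
# A minimal scene graph: labeled nodes plus (subject, relation, object) edges.
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)     # id -> object label
    edges: list = field(default_factory=list)     # (subject_id, relation, object_id)

    def add_object(self, obj_id, label):
        self.nodes[obj_id] = label

    def add_relation(self, subj, relation, obj):
        self.edges.append((subj, relation, obj))

    def relations_of(self, obj_id):
        return [(s, r, o) for s, r, o in self.edges if s == obj_id or o == obj_id]

g = SceneGraph()
g.add_object("p1", "pedestrian")
g.add_object("c1", "car")
g.add_object("x1", "crosswalk")
g.add_relation("p1", "walking_on", "x1")
g.add_relation("c1", "yielding_to", "p1")
print(g.relations_of("p1"))   # relations a planner could inspect before acting
```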

Vinod Kulathumani is a Principal Scientist at Amazon AWS, where he leads Core AI research and development for several innovative product features on Amazon's smart shopping cart that leverage the latest advances in computer vision, machine learning, and sensor signal processing. Specifically, he has designed and delivered: (i) a real-time cart localization and product location inference system that lays the foundation for features such as product lookup, wayfinding in stores, and locality-centric ads, and (ii) a multi-sensor fusion system that infers shopper actions and maintains a shopping receipt with stringent requirements on accuracy and latency to ensure a pleasant shopping experience.

Prior to joining Amazon, he was Chief Scientist at Tvision, where he led the design and implementation of vision-based activity recognition systems on resource-constrained platforms. Before entering industry, he was a tenured Associate Professor at West Virginia University from 2008 to 2018, where he managed several large research projects on multi-sensor systems, information-centric data dissemination, and distributed location services. The outcomes have been demonstrated to agencies such as DARPA and Los Alamos National Labs and have resulted in more than 40 publications in journals and conferences. He also co-founded a startup company called Aspinity, which focuses on ultra-low-power chips for AI applications based on analog signal processing. He completed his Ph.D. in Computer Science at The Ohio State University in 2008.

Title: Mapping and Localization in Large Scale Retail Environments

Abstract: This talk focuses on localizing a shopping cart within large retail stores, which lays the foundation for several customer-delighting applications such as product location lookup, wayfinding, nearby product recommendations, and locality-centric ads. This setting differs from typical SLAM settings: on the one hand, resource constraints on the shopping cart make it infeasible to apply a continuous mapping solution; on the other hand, localization precision requirements can be relaxed, and while products on shelves may change continuously, the store structure tends to change relatively slowly.

We develop a solution that (i) combines RGB, depth, and IMU data to build an initial offline map via SLAM; (ii) performs real-time, GPU-optimized re-localization via RGB image matching; (iii) uses collaborative, dynamic sampling across multiple carts to selectively update the SLAM map; and (iv) builds and maintains a product location planogram using product purchase locations. Our real-time re-localization and dense mapping system has been tested in several large-scale retail grocery stores and validated by comparing the obtained trajectories with a fiducial-based localization system.
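To illustrate the retrieval-style re-localization step in the abstract (step ii) at a toy scale, the sketch below matches a query image's global descriptor against descriptors stored with offline map keyframes and returns the pose of the best match. The color-histogram descriptor and the synthetic keyframes are assumptions; a deployed system would use a learned, GPU-optimized descriptor and a much larger map.

```python
# A minimal sketch of image-retrieval re-localization against stored keyframes.
import numpy as np

def global_descriptor(image_rgb, bins=8):
    """Concatenated per-channel color histogram, L2-normalized."""
    hist = np.concatenate([
        np.histogram(image_rgb[..., c], bins=bins, range=(0, 255))[0]
        for c in range(3)
    ]).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-9)

def relocalize(query_image, keyframe_descriptors, keyframe_poses):
    """Return the stored pose whose keyframe descriptor best matches the query."""
    q = global_descriptor(query_image)
    scores = keyframe_descriptors @ q          # cosine similarity (descriptors are unit norm)
    return keyframe_poses[int(np.argmax(scores))], float(scores.max())

# Tiny synthetic "map": 3 keyframes with known poses (x, y, heading).
keyframes = [np.random.randint(0, 256, (120, 160, 3)) for _ in range(3)]
descs = np.stack([global_descriptor(k) for k in keyframes])
poses = [(0.0, 0.0, 0.0), (5.0, 0.0, 1.57), (5.0, 8.0, 3.14)]
pose, score = relocalize(keyframes[1], descs, poses)   # recovers the pose of keyframe 1
```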