canceled because of the speaker's illness. The program shifts up accordingly.
Dariu M. Gavrila, TU Delft, The Netherlands
Self-driving vehicles finally seem to be turning the corner: witness the introduction of highway autopilots that allow drivers to take their eyes off the road, and the large-scale pilots with driverless robot taxis in various US cities. Yet challenges remain, especially when dealing with pedestrians and cyclists in dense urban traffic. This talk discusses research at our Intelligent Vehicles group on environment perception and motion planning for self-driving vehicles in these scenarios.
1:30 -- 2:10 p.m.
Gilles Puy, Valeo.ai
In this talk, I will present two ways of exploiting image foundation models to pretrain deep neural networks that process Lidar data.
In the first work, we distill image representations extracted from foundation models into a lidar network, without using any manual labels. We then study the effect of three pillars on the distillation performance: the capacity of the lidar network, the pretrained image model, and the pretraining dataset. We show that scaling the 2D and 3D backbones and pretraining on diverse datasets lead to considerable improvements in feature quality. The role of these pillars is in fact more important than the distillation method itself, which we simplify for easier scaling and call ScaLR. After optimizing each pillar, we reduce the gap to full supervision from about 30 mIoU percentage points to around 10 in semantic segmentation.
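The distillation objective behind this kind of label-free pretraining can be illustrated with a minimal sketch: lidar point features are pulled toward the frozen image-model features at the pixels the points project to. This is not the ScaLR code; the shapes, names, and cosine-based loss are assumptions for illustration only.

```python
import numpy as np

def cosine_distillation_loss(point_feats, image_feats):
    """Mean (1 - cosine similarity) between paired features.

    point_feats: (N, D) features from the lidar backbone, one per point
    image_feats: (N, D) frozen foundation-model features sampled at the
                 pixels the corresponding points project to
    """
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    q = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(p * q, axis=1)))

rng = np.random.default_rng(0)
f = rng.normal(size=(128, 64))
print(cosine_distillation_loss(f, f))                           # ~0 for identical features
print(cosine_distillation_loss(f, rng.normal(size=(128, 64))))  # ~1 for unrelated features
```

In practice only the lidar network is trained against this loss; the image model stays frozen, so no manual labels enter the pipeline.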
In the second work, we directly repurpose a ViT (designed to process images) to process lidar data in a model called RangeViT. This model takes as input a range representation of the lidar point cloud. The key ingredients for a successful adaptation are: (a) starting from a ViT pre-trained on large image datasets; (b) substituting a tailored convolutional stem for the classical linear embedding layer; (c) refining the pixel-wise predictions with a convolutional decoder and low-level but fine-grained features from the convolutional stem.
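The range representation that RangeViT consumes is a spherical projection of the point cloud into a 2D image. A minimal sketch of such a projection follows; the image size, field-of-view bounds, and lack of a z-buffer are simplifying assumptions, not the paper's exact configuration.

```python
import numpy as np

def to_range_image(points, H=32, W=512, fov_up=10.0, fov_down=-30.0):
    """Project an (N, 3) point cloud to an (H, W) range image.

    Rows index elevation (top = fov_up), columns index azimuth.
    Only the range channel is kept; real pipelines typically add
    x, y, z, and intensity channels as well.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                    # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / r, -1, 1))  # elevation angle
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = (1.0 - (pitch - fd) / (fu - fd)) * H  # row coordinate
    v = 0.5 * (1.0 - yaw / np.pi) * W         # column coordinate
    u = np.clip(np.floor(u), 0, H - 1).astype(int)
    v = np.clip(np.floor(v), 0, W - 1).astype(int)
    img = np.zeros((H, W), dtype=np.float32)
    img[u, v] = r                             # last write wins (no z-buffer)
    return img

pts = np.random.default_rng(1).normal(size=(1000, 3)) * 10
ri = to_range_image(pts)
print(ri.shape)  # (32, 512)
```

Because the result is an ordinary 2D image, it can be split into patches and fed to a ViT; the convolutional stem mentioned above replaces the linear patch embedding to better handle this input.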
Project pages: https://github.com/valeoai/rangevit and https://github.com/valeoai/ScaLR
2:10 -- 2:50 p.m.
J. Marius Zöllner, Karlsruhe Institute of Technology (KIT) & FZI Research Center for Information Technology
Artificial intelligence is omnipresent, embedded in a growing number of tools and applications that we use every day. Autonomous systems, such as robots and automated vehicles, will reshape everyday life, industry and, in particular, the mobility of the future.
We are already seeing successful applications of machine learning in autonomous vehicles, ranging from learning individual components of the overall system, to multiple components simultaneously, to learning vehicle control commands from multi-sensory inputs.
However, the use of these AI-based approaches in real-world connected and autonomous driving scenarios raises new research questions: How can we improve their basic functionalities? How can we increase their resilience? How can intelligent infrastructure create added value? And how can we evaluate AI-based autonomous driving to ensure safe deployment?
This presentation will outline the achievements, lessons learned, and challenges of our research and deployment of AI-powered automated driving functions. We will highlight results from real-life scenarios with various autonomous test and experimental vehicles in the Test Area Autonomous Driving Baden-Württemberg. Attendees will gain an insight into current progress, practical applications and the challenges on the road to full autonomy.
2:50 -- 3:30 p.m.
Scalable Learning Approaches for 3D LiDARs
Patrik Vacek, Czech Technical University in Prague
To enhance scalability, a self-supervised data-driven method for simulating LiDAR sensors in game simulators for sim2real transfer is proposed, enabling the utilization of inexpensive synthetic data during model training. Additionally, a novel data augmentation framework utilizing pre-existing annotated data is introduced, significantly enhancing model performance, particularly for rare classes. Further, the temporal information inherent in LiDAR data sequences is exploited through a spatial-temporal aggregation module, enhancing semi-supervised learning. Together with multiple ensemble teachers, the new aggregation module provides high-quality pseudo-labels for student training, outperforming fully supervised methods with only a small subset of manual labels. Furthermore, a self-supervised 3D scene flow framework is developed, incorporating novel consistency losses to improve flow estimation between sequential point clouds. This approach demonstrates superior performance and generalization across diverse driving datasets. Lastly, joint optimization of flow with instance clustering is proposed, achieving state-of-the-art results, especially in dynamic scenes with multiple independently moving objects.
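One kind of consistency loss used in self-supervised scene flow can be sketched in a few lines: warping points forward by the predicted flow and then back by the reverse flow should return them to where they started. This is an illustrative cycle-consistency example, not the author's code; the function and variable names are assumptions.

```python
import numpy as np

def cycle_consistency(points, flow_fw, flow_bw):
    """Mean residual distance after a forward + backward warp.

    points:  (N, 3) points from frame t
    flow_fw: (N, 3) predicted flow t -> t+1
    flow_bw: (N, 3) predicted flow t+1 -> t, evaluated at the warped points
    """
    warped = points + flow_fw          # move points into frame t+1
    returned = warped + flow_bw        # warp them back toward frame t
    return float(np.mean(np.linalg.norm(returned - points, axis=1)))

pts = np.zeros((10, 3))
fw = np.ones((10, 3))
print(cycle_consistency(pts, fw, -fw))  # 0.0 for a perfectly inverse flow
```

Penalties of this form need no labels, which is what makes the framework self-supervised; combining them with instance clustering, as above, additionally encourages points on the same rigid object to share a coherent motion.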