Program

Slides for the talks are available via the respective hyperlinks below.

Date: October 16th, 2017

Duration: 9:00 AM to 5:30 PM

Morning Session - Chair: Senthil Yogamani (Valeo Ireland)

Opening Remarks (9:00 - 9:10)

Invited Talk #1 (9:10 - 9:40) - Virtual Worlds for the Perception and Control of Self-Driving Vehicles (PDF) - Prof. Antonio M. López (Univ. Autònoma de Barcelona (UAB))

Bio: Antonio López is an associate professor in the Computer Science department of the Universitat Autònoma de Barcelona (UAB). He is also a founding member of the Computer Vision Center (CVC) of the UAB, where he created the Advanced Driver Assistance Systems (ADAS) group in 2002 and has led it since. Antonio is a founding member and co-organizer of consolidated workshops such as Computer Vision in Vehicle Technology (CVVT) and Transferring and Adapting Source Knowledge in Computer Vision (TASK-CV). He has also collaborated on numerous projects with international companies, especially from the automotive sector. His work in the last 10 years has focused on the use of computer graphics to train and test on-board computer vision systems.

Deep learning has emerged as a key enabling technology for developing autonomous driving under two main paradigms. On the one hand, we find modular approaches with explicit tasks for detecting the free road, the dynamic objects, etc., which then plan a safe vehicle maneuver according to particular control laws; these tasks rely on deep models. On the other hand, there are end-to-end driving approaches able to output vehicle motion commands by processing the raw data with a deep model, without explicit tasks for detecting the free road or the dynamic objects and without running a specific control law. In the former case, perception and control are separated; in the latter they are not. In both cases, however, some sort of ground truth (GT) is required for training and testing the self-driving AI agents; indeed, deep models are very data hungry (raw data and GT). Our research group at the CVC/UAB has spent the last eight years investigating how virtual worlds and simulation can help to train and test advanced driver assistance systems (ADAS) first, and self-driving AI agents nowadays. In this talk we review this research, presenting the latest news about our well-known SYNTHIA environment as well as new simulation environments such as CARLA, and focusing on how these environments can contribute to both self-driving paradigms as well as to other tasks related to the Vision Zero goal for traffic accidents.

Invited Talk #2 (9:40 - 10:10) - From Deep Learning to Autonomous Driving (PDF) - Prof. J. Marius Zöllner (FZI Research Center for Information Technology)

Bio: Prof. Dr. J. Marius Zöllner studied computer science with a special focus on artificial intelligence and robotics at the University of Karlsruhe, where he also received his Dr.-Ing. degree (Ph.D.) in 2005. Since 1999 he has worked at the FZI Research Center for Information Technology, where he became a division manager in 2006. Since 2008 he has been professor for Applied Technical Cognitive Systems at the Karlsruhe Institute of Technology (KIT) and a director at the FZI. Since 2012 he has also been a member of the executive board of the FZI. His current research activities focus on cognitive cars and service robotics, in particular the perception and interpretation of the driving environment, probabilistic situation understanding, behaviour decision-making, and machine learning.

Deep learning and autonomous driving are emerging research topics that are becoming more and more interwoven. Besides continuously emerging achievements in learning, we see successful approaches in the domain of autonomous vehicles ranging from learning individual components of the overall system, through learning several components at once, up to directly learning vehicle control commands from visual sensor input. However, when bringing these approaches to real-world autonomous driving, the question arises of how to safely incorporate them into production-grade vehicles. This issue can be considered manageable when learning techniques are used for highly dedicated perception tasks with a single learning step, but it becomes more complex as the responsibilities of the learning system increase. If vehicles are controlled directly by learning-based approaches, rare failures have immediate impact and thus more severe consequences. This emphasizes the importance of research into the additional integration of expert knowledge in order to constrain vehicle behavior in terms of safety and reliability. This presentation will outline the power and computational expressiveness of deep learning approaches in autonomous driving. Furthermore, the potential of current end-to-end learning concepts for vehicle control using supervised and unsupervised methods will be discussed, followed by potential methods to combine such learning algorithms with model-driven and probabilistic approaches in order to gain comprehensiveness and accountability. Experiments and results from real-world scenarios with our autonomous research car CoCar will be shown.

Invited Talk #3 (10:10 - 10:40) - Faster Convolutional Architecture Search for Semantic Segmentation (PDF) (Paper) - Felix Friedmann (Autonomous Intelligent Driving GmbH, Germany)

Bio: (Representing main paper author Rupesh Durgesh) Felix Friedmann is currently Tech Lead for perception and prediction at Audi's self-driving car subsidiary Autonomous Intelligent Driving. Felix works on AI-based software architectures for self-driving cars, on making AI product-grade, and on proper tools and engineering processes for AI development. He holds a diploma in Electrical Engineering and Information Technology from TUM and co-founded the autonomous driving meetup in Munich.

Designing deep learning architectures is a complex task that requires expert knowledge: convolutional neural networks involve different network topologies, layers, and layer parameters. To automate the design process, our approach builds on MetaQNN, in which a Q-learning agent designs architectures by selecting CNN layers. We extend the approach to the semantic segmentation task, where architectures involve encoder-decoder layers, and we use a Hyperband-like technique to speed up the search. Our experiments are evaluated on the CamVid urban street scene semantic segmentation dataset. The architectures designed by the Q-learning agent for semantic segmentation outperform some commonly used hand-designed architectures with a similar number of parameters.
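
As a rough illustration of the MetaQNN idea underlying this work (not the authors' code: the layer vocabulary, reward function, and hyperparameters below are placeholder assumptions), an epsilon-greedy Q-learning agent can pick layers sequentially and update its Q-values from a terminal reward:

```python
# Sketch of MetaQNN-style architecture search with epsilon-greedy Q-learning.
import random
from collections import defaultdict

LAYERS = ["conv3x3_64", "conv5x5_64", "pool2x2", "upsample2x", "terminate"]
MAX_DEPTH = 6
EPSILON, ALPHA, GAMMA = 0.3, 0.1, 1.0

Q = defaultdict(float)  # Q[(depth, layer)] -> expected reward

def sample_architecture():
    """Roll out one architecture by epsilon-greedily picking layers."""
    arch = []
    for depth in range(MAX_DEPTH):
        if random.random() < EPSILON:
            layer = random.choice(LAYERS)
        else:
            layer = max(LAYERS, key=lambda a: Q[(depth, a)])
        arch.append(layer)
        if layer == "terminate":
            break
    return arch

def evaluate(arch):
    """Placeholder reward; the real agent would train the candidate net
    (with Hyperband-style early stopping) and return validation accuracy."""
    return random.random()

for episode in range(200):
    arch = sample_architecture()
    reward = evaluate(arch)
    # Q-learning backup along the rollout; reward arrives only at the end.
    for depth, layer in enumerate(arch):
        terminal = depth == len(arch) - 1
        target = reward if terminal else GAMMA * max(Q[(depth + 1, a)] for a in LAYERS)
        Q[(depth, layer)] += ALPHA * (target - Q[(depth, layer)])
```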

Coffee Break with optional poster session (10:40 - 11:10)

Oral Paper #1 (11:10 - 11:30) - Learning Temporal Features with CNNs for Monocular Visual Ego Motion Estimation (PDF) - Michael Weber, Christoph Rist, J. Marius Zöllner (FZI Research Center for Information Technology, Germany)

Making convolutional neural networks (CNNs) successful in learning problems like image-based ego motion estimation stands or falls with the network's ability to extract temporal information from videos. The architecture of a network therefore needs the capability to learn temporal features. We propose two CNN architectures that are able to learn such temporal features and to solve problems like ego motion estimation. Our architectures achieve promising first results in ego motion estimation and may be a good foundation for systems dealing with temporal information. As the architectures achieve real-time inference, they can be applied in domains like autonomous driving.
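
A minimal sketch of one way to give a CNN access to temporal information, in the spirit of the paper's theme (the architecture below is an illustrative assumption, not one of the two proposed networks): stack consecutive grayscale frames along the channel axis and regress the ego motion between them:

```python
import torch
import torch.nn as nn

class StackedFrameEgoNet(nn.Module):
    def __init__(self, n_frames=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, 6)  # (tx, ty, tz, roll, pitch, yaw)

    def forward(self, frames):         # frames: (batch, n_frames, H, W)
        return self.head(self.features(frames).flatten(1))

model = StackedFrameEgoNet()
motion = model(torch.randn(2, 4, 128, 416))  # -> (2, 6) ego motion estimates
```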

Oral Paper #2 (11:30 - 11:50) - Speed and Steering Angle Prediction for Intelligent Vehicles Based on Deep Belief Network (PDF) - Chunqing Zhao, Jianwei Gong, Chao Lu, Guangming Xiong, Weijie Mei (Beijing Institute of Technology, China)

Learning and predicting human driving behavior plays an important role in the development of advanced driving assistance systems (ADAS). Speed and steering angle, which reflect the longitudinal and lateral behavior of drivers, are two important parameters for behavior prediction. However, traditional behavior learning methods, especially those based on artificial neural networks, rely on human-selected features and thus adapt poorly to highly changeable traffic environments. This paper aims to overcome this drawback by using deep learning, which can learn features automatically from driving data without human intervention. Specifically, a deep belief network (DBN) is used to build the learning model, and the training data are collected from drivers on real-world roads. Based on the model, the steering angle of the front wheels and the speed of the vehicle are predicted. The prediction results show that, compared with the traditional learning method, the DBN has higher accuracy and can adapt to different driving scenarios with far fewer modifications.
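
A hedged sketch of the DBN recipe with scikit-learn, assuming random stand-in data and illustrative layer sizes (the paper trains on real driving logs): greedy layer-wise RBM pretraining followed by a supervised readout for speed and steering angle:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.random((1000, 64))          # stand-in driving-state features in [0, 1]
y = rng.random((1000, 2))           # stand-in targets: [speed, steering angle]

# Stack two RBMs, training each on the previous layer's activations.
rbm1 = BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=10, random_state=0)
rbm2 = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=10, random_state=0)
H1 = rbm1.fit_transform(X)
H2 = rbm2.fit_transform(H1)

# Supervised fine-tuning stage, simplified here to a linear readout.
head = Ridge(alpha=1.0).fit(H2, y)
pred_speed, pred_steering = head.predict(H2[:1])[0]
```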

Oral Paper #3 (11:50 - 12:10) - Adding Navigation to the Equation: Turning Decisions for End-to-End Vehicle Control (PDF) - Christian Hubschneider, André Bauer, Michael Weber, J. Marius Zöllner (FZI Research Center for Information Technology, Germany)

Navigation and obstacle avoidance are two problems that are not easily incorporated into direct control of autonomous vehicles solely based on visual input. However, they are required if lane following given proper lane markings is not enough to incorporate trained systems into larger architectures. This paper presents a method to allow for obstacle avoidance while driving using a single, front-facing camera as well as navigation capabilities such as taking turns at junctions and lane changes by feeding turn indicator signals into a Convolutional Neural Network. Both situations share the difficulty intrinsic to single camera setups of limited field of views. This problem is handled by using a spatial history of input images to extend the field of view regarding static obstacles. The trained model named DriveNet is evaluated in real world driving scenarios, using the same model for lateral vehicle control to both dynamically drive around obstacles as well as perform lane changing and turning in intersections.
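
An illustrative sketch of the conditioning mechanism described above (layer sizes and the three-frame history are assumptions, not the published DriveNet): a one-hot turn indicator is concatenated with the CNN features before the fully connected layers:

```python
import torch
import torch.nn as nn

class IndicatorConditionedDriver(nn.Module):
    def __init__(self, history=3):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3 * history, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Sequential(
            nn.Linear(48 + 3, 64), nn.ReLU(),  # +3 for the turn indicator
            nn.Linear(64, 1),                  # steering command
        )

    def forward(self, frames, indicator):
        # frames: (B, 3*history, H, W); indicator: (B, 3) one-hot (left/none/right)
        z = self.cnn(frames).flatten(1)
        return self.fc(torch.cat([z, indicator], dim=1))

model = IndicatorConditionedDriver()
steer = model(torch.randn(1, 9, 120, 320), torch.tensor([[0.0, 0.0, 1.0]]))
```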

Lunch Break (12:10 - 13:10)

A light lunch will be provided by the conference.

Poster Session (13:10 - 14:10)

Poster #1 Intent Prediction of Vulnerable Road Users from Motion Trajectories Using Stacked LSTM Network - Khaled Saleh, Mohammed Hossny, Saeid Nahavandi (Deakin University, Australia)

Intent prediction of vulnerable road users (VRUs) has recently received attention from the research community, due to its critical role in the development of both advanced driving assistance systems (ADAS) and highly automated vehicles. Most proposed techniques for intent prediction follow one of two methodologies, namely dynamical motion modeling and motion planning. Powerful as these techniques are, they both rely on hand-crafting a set of scene-specific features, which in turn limits their generalization to unseen scenes involving VRUs. In this paper, a novel end-to-end data-driven approach is proposed for long-term intent prediction of VRUs such as pedestrians in urban traffic environments, based only on their motion trajectories. The intent prediction problem is formulated as a time-series prediction problem: by observing only a short window of a pedestrian's motion trajectory, a forecast of their future lateral positions can be made up to 4 seconds ahead. Our approach utilizes the widely adopted Long Short-Term Memory (LSTM) recurrent architecture to form a deep stacked LSTM network. The proposed model was evaluated on one of the popular datasets for intent and path prediction of pedestrians, covering four distinct traffic scenarios involving pedestrians in an urban environment. Our approach demonstrated competent results compared with the baseline approaches in terms of long-term prediction, with small lateral position errors of 0.39, 0.48, 0.46, and 0.51 meters respectively in the four scenarios of the testing dataset.
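
A minimal sketch of a stacked-LSTM forecaster in the spirit of the paper (window length, horizon, and the 10 Hz sampling rate are assumptions): observe a short window of 2D positions and regress future lateral positions:

```python
import torch
import torch.nn as nn

class StackedLSTMIntent(nn.Module):
    def __init__(self, obs_dim=2, hidden=64, layers=3, horizon=40):
        super().__init__()
        # num_layers=3 gives a "deep stacked" LSTM; horizon=40 steps ~ 4 s @ 10 Hz.
        self.lstm = nn.LSTM(obs_dim, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, track):            # track: (B, T_obs, 2) observed positions
        out, _ = self.lstm(track)
        return self.head(out[:, -1])     # (B, horizon) future lateral positions

model = StackedLSTMIntent()
future_lateral = model(torch.randn(8, 15, 2))   # 1.5 s observed @ 10 Hz
```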

Poster #2 Feature Detectors for Traffic Light Recognition - Andreas Fregin (Daimler AG, Germany)

Traffic light recognition is among the most difficult topics in the context of autonomous driving. Most systems described in the literature follow a classical object recognition approach consisting of detection, verification, and tracking. Of these three tasks, detection is of crucial importance, as overlooked traffic lights can most likely not be recovered in subsequent steps. Many published systems rely on feature detectors that try to detect a traffic light's lamp; the most frequently used include the spot light detector, color-based detectors, and the circle detector. In contrast to other recognition tasks, no standard evaluation dataset exists, and only complete systems consisting of detectors, verifiers, and often tracking algorithms have been described in the literature. In this paper, we introduce a dataset that enables a fair comparison of traffic light feature detectors. We evaluate the effectiveness of the mentioned detectors and call attention to the strengths and weaknesses of each.
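
A toy example of the color-based detector family compared in the paper (thresholds and blob-size limits are illustrative assumptions, not the paper's): threshold in HSV space and keep small, roughly circular blobs as lamp candidates:

```python
import cv2
import numpy as np

def detect_lamps(bgr):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so it needs two ranges.
    red = cv2.inRange(hsv, (0, 120, 120), (10, 255, 255)) | \
          cv2.inRange(hsv, (170, 120, 120), (180, 255, 255))
    green = cv2.inRange(hsv, (45, 120, 120), (90, 255, 255))
    candidates = []
    for mask, label in ((red, "red"), (green, "green")):
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            area = cv2.contourArea(c)
            if not 10 < area < 500:                  # keep lamp-sized blobs only
                continue
            (x, y), r = cv2.minEnclosingCircle(c)
            if area / (np.pi * r * r + 1e-6) > 0.6:  # keep roughly circular blobs
                candidates.append((label, int(x), int(y), int(r)))
    return candidates

img = np.zeros((240, 320, 3), np.uint8)
cv2.circle(img, (160, 80), 6, (0, 0, 255), -1)       # synthetic red lamp
print(detect_lamps(img))
```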

Poster #3 An LSTM Network for Highway Trajectory Prediction - Florent Altche, Arnaud de La Fortelle (MINES ParisTech, France)

In order to drive safely and efficiently on public roads, autonomous vehicles will have to understand the intentions of surrounding vehicles and adapt their own behavior accordingly. While experienced human drivers are generally good at inferring other vehicles' motion up to a few seconds into the future, most current Advanced Driving Assistance Systems (ADAS) are unable to do so, and only react to the instantaneous state of the immediately surrounding vehicles. In this article, we present a first step towards consistent trajectory prediction by introducing a long short-term memory (LSTM) neural network capable of accurately predicting future longitudinal and lateral trajectories for vehicles on highways. Unlike previous work focusing on a small number of trajectories collected from a few drivers, our network was trained and validated on the NGSIM US-101 dataset, which contains more than 800 hours of recorded trajectories in various traffic densities, representing more than 6000 individual drivers.
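
A small sketch of how training pairs might be cut from NGSIM-style tracks (window lengths and the 10 Hz sampling are assumptions): slide a window over each trajectory to produce (past, future) pairs for the LSTM:

```python
import numpy as np

def make_windows(track, past=30, future=50):
    """track: (T, 2) array of (longitudinal, lateral) positions at 10 Hz.
    Returns inputs (N, past, 2) and targets (N, future, 2)."""
    xs, ys = [], []
    for t in range(len(track) - past - future + 1):
        xs.append(track[t:t + past])
        ys.append(track[t + past:t + past + future])
    return np.stack(xs), np.stack(ys)

track = np.cumsum(np.random.randn(600, 2), axis=0)  # fake 60 s trajectory
X, Y = make_windows(track)   # 3 s of history -> 5 s of future per sample
print(X.shape, Y.shape)      # (521, 30, 2) (521, 50, 2)
```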

Poster #4 Deep Fully Convolutional Networks with Random Data Augmentation for Enhanced Generalization in Road Detection - Jesús Muñoz-Bulnes, Carlos Fernandez, Ignacio Parra, David Fernández-Llorca, Miguel Angel Sotelo (University of Alcalá, Spain)

In this paper, a deep learning system for accurate road detection is proposed, using the ResNet-101 network with a fully convolutional architecture and multiple upscaling steps for image interpolation. It is demonstrated that significant generalization gains are attained by randomly generating augmented training data using several geometric transformations and pixelwise changes, such as affine and perspective transformations, mirroring, image cropping, distortions, blur, noise, and color changes. In addition, this paper shows that a 4-step upscaling strategy provides optimal learning results compared to other similar techniques that perform data upscaling based on shallow layers with scarce representation of the scene data. The complete system is trained and tested on data from the KITTI benchmark and, in addition, on images recorded on the campus of the University of Alcalá (Spain). The improvement attained after performing data augmentation and conducting a number of training variants is very encouraging, showing a path towards enhanced generalization of road detection systems with a view to real deployment in self-driving cars.
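
A sketch of the random-augmentation idea with OpenCV (parameter ranges are illustrative assumptions, not the paper's settings): geometric transforms are applied jointly to image and road mask, pixelwise changes to the image only:

```python
import cv2
import numpy as np

def augment(img, mask, rng):
    h, w = img.shape[:2]
    # Shared geometric transform: small random rotation, scale, and shift.
    M = cv2.getRotationMatrix2D((w / 2, h / 2),
                                angle=rng.uniform(-5, 5),
                                scale=rng.uniform(0.9, 1.1))
    M[:, 2] += rng.uniform(-10, 10, size=2)
    img = cv2.warpAffine(img, M, (w, h))
    mask = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    if rng.random() < 0.5:                       # horizontal mirroring
        img, mask = cv2.flip(img, 1), cv2.flip(mask, 1)
    # Pixelwise changes applied to the image only.
    img = cv2.GaussianBlur(img, (5, 5), sigmaX=rng.uniform(0.1, 1.5))
    noise = rng.normal(0, 5, img.shape)
    img = np.clip(img.astype(np.float32) * rng.uniform(0.8, 1.2) + noise, 0, 255)
    return img.astype(np.uint8), mask

rng = np.random.default_rng(0)
img = np.full((256, 512, 3), 128, np.uint8)      # stand-in camera image
mask = np.zeros((256, 512), np.uint8)            # stand-in road mask
aug_img, aug_mask = augment(img, mask, rng)
```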

Poster #5 Deep Convolution Long-Short Term Memory Network for LIDAR Semantic Segmentation - Ahmad Al Sallab, Khaled Elmadawy, Mostafa Gamal, Moemen Abdelrazek, Hesham Eraqi (Cairo University, Egypt)

We propose a deep learning approach based on convolutional long short-term memory networks to perform occupancy-grid-cell-based semantic segmentation from LIDAR measurements. The input consists of scan points from multiple LIDAR sensors surrounding the vehicle, each composed of multi-layered 360° scanning beams providing 3D scan images. The output is an occupancy grid map with a predicted class label for each cell. The experimental setup is based on the Gazebo simulator, operating under the Robot Operating System, to generate the ground truth. The simulation scenarios are designed to exhaustively cover different real-world situations, including multiple objects. We further evaluate the proposed model on data from a Velodyne laser scanner mounted on a real vehicle, with ground truth obtained through manual annotation. Several evaluation criteria are used to compare the network predictions against the simulated ground truth, and several deep learning models are compared to pick the best architecture. The average precision, recall, and F1-score measures demonstrate the efficiency of the proposed network.
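
A sketch of the occupancy-grid input representation (grid extent and resolution are assumptions): project LIDAR returns onto a bird's-eye-view grid around the ego vehicle, which the network then labels cell by cell:

```python
import numpy as np

def points_to_grid(points, extent=40.0, cell=0.25):
    """points: (N, 3) LIDAR returns in the ego frame (x forward, y left).
    Returns a (H, W) occupancy grid covering [-extent, extent] meters."""
    size = int(2 * extent / cell)
    grid = np.zeros((size, size), dtype=np.uint8)
    ix = ((points[:, 0] + extent) / cell).astype(int)
    iy = ((points[:, 1] + extent) / cell).astype(int)
    ok = (ix >= 0) & (ix < size) & (iy >= 0) & (iy < size)
    grid[iy[ok], ix[ok]] = 1
    return grid

scan = np.random.uniform(-40, 40, (5000, 3))   # stand-in 360-degree scan
grid = points_to_grid(scan)                    # (320, 320) occupancy grid
```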

Poster #6 Fast Semi-Dense 3D Semantic Mapping with Monocular Visual SLAM - Xuanpeng LI (Southeast University, P.R. China)

Fast 3D reconstruction with semantic information in road scenarios involves issues of geometry and appearance in the field of computer vision. An important idea is that fusing geometry and appearance can boost the performance of each. Stereo cameras and RGB-D sensors are widely used for fast, dense 3D reconstruction and trajectory tracking, but they incur heavy computation and storage costs and lack the flexibility to switch seamlessly between environments of different scale, i.e., indoor and outdoor scenes. In addition, semantic information is still hard to acquire in 3D mapping. We address this challenge by fusing direct, semi-dense Simultaneous Localisation and Mapping (SLAM) from a monocular camera with state-of-the-art deep neural network approaches. In our approach, 2D semantic information is transferred to the 3D map via correspondences between consecutive keyframes with spatial consistency. Since there is a lot of redundancy between consecutive frames, there is no need to obtain a semantic segmentation for every frame in the sequence; consequently, the segmentation can run at a reasonable speed (about 20 Hz). We evaluate our method on road scene datasets and show an improvement in 2D semantic labelling over baseline single-frame predictions.
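
A simplified stand-in for the label-transfer step (the running-average fusion below is an assumption, not the paper's exact scheme): each 3D map point accumulates the class probabilities of the pixels it projects to in successive keyframes, so segmentation need only run on keyframes:

```python
import numpy as np

N_POINTS, N_CLASSES = 10000, 12
probs = np.full((N_POINTS, N_CLASSES), 1.0 / N_CLASSES)  # per-point class beliefs
counts = np.zeros(N_POINTS)

def fuse_keyframe(point_ids, pixel_probs):
    """point_ids: (M,) map points visible in this keyframe;
    pixel_probs: (M, N_CLASSES) CNN softmax at their projected pixels."""
    counts[point_ids] += 1
    lr = 1.0 / counts[point_ids]
    probs[point_ids] = (1 - lr[:, None]) * probs[point_ids] + lr[:, None] * pixel_probs

ids = np.arange(100)                                  # stand-in visibility set
fuse_keyframe(ids, np.random.dirichlet(np.ones(N_CLASSES), size=100))
labels = probs.argmax(axis=1)   # current semantic label per map point
```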

Afternoon Session - Chair: Felix Friedmann (Audi, Germany)

Invited Talk #4 (14:10 - 14:40) - Deep Learning Implementation and Optimization on DRIVE PX Autonomous Driving Platform (PDF) - Toru Baji (Nvidia, Japan)

Bio: Toru Baji received his M.S. degree from Osaka University in 1977 and joined the Hitachi Central Research Laboratory, where he conducted research and development on solid-state image sensors and processor architectures. He also conducted research on analog-digital CMOS circuits and DSP architectures at the University of California, Berkeley, and at Hitachi America R&D. He moved to the Semiconductor Division of Hitachi Ltd. in 1993 and then to Renesas, where he served as department manager of the SH-DSP department and later of the automotive application department. In 2008 he joined NVIDIA as a Senior Solution Architect for the automotive and Tegra SoC business. Since 2016 he has been a technical adviser and GPU evangelist at NVIDIA.

Deep learning (DL) offers superhuman image recognition performance today, which is a strong motivation to apply this technology to autonomous driving perception. In addition, the capability of driving smoothly in various conditions is expected to be delivered by end-to-end DL. NVIDIA has developed the DRIVE PX autonomous driving platform, which supports both of these DL approaches. The platform also provides the massively parallel computing necessary to execute human-written algorithms for image processing, localization, path planning, and so on. In this talk, we will introduce our DRIVE PX platforms based on the Parker and Xavier SoCs (Systems on a Chip), the DL and GPU computing implemented on those platforms, and key technologies such as TensorRT and the DLA (DL Accelerator) that enable DNN execution at a practical performance-per-watt level.

Invited Talk #5 (14:40 - 15:10) - Towards Deep Understanding of the Vulnerable Road User (PDF) - Fabian Flohr (Daimler, Germany)

Bio: Dr. Fabian Flohr is currently a principal engineer and project manager at Daimler. He completed his PhD, focused on pedestrian detection, at the University of Amsterdam.

Starting in 2013, Daimler introduced an advanced set of driver assistance functions in its Mercedes-Benz S-, E-, and C-Class models using stereo vision, including a pedestrian safety component that facilitates fully automatic emergency braking. In recent years our research has focused on developing the next generation of active pedestrian and cyclist safety systems. These systems extract higher-level visual cues and use more sophisticated motion models for path prediction, enabling a deeper understanding of the traffic situation. The potential to react earlier in dangerous traffic situations, without increasing false alarms, makes such systems essential for autonomous driving in our cities. In this talk we provide an overview of current research projects and show how this work can greatly improve the perception of difficult traffic situations by making use of modern machine learning methods.

Coffee Break with optional poster session (15:10 - 15:40)

Invited Talk #6 (15:40 - 16:10) - Deep learning journey: from driver assistance to driver replacement (PDF) - Prof. Mohamed Moustafa (American University in Cairo)

Bio: Mohamed Moustafa received his PhD in electrical engineering from the City University of New York in 2001. He has been active in industry: from 1998 to 2003 he was a senior principal research scientist at the L-1 Identity Solutions corporate research center, NJ, USA (now part of Safran-Morpho, France), where he conducted research on machine intelligence for face recognition. Currently, he is a principal research scientist at VKANSEE, a startup in NY, USA. In parallel with his industrial activities, Mohamed is an associate professor in the Computer Science and Engineering department at the American University in Cairo. He is a member of the IEEE, the IEEE Computational Intelligence Society, and the IEEE Technical Committee on Pattern Analysis and Machine Intelligence. Mohamed holds four US patents in the field of biometrics and computer vision and has co-authored more than 40 research papers published in international journals and conferences. His research interests include deep learning, computational intelligence, biometrics, computer vision, neural networks, genetic algorithms, and embedded software development.

According to recent expectations, fully autonomous commercial vehicles should start rolling onto the streets in less than three years, the fruit of at least a decade of research and development. In this talk, we will trace this interesting journey with a focus on its machine vision aspects. It all started with detecting individual objects, especially pedestrians, in images and videos. Soon after, with the successful introduction of convolutional neural networks, pixelwise segmentation of multiple objects simultaneously became possible, assisting drivers in unfavorable weather conditions. Another line of research has focused on monitoring the driver to detect alertness and concentration. More recently, deep learning has been applied to estimating the correct steering angle directly from camera feeds. Hopefully, integrating all these components will let us enjoy the ride safely in autonomous cars very soon.

Oral Paper presentation #4 (16:10 - 16:30) - Probabilistic Vehicle Trajectory Prediction over Occupancy Grid Map via Recurrent Neural Network (PDF) - ByeoungDo Kim, Chang Mook Kang, Jaekyum Kim, Seung-Hi Lee, Chung Choo Chung, Jun Won Choi (Hanyang University, Korea)

In this paper, we propose an efficient vehicle trajectory prediction framework based on a recurrent neural network. The characteristics of a vehicle's trajectory differ from those of ordinary moving objects, since it is affected by various latent factors including road structure, traffic rules, and driver intention. Previous state-of-the-art approaches use sophisticated vehicle behavior models describing these factors and derive complex trajectory prediction algorithms, requiring a system designer to conduct intensive model optimization for practical use. Our approach is data-driven and simple to use in that it learns the complex behavior of vehicles from a massive amount of trajectory data through a deep neural network model. The proposed method employs a recurrent neural network, the long short-term memory (LSTM), to analyze temporal behavior and predict the future coordinates of surrounding vehicles. The proposed scheme feeds the sequence of vehicle coordinates obtained from sensor measurements to the LSTM and produces probabilistic information on the future location of the vehicles over an occupancy grid map. Experiments conducted on data collected from highway driving show that the proposed method produces reasonably good estimates of future trajectories.
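
A minimal sketch of the output representation (grid size and network dimensions are assumptions): an LSTM consumes past coordinates and emits a softmax over occupancy-grid cells, giving a probability for each future location:

```python
import torch
import torch.nn as nn

class GridTrajectoryLSTM(nn.Module):
    def __init__(self, grid_h=20, grid_w=36, hidden=128):
        super().__init__()
        self.grid = (grid_h, grid_w)
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, grid_h * grid_w)

    def forward(self, coords):           # coords: (B, T, 2) past positions
        out, _ = self.lstm(coords)
        logits = self.head(out[:, -1])
        # Probability of the vehicle occupying each grid cell at the horizon.
        return torch.softmax(logits, dim=1).view(-1, *self.grid)

model = GridTrajectoryLSTM()
occupancy = model(torch.randn(4, 20, 2))   # (4, 20, 36) probability maps
```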

Oral Paper presentation #5 (16:30 - 16:50) - A Survey on Leveraging Deep Neural Networks for Object Tracking (PDF) - Sebastian Krebs, Bharanidhar Duraisamy, Fabian Flohr (Daimler AG, Germany)

Object tracking is the problem of estimating the state of one or multiple objects over time, based on noisy measurements received from one or several sensors. The field spans several application domains, ranging from military radar systems and sensor fusion approaches to today's computer vision tracking methods employed in consumer electronics and surveillance systems, and it also plays a substantial role in autonomous driving. In recent years the use of deep neural networks has surged across many fields, due to their staggering performance in detection and classification tasks, which also renders these methods applicable to object tracking. The aim of this survey is therefore to give the reader a brief yet comprehensive introduction to the broad field of object tracking, with a focus on the latest deep-learning-based extensions and approaches. First, traditional non-deep tracking systems are briefly reviewed and a generic model of the components of such systems is established. Based on this structure, representative deep-learning-based tracking approaches in the literature are classified and presented.

Oral Paper presentation #6 (16:50 - 17:10) - Imitation Learning for Vision-based Lane Keeping Assistance (PDF) - Christopher Innocenti, Henrik Lindén, Ghazaleh Panahandeh, Lennart Svensson, Nasser Mohammadiha (Chalmers University of Technology, Sweden)

This paper investigates direct imitation learning from human drivers for the task of lane keeping assistance on highways and country roads, using grayscale images from a single front-view camera. The employed method utilizes a convolutional neural network (CNN) to act as the policy driving the vehicle. The policy is successfully learned via imitation learning using real-world data collected from human drivers and is further evaluated in closed-loop simulated environments, demonstrating good driving behaviour and robustness to domain changes. Evaluation is based on two proposed performance metrics measuring how well the vehicle is positioned in a lane and the smoothness of the driven trajectory.
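
A minimal behavior-cloning sketch in the spirit of the paper (architecture and data are stand-in assumptions): a small CNN policy mapping a grayscale frame to a steering command is fit to human steering labels with an L2 loss:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(
    nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),                     # steering angle output
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

images = torch.randn(64, 1, 96, 192)      # stand-in for camera frames
steering = torch.randn(64, 1)             # stand-in for human labels

for step in range(100):                   # imitation = supervised regression
    loss = nn.functional.mse_loss(policy(images), steering)
    opt.zero_grad()
    loss.backward()
    opt.step()
```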

Closing Remarks & Discussion (17:10 - 17:30)