GPS-based
BEV network
Occupancy Network
Data-driven prediction and planning
Spherical View Rendering
Image-based Rendering
Depth Image-based Rendering
Camera Pose Estimation and Tracking
Image-based Relocalization
Volume Rendering
Image Classification and Search/Retrieval
Visual Object Detection and Tracking
Visual Scene Segmentation
Sensor Calibration and Fusion
Driving behavior modeling and prediction
Pedestrian/Cyclist behavior modeling and prediction
HD map generation and localization
Simulation of traffic scenes with road network
Data closed loop with smart data selection and automatic annotation
BEV/Occupancy perception network
Large Scale/Foundation Model
LLM (ChatGPT, GPT-4);
Visual language model (CLIP, DALL-E);
Multi-modality model (PaLM-E, GPT-4V);
Embodied AI for LLM-based agents (RT-X);
Fine-tuning (adapter tuning, prefix tuning, instruction tuning, prompt tuning);
Emergence (in-context learning);
Human preference alignment (RLHF);
Hallucination and interpretability;
Knowledge graph and Reasoning on Graphs (RoG);
Search engine and Retrieval augmented generation (RAG).