Roboraction.AI

Robotics

Stereo Vision
Multi View Stereo (MVS)
Visual Odometry
Visual SLAM
Inertial-based
GPS-based
BEV network
Occupancy Network
Data driven prediction and planning

Human Machine Interaction

Virtual Reality and Augmented Reality

Spherical View Rendering
Image-based Rendering
Depth Image-based Rendering
Camera Pose Estimation and Tracking
Image-based Relocalization
Volume Rendering

Autonomy

Image Classification and Search/Retrieval
Visual Object Detection and Tracking
Visual Scene Segmentation
Sensor Calibration and Fusion
Driving behavior modeling and prediction
Pedestrian/Cyclist behavior modeling and prediction
HD map generation and localization
Simulation of traffic scenes with road network
Data closed loop with smart data selection and automatic annotation
BEV/Occupancy perception network

Large Scale/Foundation Model

LLM (ChatGPT, GPT-4);
Visual language model (CLIP, DALL-E);
Multi-modality model (PaLM-E, GPT-4V);
Embodied AI for LLM-based agents (RT-X);
Fine-tuning (adapter tuning, prefix tuning, instruction tuning, prompt tuning);
Emergence (in-context learning);
Human preference alignment (RLHF);
Hallucination and interpretability;
Knowledge graph and Reasoning on Graphs (RoG);
Search engine and Retrieval augmented generation (RAG).
