Efficient and Reliable Execution Framework for Agentic Workflows
Adaptive LLM Serving Framework with Dual-state Linear Attention
Communication-efficient Training Framework with Model Ensembling
Scalable NDP Architecture for Multi-Dimensional Parallel Training
CadLLM (Under Review)
Adaptive Decoding Controller for Diffusion LLMs
Decoupled MoE Router for Efficient Inference and Expert-aware Batching
Input-adaptive Feed-forward Skipping Strategy