As video generative models rapidly evolve, their potential to transform creativity, communication, and media production grows accordingly. However, without consistency in world knowledge, camera motion, narrative elements, and user expectations, generated content risks losing coherence, trustworthiness, and real-world applicability. Addressing this challenge is essential not only for advancing the research frontier, but also for building scalable, reliable, and human-aligned video generation systems that can benefit both academia and industry.
Video generative models have ignited a creative revolution, crafting vivid worlds from simple prompts, yet producing seamless, coherent content remains a pivotal challenge. We propose a full-day workshop at AAAI 2026, titled Consistency in Video Generative Models: From Clip to the Wild, to address this challenge along four key dimensions: (1) Intra-clip world knowledge consistency, ensuring semantic and logical coherence within a single clip; (2) Inter-clip camera consistency, maintaining seamless content transitions across camera movements; (3) Inter-shot element consistency, preserving the identity and attributes of characters, scenes, and styles across narrative fragments; and (4) Human-in-the-loop preference consistency, integrating subjective user expectations into modeling and evaluation frameworks.
This workshop will bring together generative AI researchers and practitioners to foster collaboration, establish benchmarks, and develop robust methodologies for consistent video generation. Through keynote speeches, panel discussions, and technical sessions, it will drive innovation in trustworthy and credible video generative systems. In line with AAAI's mission to advance AI, the workshop will inspire scalable solutions and attract a diverse audience to tackle one of the most pressing challenges in generative AI.