Distinguished Keynotes

Keynote Speaker 1 (Remote)

Dr. Koki Nagano

Principal Research Scientist, NVIDIA Research, USA

Biography: Dr. Nagano works at the intersection of graphics and AI, with a focus on achieving realistic digital humans using data-driven techniques and deep learning. He worked on a 3D display that allows an interactive conversation with a holographic projection of Holocaust survivors, preserving visual archives of their testimonies for future classrooms. His work on skin microgeometry synthesis has helped create digital characters in blockbuster movies such as “Ready Player One” and “Blade Runner 2049”, as well as open-source characters such as “Digital Mike” and “Digital Emily 2.0”. His work on photorealistic human digitization has been shown at venues including the World Economic Forum, EmTech, TEDxCharlottesville, and SIGGRAPH Real-Time Live!. His work has also led to the development of state-of-the-art deepfake detection technology in collaboration with top media forensics experts. He obtained his PhD from the University of Southern California, where he was advised by Dr. Paul Debevec at USC ICT, and his BE from the Tokyo Institute of Technology.

Keynote Title: Beyond the Pixels: Creating, Interacting with, and Authenticating the Next Generation of Digital Humans

Time: 9:10 – 9:55

Abstract

Digital human technologies are expected to play a crucial role across various fields, including digital twins, synthetic data generation (SDG) for robotics, and telepresence, where achieving expressive animation, 3D consistency, and real-time inference is vital.

While recent advances in video diffusion models have enabled dramatic quality improvements in 2D avatars, they often sacrifice 3D consistency and inference speed, limiting their applicability in real-world scenarios. In this talk, I will present a new technique that instantly generates 3D-consistent, expressive, and real-time Gaussian head avatars by distilling knowledge from powerful 2D diffusion priors, directly addressing this "trilemma" of 3D head avatars.

As avatar generation becomes instantaneous and hyper-realistic, reliably distinguishing real videos from AI-generated ones has become urgent. To address this, I will discuss our recent work on detecting AI-generated videos from intrinsic, low-level artifacts introduced by video generation architectures, and show that it generalizes to unseen AI video models. Lastly, I will talk about open problems in creating AI models that simulate human speech and gestures for fully autonomous, full-duplex human-agent and human-robot interactions.

Keynote Speaker 2

Prof. Tatsuya Harada

Professor, Research Center for Advanced Science and Technology, The University of Tokyo
Team Director, RIKEN AIP
Vice Director, Research Center for Medical Bigdata, National Institute of Informatics, Japan

Biography: Prof. Harada is a Professor at the Research Center for Advanced Science and Technology, The University of Tokyo. His research interests include visual recognition, machine learning, and intelligent robots. He received his Ph.D. from The University of Tokyo in 2001. He is also a Team Director at RIKEN AIP and Vice Director of the Research Center for Medical Bigdata at the National Institute of Informatics, Japan.

Keynote Title: Real-Time Controllable and Animatable Head Avatars

Time: 10:20 – 11:05

Abstract

Head avatar reconstruction has important applications in virtual reality, online conferencing, gaming, and the film industry, and has attracted significant attention in the field of computer vision. The fundamental goal of this area is to faithfully reconstruct a person’s head and enable precise control over facial expressions and head poses.

In this talk, Prof. Harada will introduce a framework for reconstructing a 3D head avatar from one or multiple images in a single forward pass. He will also present an autoregressive model that generates lip movements, realistic head poses, and blinking in real time, tightly synchronized with speech, as well as a talking-head generation framework that allows independent control of style and emotion.
