3rd MetaFood Workshop
CVPR 2026
Introduction
The food domain presents uniquely complex visual and physical characteristics: food appearance deforms, mixes, and changes through manipulation and consumption, making it an ideal test bed for powerful Computer Vision and Deep Learning algorithms. The MetaFood Workshop (MTF) 2026 invites the CVPR community to explore food data analysis and interaction as a new frontier for embodied perception, video generation, and physics-aware modeling.
Understanding how food interacts with tools and humans enables fine-grained video reasoning, not only for estimating how much is eaten, but also for revealing intricate multi-material dynamics during cooking and eating. By bridging embodied AI, dynamic 3D reconstruction, vision-language reasoning, and generative modeling, MetaFood 2026 aims to advance physically grounded, fine-grained understanding and synthesis of food in motion.
Keynote Speakers
Call for Papers
MetaFood’26 will encompass a broad range of topics, including but not limited to:
Embodied and causal understanding of food manipulation and consumption
Physics-informed understanding and 3D reconstruction of deformable, fragile, and multi-material food
Temporal modeling of food transformations and continuous state estimation (e.g., cooking or eating)
Vision–language reasoning, in-context learning, and retrieval-augmented generation for food
Multimodal learning across images, videos, audio, and structured/unstructured text
Self-supervised, continual, semi-supervised, and weakly supervised learning for in-the-wild food data
Uncertainty modeling and learning from noisy or ambiguous labels
Food portion, volume, and nutrition estimation
Food image and video generation using generative AI
2D/3D classification, detection, and segmentation of food items and ingredients