Congrats to the Challenge Winners:
First Place: Team Hillo World (registered as soon gu) - Songen Gu, Lina Liu, Binjie Liu, Lei Yang [Video]
Second Place: Team OneVol - Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva [Video]
Third Place: Team wuhu648648 - Shipeng Liu, Xiangji Kong, Runfeng Lv, Yatao Xu, Wei She [Video]
Challenge 1: 3D Reconstruction From Monocular Multi-Food Images
Background
3D reconstruction from monocular images is a rapidly evolving research area in computer vision with significant applications in food image analysis. The ability to reconstruct 3D food models from single 2D eating-occasion images in real-world physical units allows users to share food experiences in three dimensions and provides crucial information about food portions, facilitating the tracking of individual nutrition intake. However, 3D reconstruction presents unique challenges that make it particularly valuable for evaluating the robustness and capability of existing computer vision algorithms. Specifically, food items in in-the-wild images exhibit a wide variety of colors, textures, shapes, and sizes. Their reflective properties can vary drastically, from glossy surfaces such as a boiled egg to matte exteriors such as yeast bread. Furthermore, when multiple food items are placed together, occlusions, overlap, and shadows create complex scenes from which a robust algorithm must still extract useful information. Additionally, food is rarely uniform: the same dish can be presented in many different ways, making approaches that rely on predefined 3D models or shapes as references insufficient. Therefore, evaluating CV/AI algorithms on 3D food reconstruction pushes the boundaries of current technologies, requiring them to handle high intra-class variability and complex spatial arrangements across classes.
Description
This year's challenge will emphasize multi-food scenarios without explicit physical references. These changes are motivated by the goal of aligning the challenge more closely with real-world applications. By focusing on multi-food scenarios, we aim to address the complexity of actual eating occasions, where multiple items with diverse shapes, textures, and reflective properties coexist on a plate. In addition, the removal of explicit physical references encourages the development of algorithms that can infer scale and orientation from implicit cues (such as plates, utensils, or common food items of known size) found in natural eating settings. This realistic setup not only enhances the challenge's relevance to practical applications in nutrition tracking and dietary assessment but also promotes the creation of more intelligent and adaptive computer vision solutions.
To get started, we recommend using an image-to-3D generation model to create a mesh, then using a metric depth estimation model to calibrate its size. See the rules for the recommended external resources.
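As a rough sketch of the size-calibration step, the snippet below rescales an unscaled mesh by the median ratio between a metric depth map and the mesh's rendered depth. All filenames, array contents, model names, and the median-ratio heuristic are illustrative assumptions, not part of the official pipeline:

```python
import numpy as np
import trimesh

# Illustrative placeholders: in practice, depth_metric would come from a metric
# depth model (e.g., Depth Pro or Depth Anything V2) and depth_render would be
# the depth of the unscaled mesh rendered from the same viewpoint, in its own
# arbitrary units.
H, W = 480, 640
depth_metric = np.full((H, W), 0.45)   # metric depth over the food region, meters
depth_render = np.full((H, W), 1.80)   # rendered depth of the unscaled mesh

mesh = trimesh.creation.icosphere(radius=1.0)  # stand-in for an image-to-3D output

# Robust scale factor: median of per-pixel depth ratios over valid overlapping pixels.
mask = (depth_render > 0) & (depth_metric > 0)
scale = float(np.median(depth_metric[mask] / depth_render[mask]))

mesh.apply_scale(scale)  # bring the mesh into metric units
print(f"scale = {scale:.4f}, volume = {mesh.volume * 1e6:.1f} cm^3")
```

The median ratio is one simple way to be robust to depth outliers at food boundaries; alternatives such as a least-squares fit over the masked region would work similarly.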
Evaluation Criteria
The evaluation will consist of a two-phase process focused on assessing the precision of the reconstructed 3D models in terms of both shape (3D structure) and portion size (volume). Results will be compared against reference models of the same food items captured using a 3D scanner.
In Phase-I (Volume Accuracy), the Mean Absolute Percentage Error (MAPE) metric will be employed to assess the accuracy of each model's portion size. Only the volume accuracy of the models will be evaluated during this stage.
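For reference, the standard MAPE formulation over N food items is shown below; the symbols (V_i for the reference volume, V̂_i for the predicted volume) are our notation rather than the official rules':

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \frac{\lvert \hat{V}_i - V_i \rvert}{V_i}$$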
In Phase-II (Shape Accuracy), the top-ranking teams from Phase I (3–5, depending on the total number of teams) will submit complete 3D mesh files for each food item. We will use the L1 Chamfer Distance metric to evaluate the shape accuracy of the models; our evaluation protocol follows the Chamfer Distance metric used in the DTU benchmark.
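For intuition, here is a minimal sketch of a symmetric L1 Chamfer distance between sampled point clouds, using SciPy's cKDTree for nearest-neighbor queries. The official DTU-style protocol may differ in point sampling density, distance truncation, and whether the two directions are summed or averaged:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_l1(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """Symmetric L1 Chamfer distance between two point clouds (N x 3 and M x 3):
    the mean nearest-neighbor distance in each direction, summed."""
    d_ab, _ = cKDTree(points_b).query(points_a)  # nearest-neighbor distances A -> B
    d_ba, _ = cKDTree(points_a).query(points_b)  # nearest-neighbor distances B -> A
    return float(d_ab.mean() + d_ba.mean())

# Example with random stand-in clouds; in practice, sample points from the
# predicted and reference meshes after aligning them with the submitted
# transformation matrix.
pred = np.random.rand(1000, 3)
ref = np.random.rand(1200, 3)
print(f"Chamfer-L1: {chamfer_l1(pred, ref):.4f}")
```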
The final winner will be determined by combining scores from both phases for all private food objects. Each team will receive a ranking (1–5) in each phase, and the final score will be a weighted sum of the two rankings (a worked example follows the weights below):
Phase I (Volume Accuracy): 55% weight
Phase II (Shape Accuracy): 45% weight
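As a worked illustration of the weighting (assuming, as ranks 1–5 suggest, that a lower combined score is better):

$$\text{final score} = 0.55\, r_{\mathrm{I}} + 0.45\, r_{\mathrm{II}}, \quad \text{e.g. } 0.55 \times 1 + 0.45 \times 2 = 1.45$$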
Important Dates
Challenge Open: April 8th, 2025
Phase-I submission deadline: May 4th, 2025
Phase-II 3D models submission deadline: May 6th, 2025
Phase-II transformation matrices submission deadline: May 9th, 2025
Announcement of Challenge Winners: May 11th, 2025
For detailed challenge instructions and regulations, please refer to the Kaggle page: https://www.kaggle.com/competitions/3d-reconstruction-from-monocular-multi-food-images/overview