Challenge
Physically Informed 3D Food Reconstruction
Challenge Winners:
First Place:
Team VolETA, Ahmad AlMughrabi, Umair Haroon, Ricardo Jorge Rodrigues Sepúlveda Marques, Petia Radeva, Universitat de Barcelona, Spain
Second Place:
Team ININ-VIAUN, Jiadong Tang, Dianyi Yang, Yu Gao, Zhaoxiang Liang, Beijing Institute of Technology, China
Best Mesh Reconstruction:
Team FoodRiddle, Yawei Jueluo, Chengyu Shi, Pengyu Wang, Baidu Inc., XPeng Motors, and Beijing University of Posts and Telecommunications, China
Challenge Report:
MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results
Background
3D Food Reconstruction is an innovative venture into the intersection of computer vision and culinary arts, with the goal of reconstructing three-dimensional models of food items from two-dimensional images. This challenge is designed to push the boundaries of 3D reconstruction technology by applying it to a dynamic and uncontrolled setting, such as capturing food images during eating occasions. Participants will tackle the challenge of creating precise and robust 3D models from single or multiple 2D views, using a visible physical reference (a checkerboard) of known size. The checkerboard serves as a reference object to ensure that the reconstructed 3D models have the same size as the real food items, which is crucial for accurate portion estimation. The objective is to reconstruct 3D models for 20 selected food items, provided at three levels of difficulty: simple, medium, and hard. By reconstructing 3D models at the correct size, this technology can play a vital role in tracking nutritional intake and helping individuals maintain a healthy diet. This endeavor not only aims to enhance the sharing of food experiences in three dimensions but also has significant potential applications in nutrition and health monitoring.
3D Data Description
The challenge features 20 selected food items, each meticulously scanned with a 3D scanner while corresponding videos were captured. These items are accompanied by a checkerboard and pattern mat during video capture, serving as physical references for scaling the output 3D models to ensure size accuracy—one of the key evaluation metrics. The difficulty level of the challenge varies with the number of 2D images available for reconstruction: simple levels are represented by approximately 200 images sampled from video, medium levels by about 30 images, and hard levels by a single monocular top-view image. Detailed information on these food items is provided below:
Figure 1: Sample challenge data for “everything bagel”.
Dataset Structure
MTF_Challenge
-- object index (1 - 20)
   -- Depth (depth image for each frame)
   -- Mask (segmentation mask for each frame)
   -- Original (sampled original frames)
-- source-files
   -- checkerboard.pdf
   -- mat.pdf
   -- data_meta.xls
-- eval
   -- challenge_volume_calculation.py (evaluation script)
   -- environment.yml (dependent libraries)
   -- volume_calculation_instructions.docx (evaluation instructions)
   -- sample_solution.csv (example submission file)
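As a concrete illustration, the per-object layout above can be traversed with a short script. The assumption that the three folders use matching file names per frame is ours, not the organizers'; adjust the pairing to the actual dataset contents:

```python
from pathlib import Path

def list_frames(root, object_index):
    """Pair up Original / Depth / Mask frames for one object in the
    MTF_Challenge layout. Assumes the three folders use matching file
    names per frame; adjust to the real dataset if they differ."""
    obj = Path(root) / str(object_index)
    frames = []
    for rgb in sorted((obj / "Original").iterdir()):
        frames.append({
            "rgb": rgb,
            "depth": obj / "Depth" / rgb.name,
            "mask": obj / "Mask" / rgb.name,
        })
    return frames
```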
Evaluation Criteria
The evaluation will consist of a two-phase process centered on the precision of the reconstructed 3D models in terms of their shape (3D structure) and portion size (volume).
In Phase-I, the Mean Absolute Percentage Error (MAPE) metric will be employed to assess the accuracy of the model's portion size. Only the volume accuracy of the models will be evaluated during this stage.
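MAPE is the mean of per-item absolute percentage errors between predicted and ground-truth volumes. A minimal sketch of the metric (the provided challenge_volume_calculation.py is authoritative; this only illustrates the computation, and the example volumes are made up):

```python
def mape(predicted, actual):
    """Mean Absolute Percentage Error, in percent, between predicted
    and ground-truth volumes (both in ml)."""
    assert len(predicted) == len(actual) and len(actual) > 0
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)

# three hypothetical items: predicted vs. ground-truth volume in ml
print(mape([100.0, 250.0, 80.0], [110.0, 240.0, 80.0]))  # ~4.42
```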
In Phase-II, the top-ranking teams from Phase-I will be asked to submit the complete 3D mesh files for each food item, to validate their Phase-I results. The Chamfer distance metric will then be used to evaluate the shape accuracy of these models.
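Chamfer distance measures shape agreement as the average nearest-neighbour distance between points sampled from the two meshes. A brute-force sketch over sampled point clouds (conventions vary, e.g. squared vs. unsquared distances and summed vs. averaged directions, so the official evaluation may differ in detail):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N,3) and b (M,3):
    mean squared nearest-neighbour distance in both directions.
    Brute force, O(N*M) memory -- fine for a few thousand sampled points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

For two single points one unit apart this gives 2.0 (1.0 in each direction), and 0.0 for identical clouds.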
The final winner will be determined by combining the scores from both phases for all 20 objects.
Important Dates
Challenge Open: March 25th, 2024
Phase-I submission deadline: May 24th, 2024
Phase-II 3D models submission deadline: May 27th, 2024
Phase-II transform matrices submission deadline: June 4th, 2024
Announcement of Challenge Winners: June 7th, 2024
Phase-I Submission Step
1. Registration and Data Acquisition:
Register on the challenge website: https://www.kaggle.com/competitions/cvpr-metafood-3d-food-reconstruction-challenge
Download the challenge dataset.
2. 3D Reconstruction: Use the dataset to perform 3D reconstruction and generate mesh files in .obj format.
3. Scale and Coordinate System:
Ensure the scale for the 3D reconstruction corresponds to 1 unit = 1 meter for accurate evaluation.
Adjust the export settings of your 3D software to match this unit scaling.
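For instance, if your reconstruction pipeline exports in millimetres, the vertex coordinates must be divided by 1000 before submission. A minimal sketch that rescales the `v` lines of a Wavefront .obj (not a full .obj parser; it assumes plain `v x y z` lines and passes everything else through unchanged):

```python
def rescale_obj(lines, factor):
    """Multiply every vertex position in a Wavefront .obj (given as a
    list of text lines) by `factor`, e.g. factor=0.001 for mm -> m.
    Faces, normals and texture coordinates pass through unchanged."""
    out = []
    for line in lines:
        if line.startswith("v "):
            _, x, y, z = line.split()[:4]
            out.append(f"v {float(x) * factor} {float(y) * factor} {float(z) * factor}")
        else:
            out.append(line)
    return out
```

For example, `rescale_obj(["v 1000 0 0", "f 1 2 3"], 0.001)` maps the vertex to (1.0, 0.0, 0.0) and leaves the face line untouched.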
4. Naming Convention:
Name each reconstructed model file following the format: “ObjectIndex.obj”
For example, name the file “16.obj” for the object indexed as “everything bagel”.
5. Volume Estimation:
Run the provided script to estimate the volume (in ml) of each .obj file.
The script will automatically generate the “solution.csv” file.
Note: manually modifying the solution file before submission is not allowed; all results submitted in Phase-I will be validated in Phase-II.
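For a sanity check before running the official script, the volume of a watertight, consistently wound triangle mesh can be computed with the divergence theorem (summing signed tetrahedra against the origin). This is only an illustrative sketch, not the official evaluation code:

```python
import numpy as np

def mesh_volume_ml(vertices, faces):
    """Volume of a watertight triangle mesh via the divergence theorem:
    sum the signed volumes of tetrahedra (origin, v0, v1, v2).
    Assumes vertices are in metres (1 unit = 1 m), so 1 m^3 = 1e6 ml."""
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    v0, v1, v2 = v[f[:, 0]], v[f[:, 1]], v[f[:, 2]]
    signed = np.einsum("ij,ij->i", v0, np.cross(v1, v2)) / 6.0
    return abs(signed.sum()) * 1e6
```

On a unit cube (1 m per side) this returns 1e6 ml; comparing its output on your .obj files against the provided script catches unit mistakes early.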
6. Submission:
Submit the “solution.csv” file on the submission section of the challenge website.
Each team may submit results once per day.
7. Additional Notes:
It’s crucial to frequently check the website for updates or clarifications regarding the challenge dataset and submission deadlines.
If you encounter any issues during the submission process, contact the workshop organizers promptly.
Phase-II Submission Step
The top-ranking teams will be notified by the challenge organizers to submit the complete mesh files for each food object.
Rules
Participants are required to submit 20 .obj mesh files corresponding to the 20 selected food items. These submissions will undergo evaluation based on the reconstruction's shape and size accuracy. It is essential for participants to review the following rules:
1. Submissions must adhere to the provided instructions.
2. Each registered participant/team may submit results only once per day.
3. The challenge data is not yet public and may not be used for any other purpose.
Submissions will be disqualified if they contain inappropriate content or fail to meet the submission rules.
Winner and Prize
The top three teams will be awarded cash prizes and certificates.
Cash Prizes (U.S. Dollars)
First place: $600 + complimentary full registration for CVPR 2024
Second place: $300
Best Mesh Reconstruction: $100
Certificate
The winning teams will also be invited to give oral presentations at the workshop and to write technical reports.
*Please note that if the number of participants is too low, or if the submitted results do not exceed a predetermined threshold (set using existing methods), no prize will be awarded and the challenge will not be presented at the workshop.
External Sources
Participants are allowed to utilize external 3D object datasets for training purposes, such as OmniObject3D[1], NutritionVerse-3D[2], etc. The use of pre-trained models or any form of external data must be clearly disclosed in the submission.
Baseline 3D reconstruction methods include:
Medium/Simple: Use COLMAP for automatic reconstruction with the checkerboard visible in the frames, apply MeshLab to clean and simplify the sparse 3D model, and save it as an .obj file for the challenge submission.
Hard: Apply an existing method such as One-2-3-45 [3] to synthesize multiple views from the single image, use those views for 3D reconstruction, calibrate the scale using the checkerboard pattern, and adjust the 3D model to real-world dimensions for accurate scene representation.
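However the views are obtained, the checkerboard supplies the metric scale: the ratio between the known square edge length and the reconstructed corner spacing gives the factor by which to rescale the model. A sketch assuming the board corners have already been located in the model's coordinate frame (a hypothetical input, obtainable e.g. via OpenCV corner detection plus triangulation):

```python
import numpy as np

def checkerboard_scale(corners_3d, square_size_m):
    """Metric scale factor for an arbitrary-scale reconstruction.
    corners_3d (N, 3): consecutive checkerboard corners along one row,
    recovered in the model's coordinate frame (assumed given).
    square_size_m: known physical edge length of one square, in metres."""
    c = np.asarray(corners_3d, dtype=float)
    gaps = np.linalg.norm(np.diff(c, axis=0), axis=1)  # reconstructed edge lengths
    return square_size_m / gaps.mean()  # multiply all model vertices by this

# corners 2.5 model units apart, real squares 25 mm wide -> scale ~0.01
scale = checkerboard_scale([[0, 0, 0], [2.5, 0, 0], [5.0, 0, 0]], 0.025)
```

Multiplying every vertex of the reconstructed mesh by this factor puts the model in metres, matching the required 1 unit = 1 m convention.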
[1] Wu, Tong, et al. "OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation." CVPR 2023.
[2] Tai, Chi-en Amy, et al. "NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation." arXiv preprint arXiv:2304.05619, 2023.
[3] Liu, Minghua, et al. "One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization." NeurIPS 2023.