MMFM Challenge

Overview

Multimodal Foundation Models (MMFMs) have achieved unprecedented performance on many computer vision tasks. However, on more specialized tasks such as document understanding, their performance remains underwhelming. To evaluate and improve these strong multimodal models on document image understanding, we harness a large amount of publicly available and privately gathered data (listed in the image above) and propose a challenge. Below, we list all the important details of the challenge, which runs in two separate phases.

Dates

Phase 1 Data Release: March 20th 2024

Phase 2 Data Release: May 20th 2024

Online Evaluation Open: May 20th 2024

Phase 1 & 2 Submission Due: June 5th 2024


Awards

We will award $10K in prizes to the top teams.

Datasets

Phase 1

Phase 2

Submission

Registration: Please register for the competition and select the challenge track on the workshop site: MMFM2024 on CMT3.
The challenge runs in two phases; we encourage you to read the MMFM Challenge Readme for the details.

Evaluation & Submission: Please refer to the Challenge Submission.

Rules: Please read the full rules here: Challenge Phases and General Rules.

Contact

For any questions, please email the organizers at contactmmfm2024@gmail.com, and we will get back to you as soon as possible.

License

This repository is licensed under the MIT License. See LICENSE for more details. To view the licenses of the datasets used in the challenge, please see LICENSES.

Partnership

The challenge winners' prize is awarded in collaboration with:

Tensorleap is on a mission to make AI production-ready by enhancing neural network transparency and equipping researchers with advanced technologies for development and monitoring, grounded in cutting-edge explainability techniques.