Track 2B: Text-to-Video Generation
Introduction
The realm of AI-powered creativity is expanding rapidly, and one of its newest frontiers is text-to-video generation. With models like Sora, VideoPoet, and Pika leading the way, this field is opening up new possibilities for storytelling and content creation.
In this competition, participants will harness the power of these advanced models to generate videos from text prompts. Imagine turning a simple description like "a bustling city at sunrise" into a dynamic, visually stunning video.
Competitors will be provided with a set of text prompts and tasked with developing models that can transform these prompts into engaging, high-quality videos. This challenge not only tests the capabilities of current text-to-video generation technology but also pushes the boundaries of what’s possible in AI-driven content creation.
Join us in this innovative competition, where you’ll have the chance to showcase your skills, contribute to the growing field of AI, and compete for a $2,000 prize. Let your creativity and technical prowess shine as you bring text descriptions to life through the magic of video.
Quick start guide
Download the prompts (short videos, long videos)
Start with existing video diffusion models to generate the videos automatically (see the sketch after this list)
Try some new ideas!
Submit your generated videos using this form: LOVEU-T2V Registration & Submission Form
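If you are new to video diffusion models, the following is a minimal sketch of the generation step using the Hugging Face diffusers library. The checkpoint (damo-vilab/text-to-video-ms-1.7b, the ModelScope text-to-video model) and the sampling settings are illustrative assumptions, not competition requirements; the models listed in the FAQ below can be used in much the same way.

```python
# Minimal text-to-video sketch with Hugging Face diffusers.
# The checkpoint and settings below are illustrative starting points.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

prompt = "a bustling city at sunrise"
# guidance_scale sets the classifier-free guidance strength; tune it per model.
frames = pipe(prompt, num_inference_steps=25, guidance_scale=9.0).frames[0]
export_to_video(frames, output_video_path="city_sunrise.mp4")
```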
Dates
May 15, 2024: The competition data becomes available.
May 21, 2024: The leaderboard and submission instructions become available.
June 8, 2024: Deadline for submitting your generated videos.
June 17, 2024: LOVEU 2024 Workshop. Presentations by the winner and runner-up. The $2,000 prize will be paid to the winning team.
Evaluation method
To participate in the contest, you will submit the videos generated by your model. As you develop your model, you may want to evaluate your results visually and use the automated metrics in VBench to track your progress.
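For example, a rough sketch of invoking VBench's Python API on a folder of generated videos might look like the following; the paths, run name, and dimension list are placeholders, so consult the VBench repository for exact installation and usage.

```python
# Sketch of automated scoring with VBench (paths and dimension names are
# placeholders; see the VBench repo for setup details).
import torch
from vbench import VBench

device = torch.device("cuda")
vbench = VBench(device, "VBench_full_info.json", "evaluation_results")
vbench.evaluate(
    videos_path="generated_videos",  # folder containing your generated videos
    name="my_submission",            # label for this evaluation run
    dimension_list=["subject_consistency", "motion_smoothness"],
)
```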
After all submissions are uploaded, we will run a human evaluation of all submitted videos. Labelers will evaluate each video on the following criteria:
Text Alignment: Alignment between generated videos and text prompts
Spatial Quality: The quality of individual video frames
Temporal Quality: Temporal consistency between frames, motion quality, and motion strength
We will choose a winner and a runner-up based on both automatic scores and human evaluation scores.
Dataset
Our LOVEU-T2V-2024 dataset consists of 240 prompts spanning diverse categories/dimensions:
Short video generation: 200 prompts sampled from the VBench dataset.
Long video generation: 40 prompts sourced from the Sora website.
Rules
We strongly encourage participants to contribute to the open-source community by sharing their solutions. However, we acknowledge that certain circumstances, such as commercial constraints, may preclude the release of code. In such cases, participants may submit their results alone, although we emphasize the value of openness and collaboration in advancing the field of text-to-video generation.
Be sure to follow the instructions in the GitHub repo when saving your generated videos. This will help you get the right format and folder structure.
To submit your results, simply upload a .zip file and fill out the required information in the LOVEU-T2V Registration & Submission Form.
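For example, once your videos are saved in the expected folder structure, packaging them is a one-liner (the folder name generated_videos below is a placeholder for whatever the repo instructions specify):

```python
# Package the output folder into submission.zip for the form upload.
# "generated_videos" is a placeholder; match the repo's folder structure.
import shutil

shutil.make_archive("submission", "zip", "generated_videos")
```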
If you have any questions, please feel free to reach out to us at loveu-t2v@googlegroups.com.
Report format
In your report, please explain clearly:
Your data, supervision, and any pre-trained models
Pertinent hyperparameters, such as the classifier-free guidance scale
If you used prompt engineering, please describe your approach
The report can be simple (one page) or detailed (many pages), and should be submitted in PDF format.
FAQ
Q: Is prompt engineering allowed?
A: Yes! If you want to add “high quality, 4k” or things like that to the prompts, you are welcome to do that!
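As an illustration, such tags can be appended programmatically before generation (the tag list here is only an example):

```python
# Append quality tags to every competition prompt before generation.
# The tags are illustrative; experiment with what your model responds to.
QUALITY_TAGS = "high quality, 4k"

def enhance(prompt: str) -> str:
    return f"{prompt}, {QUALITY_TAGS}"

prompts = ["a bustling city at sunrise"]
enhanced_prompts = [enhance(p) for p in prompts]
print(enhanced_prompts[0])  # "a bustling city at sunrise, high quality, 4k"
```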
Q: How do I submit the results?
A: Please follow the instructions and upload your results (in a single .zip file) via the LOVEU-T2V Registration & Submission Form.
Q: Do you have any hints?
A: There are many open-source video diffusion models you may start with, e.g., Show-1, Stable Video Diffusion, VideoCrafter, LaVie, etc.