We are excited to announce the 2024 INFORMS Data Mining Society Data Challenge, now in its second year. This is an exceptional opportunity to apply your data analytics and machine learning skills to a unique dataset and to compete in the dynamic world of digital media analytics. We invite members of the INFORMS community to participate and develop innovative solutions!
Challenge Overview:
In recent years, short videos have emerged as a leading content format across various sectors, experiencing exponential growth. Platforms such as Instagram, YouTube, and Netflix have adapted by introducing their versions of short-form video platforms—Reels, Shorts, and Fast Laughs. Moreover, platforms dedicated to short-form content like TikTok have soared in global popularity. This surge is attributed to the format's convenience, accessibility, and ease of creation. Beyond entertainment, short videos have ventured into marketing, advertising, news production, and education, becoming pivotal in these domains. This shift is expected to foster more creators and expand business opportunities across different sectors. In this evolving media landscape, the ability to understand and predict the popularity of short-form content becomes essential. Accurate popularity prediction models can empower content creators to better tailor their videos, optimize content creation, and enhance viewer engagement. Furthermore, these insights can be instrumental for content creators and marketers to refine video production and advertising strategies, boost brand awareness, and improve conversion rates.
Predicting the popularity of short videos remains a complex and dynamic challenge because public datasets are scarce and research methodologies in this field are still at an early stage. To address this challenge, we have curated a dataset of short videos with associated meta-information from TikTok, the most popular short-video platform. This data challenge invites participants to research and develop innovative predictive models that assess short video popularity through four key engagement metrics: views, hearts (likes), comments, and shares. By understanding these aspects of user engagement and content reach, we aim to foster further research and development in short video popularity prediction, a crucial factor for strategic decision-making in the competitive online social network environment.
Data Challenge Task:
Each team's task is to use relevant features of short videos (e.g., release date, the author's number of followers) to predict their popularity, measured through four key engagement metrics: views, hearts (likes), comments, and shares.
Timeline and Key Dates:
The entire competition will run until September 17th, 2024. During the training phase, contestants should use the provided training data to develop their models. The testing phase will be conducted on a holdout dataset for which only the competition committee knows the true values. The testing period will run from August 26th to September 9th, and the holdout dataset will be released before August 26th. The leaderboard on the Workshop’s website will be updated once per week. The final ranking will be posted on Tuesday, September 10th. We will invite the top four competitors to the INFORMS Data Mining Workshop to present their solutions. All dates use Anywhere on Earth (AoE) time.
How to Participate:
1. Download Data: Access the provided dataset from the attached link. This directory contains all the necessary information about the competition.
2. Develop Solutions: Leverage the rich data to develop your analytical models during the Training Phase.
3. Testing Phase: During the Testing Phase, you will be provided with the holdout test set to evaluate your predictive model.
Evaluation:
For each of the four response variables, we will rank all the teams by the Mean Absolute Percentage Error (MAPE) on the test data. We will then compute the average rank of each team. For instance, if a team ranks 2nd, 1st, 10th, 5th in the four response variables, then the average rank of that team is (2 + 1 + 10 + 5) / 4 = 4.5. Finally, all teams are ranked by the aforementioned average rank.
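The scoring procedure described above can be sketched in a few lines of Python. This is only an illustration, not the committee's scoring script: the function names, the team labels, and the MAPE values in the example are hypothetical, and the handling of zero true values and of tied scores is an assumption, since the announcement does not specify either.

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error as a fraction (0.1 means 10%).

    Assumption: every true value is nonzero. How the committee treats
    zero counts (e.g., videos with no comments) is not stated.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rank_teams(scores):
    """Map {team: MAPE} to {team: rank}, where rank 1 is the lowest MAPE.

    Assumption: tied teams share the better rank.
    """
    ordered = sorted(scores.values())
    return {team: ordered.index(score) + 1 for team, score in scores.items()}


def average_ranks(mape_by_metric):
    """Map {metric: {team: MAPE}} to {team: average rank across metrics}."""
    per_metric = [rank_teams(scores) for scores in mape_by_metric.values()]
    teams = per_metric[0].keys()
    return {t: sum(r[t] for r in per_metric) / len(per_metric) for t in teams}


# Hypothetical MAPE values for three teams on the four response variables.
example = {
    "views":    {"A": 0.2, "B": 0.1, "C": 0.3},
    "hearts":   {"A": 0.1, "B": 0.2, "C": 0.3},
    "comments": {"A": 0.5, "B": 0.4, "C": 0.3},
    "shares":   {"A": 0.1, "B": 0.3, "C": 0.2},
}
final = average_ranks(example)
# Teams sorted from best (lowest average rank) to worst.
leaderboard = sorted(final, key=final.get)
```

In this made-up example, team A ranks 2nd, 1st, 3rd, and 1st, so its average rank is (2 + 1 + 3 + 1) / 4 = 1.75, which places it first overall.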
We will have a ranking leaderboard below.
Submission:
Each team needs to submit its predictions on the test data through this Google form. Closer to the testing phase, we will provide the testing set as well as a template for predictions.
We will release the ranking results for the four response variables on August 27th, September 3rd, September 10th, and September 17th. The first three releases (August 27th, September 3rd, and September 10th) are meant to help contestants improve their models. The final ranking is computed from the results released on September 17th. One member of each team must be identified as the primary contact and provide their email address in the submission form.
All teams invited as finalists are required to submit a report of at most five pages summarizing their methodologies, due on October 6th, 2024 (Anywhere on Earth). Winners will be chosen by a panel of judges based on the presentations and the reports.
Prizes and Recognition:
The finalists will be chosen directly from the numerical results, using the average ranking across the four response variables. Each finalist team will receive one complimentary registration code for the 19th INFORMS Workshop on Data Mining & Decision Analytics (DMDA), but will still need to pay for the main INFORMS conference registration (if attending). Each of the four finalist teams will receive a monetary prize. The final placement of winners will be decided by our panel of judges based on the quality of the presentation and the written methodology.
First Place Prize: 1000 USD
Second Place Prize: 500 USD
Third Place Prize: 250 USD
Fourth Place Prize: 125 USD
Contact and Questions:
For any questions or clarifications, feel free to reach out to the Competition Chairs:
Dr. Hieu Pham, Assistant Professor at The University of Alabama in Huntsville (Email: hieu.pham@uah.edu)
Dr. Kaizheng Wang, Assistant Professor at Columbia University (Email: kw2934@columbia.edu)
Dr. Shouyi Wang, Associate Professor at The University of Texas at Arlington (Email: shouyiw@uta.edu)
We look forward to your active participation and innovative solutions!
Best regards,
Competition Committee
INFORMS Data Mining Society
Testing Data with Labels: HERE
Submission Template: HERE
Submission Form: HERE (You will need a Google account to submit. If you do not have one, you can also email us your submission.)
Please use the provided submission template for your predictions. Additionally, please name your submission file "YourTeamNumber.csv" (e.g. "2024.csv"). For your team number, we have concatenated the first three letters of your team leader's first and last name (e.g. John Smith = JohSmi). Please use this table to identify your team number. If you do not see your team number, please email us.
The rankings will be updated on August 27th, September 3rd, September 10th, and September 17th. Please submit your predictions by the day before each update (AoE time). After the final rankings, the top four teams will be contacted.
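The naming rule above can be illustrated with a short Python sketch. The helper names `team_id` and `submission_filename` are hypothetical, and the sketch assumes names are used exactly as typed; the team number table remains the authoritative source for your identifier.

```python
def team_id(first_name, last_name):
    """Concatenate the first three letters of the first and last name."""
    return first_name[:3] + last_name[:3]


def submission_filename(first_name, last_name):
    """Build the CSV file name from the team identifier."""
    return team_id(first_name, last_name) + ".csv"


# The example from the announcement: John Smith = JohSmi.
example_id = team_id("John", "Smith")
example_file = submission_filename("John", "Smith")
```

Please still check the result against the table, since the committee may have handled short names or special characters differently.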
Week 1 Ranking
Week 2 Ranking
Week 3 Ranking
Final Ranking