A team may participate in both the FR and NR tracks; however, a single team can win at most one top prize.
Participating teams are required to submit their algorithm, which must generate video quality score predictions in the range from 0 to 100 (floating point) for evaluation on the test set. Each team also needs to include a document of no more than 4 pages, in the QoMEX format, describing the solution in as much detail as possible (method, results, software, etc.).
The model can be trained on the released dataset and/or other available datasets; training details must be provided.
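The exact input/output interface of the submitted software is not specified in this call; the snippet below is only a hypothetical sketch showing how predicted scores could be clamped to the required [0, 100] floating-point range and written out one per test video (the predict_score callable and the CSV layout are assumptions, not an official submission format).

```python
import csv
from pathlib import Path


def clamp_score(score: float) -> float:
    """Clamp a raw model output to the required [0, 100] floating-point range."""
    return float(min(100.0, max(0.0, score)))


def write_predictions(video_dir: str, out_csv: str, predict_score) -> None:
    """Score every video in a directory and save one prediction per line.

    predict_score(path) stands in for the team's own model; the CSV layout
    (video filename, score) is an assumed format, not the official one.
    """
    rows = [(video.name, clamp_score(predict_score(str(video))))
            for video in sorted(Path(video_dir).glob("*.mp4"))]
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["video", "predicted_score"])
        writer.writerows(rows)
```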
VMAF will be used as the baseline performance reference for the FR track, and P.1204.3 for the NR track. Amazon reserves the right NOT to award a prize to models that fall below the baseline performance.
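For teams that want to compare against the FR baseline, VMAF can be computed with any standard implementation; the sketch below assumes an ffmpeg build with libvmaf enabled, and the file paths as well as the JSON log layout (which varies with the libvmaf version) are placeholders.

```python
import json
import subprocess


def vmaf_mean(distorted: str, reference: str, log_path: str = "vmaf.json") -> float:
    """Compute the mean VMAF score of a distorted video against its reference.

    Requires an ffmpeg build with libvmaf enabled; paths are placeholders and
    the pooled-metrics JSON layout assumed here depends on the libvmaf version.
    """
    cmd = [
        "ffmpeg", "-i", distorted, "-i", reference,
        "-lavfi", f"libvmaf=log_fmt=json:log_path={log_path}",
        "-f", "null", "-",
    ]
    subprocess.run(cmd, check=True, capture_output=True)
    with open(log_path) as f:
        log = json.load(f)
    return float(log["pooled_metrics"]["vmaf"]["mean"])
```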
Participants in the GC are encouraged to submit short QoMEX papers (up to 3 pages of content + 1 page of references) corresponding to their GC solutions; if accepted, these will be published in the conference proceedings. Rejection of a submitted QoMEX GC paper does not necessarily exclude the corresponding solution from assessment, since the criteria involved are different: novelty is critical for the paper, while performance may be more critical for the solution.
Full-Reference VQM - Objective video quality measurement models that predict video quality scores with access to both the original source video and the distorted video
No-Reference VQM - Objective video quality measurement models that predict video quality scores with access only to the compressed video (NO access to the original source video)
The dataset was designed to support the evaluation of video quality metrics for asymmetrically encoded videos under diverse content and motion conditions. Source sequences were selected from the CDVL video library, focusing on sports content.
All sequences are provided at 1080p resolution with frame rates ranging from 25 to 50 frames per second.
Two approaches are used to generate the asymmetrically encoded videos:
Semantic-based encoding: videos were segmented using Grounding DINO [1] to obtain frame-level object masks for semantically relevant regions, which are categorized into two importance groups:
High-importance regions: players, ball, field lines, and referees
Low-importance regions: background field
Field lines are included as high-importance regions due to their role in supporting viewers’ understanding of the game. Encoded versions are provided at 1080p and 540p, with different QP offsets applied to high- and low-importance regions (a schematic sketch of such a region-to-offset mapping is given after the description of the saliency-based approach below).
Saliency-based encoding: importance is derived from predicted visual saliency rather than fixed semantic classes. Saliency maps are generated using the STSANet [2] video saliency prediction model, trained on the UCF Sports dataset. Unlike semantic-based coding, this approach allows different players within the same scene to receive different levels of coding priority based on their visual prominence.
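The encoder settings used to produce the dataset are not reproduced here; the following is only a schematic sketch, assuming a binary per-pixel importance mask (from either the semantic or the saliency pipeline), of how such a mask could be pooled into a block-level QP offset map. The block size and offset values are illustrative, not the dataset's actual parameters.

```python
import numpy as np


def qp_offset_map(importance_mask: np.ndarray,
                  block: int = 16,
                  high_offset: int = -4,
                  low_offset: int = 4) -> np.ndarray:
    """Pool a binary per-pixel importance mask into a block-level QP offset map.

    Blocks that are mostly high-importance get a negative QP offset (finer
    quantization); the rest get a positive offset. Block size and offsets are
    illustrative assumptions, not the settings used for the released videos.
    """
    h, w = importance_mask.shape
    bh, bw = h // block, w // block
    # Fraction of high-importance pixels in each block.
    pooled = importance_mask[:bh * block, :bw * block].astype(float).reshape(
        bh, block, bw, block).mean(axis=(1, 3))
    return np.where(pooled >= 0.5, high_offset, low_offset)
```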
The encoded videos were evaluated in a standardized laboratory environment following the recommendations of ITU-R BT.500, with controlled lighting conditions. The display was a 65-inch 4K OLED Sony 65A95K, and the viewing distance was set to 1.5H. Subjective quality assessment was conducted using a time-parallel Degradation Category Rating (DCR) protocol. Owing to the 4K display resolution, the reference and processed HD videos were presented side-by-side, with gray borders at the top and bottom. You can download the training dataset once registered for this grand challenge.
[1] Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, et al., “Grounding dino: Marrying dino with grounded pre-training for open-set object detection,” arXiv preprint arXiv:2303.05499, 2023.
[2] Z. Wang, Z. Liu, G. Li, Y. Wang, T. Zhang, L. Xu, and J. Wang, “Spatio-temporal self-attention network for video saliency prediction,” IEEE Trans. on Multimedia, vol. 25, pp. 1161–1174, 2023.
Copyright Notice
-----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------
Copyright (c) 2026 Nantes University
All rights reserved.
Permission is hereby granted, without written agreement and without license or royalty fees, to use, copy, modify, and distribute this database (the videos, the results, and the source files) and its documentation for any purpose, provided that the copyright notice in its entirety appears in all copies of this database, and that the original source of this database, Nantes University, is acknowledged in any publication that reports research using this database.
IN NO EVENT SHALL NANTES UNIVERSITY BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS DATABASE AND ITS DOCUMENTATION, EVEN IF NANTES UNIVERSITY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
NANTES UNIVERSITY SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE DATABASE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND NANTES UNIVERSITY HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
-----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------
All submissions will be evaluated using the following main criteria (a minimal computation sketch for criteria (1)–(3) is given after the list):
(1) Spearman’s Rank Order Correlation Coefficient (SROCC).
(2) Pearson Linear Correlation Coefficient (PLCC).
(3) Root Mean Square Error (RMSE).
(4) D/S AUC: Difference/Similar Analysis quantified by Area Under the Curve [Krasula, 2016]
(5) B/W CC: Better/Worse Analysis quantified by Correct Classification percentage [Krasula, 2016]
(6) Runtime Complexity (measured by frames processed per second for a given spatial resolution and bit-depth): all submitted solutions will be evaluated on a standard AWS EC2 instance (G6e.2xlarge) with one Nvidia L40S GPU.
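As an illustration, criteria (1)–(3) can be computed with standard scientific Python tooling. The sketch below assumes the predicted scores and the corresponding subjective scores are available as NumPy arrays; whether a nonlinear mapping is fitted to the predictions before PLCC/RMSE is part of the organizers' protocol and is not shown, and the D/S AUC and B/W CC analyses of [Krasula, 2016] are likewise omitted.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def basic_criteria(predicted: np.ndarray, subjective: np.ndarray) -> dict:
    """Compute SROCC, PLCC, and RMSE between predicted and subjective scores."""
    srocc, _ = spearmanr(predicted, subjective)
    plcc, _ = pearsonr(predicted, subjective)
    rmse = float(np.sqrt(np.mean((predicted - subjective) ** 2)))
    return {"SROCC": float(srocc), "PLCC": float(plcc), "RMSE": rmse}
```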
The correlation performance will be tested on a reserved video dataset different from the one provided. The VQM performance of each algorithm will be evaluated separately for the full-reference and no-reference use cases.
[Krasula, 2016] Krasula, Lukáš, et al. "On the accuracy of objective image and video quality models: New methodology for performance evaluation." 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2016.
Amazon will provide cash prizes for the top 3 competition winners (for the two categories FR and NR).
Cash prize:
First place: $3,000
Second place: $2,000
Third place: $1,000