We will make a new HDR and SDR video dataset, with human subjective quality scores as ground truth, available to challenge participants for testing their proposed solutions. Participants can choose to compete in one or both of the categories below:
(1) Full-Reference (FR) track - Objective video quality measurement algorithms/models predict video quality scores with access to both the original source video and the distorted video
(2) No-Reference (NR) track - Objective video quality measurement algorithms/models predict video quality scores with access only to the distorted (compressed) video and NO access to the original source video
Participant teams are required to submit easily readable code for their algorithm that produces a video quality score prediction in the range from 0 to 10 (floating point) for evaluation on the test set. The evaluation results will be published in a survey paper co-authored by the winning team. Participant teams can optionally submit a separate paper describing their solution.
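For illustration only, the following is a minimal sketch of what an easily readable entry point might look like; the script name, command-line arguments, and predict_score function are hypothetical and not a required interface.

# predict.py - hypothetical submission entry point (illustrative sketch, not a required interface)
import argparse

def predict_score(distorted_path, reference_path=None):
    # Placeholder: replace with the actual FR or NR model inference.
    # FR models receive both paths; NR models receive only the distorted video.
    return 5.0

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Predict a quality score in [0, 10].")
    parser.add_argument("--distorted", required=True, help="path to the distorted/compressed video")
    parser.add_argument("--reference", default=None, help="path to the source video (FR track only)")
    args = parser.parse_args()
    score = predict_score(args.distorted, args.reference)
    print(f"{min(max(float(score), 0.0), 10.0):.4f}")  # single floating-point score in [0, 10]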
The model can be trained on the released dataset and/or other available datasets.
The model must work on both SDR (BT.709, yuv420p) and HDR10 (BT.2020, PQ, yuv420p10le) videos.
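As a rough illustration of handling both formats, the sketch below uses ffprobe (assumed to be on PATH) to distinguish SDR (BT.709) from HDR10 (BT.2020/PQ) inputs; the classification rule is a simplification based on the stream metadata fields named above.

# probe_format.py - sketch: classify an input as SDR (BT.709) or HDR10 (BT.2020, PQ) via ffprobe
import json
import subprocess

def classify_video(path):
    cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
           "-show_entries", "stream=pix_fmt,color_primaries,color_transfer",
           "-of", "json", path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    stream = json.loads(out)["streams"][0]
    is_hdr10 = (stream.get("color_primaries") == "bt2020" and
                stream.get("color_transfer") == "smpte2084" and   # PQ
                stream.get("pix_fmt") == "yuv420p10le")
    return "HDR10" if is_hdr10 else "SDR"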
A team may participate in both the FR and NR tracks; however, a single team can win at most one top prize.
VMAF-4K-NEG will be used as the baseline performance reference for the FR track, and P.1204.3 for the NR track. Amazon reserves the right not to award prizes to models that perform below the baseline.
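For orientation, the FR baseline can typically be computed with FFmpeg's libvmaf filter; the sketch below assumes an FFmpeg build linked against libvmaf that bundles the vmaf_4k_v0.6.1neg model, and the exact option syntax may differ across FFmpeg versions.

# vmaf_baseline.py - sketch: compute a VMAF 4K NEG baseline score with FFmpeg's libvmaf filter
import subprocess

def run_vmaf_4k_neg(distorted, reference, log_path="vmaf.json"):
    # First input is the distorted video, second input is the reference.
    cmd = ["ffmpeg", "-i", distorted, "-i", reference, "-lavfi",
           f"libvmaf=model=version=vmaf_4k_v0.6.1neg:log_fmt=json:log_path={log_path}",
           "-f", "null", "-"]
    subprocess.run(cmd, check=True)  # the pooled score is written to the JSON log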
All submissions will be evaluated using the following main criteria:
(1) Spearman’s Rank Order Correlation Coefficient (SROCC).
(2) Pearson Linear Correlation Coefficient (PLCC).
(3) Root Mean Square Error (RMSE).
(4) Runtime complexity (measured in frames processed per second for a given spatial resolution and bit depth): all submitted solutions will be evaluated on a standard AWS EC2 instance (g6e.2xlarge) with one NVIDIA L40S GPU.
The correlation performance will be tested on a reserved video dataset different from the one provided. The VQM performance of each algorithm will be evaluated separately for the full-reference and no-reference use cases. Only SROCC will be used for the final ranking; PLCC, RMSE, and runtime complexity will be published in the paper for reference but will not affect the ranking.
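To make the correlation criteria concrete, a minimal sketch of how SROCC, PLCC, and RMSE could be computed with SciPy is shown below; note that PLCC and RMSE are sometimes computed after a nonlinear mapping of predictions to subjective scores, which this sketch omits.

# metrics.py - sketch: SROCC, PLCC, and RMSE between predicted and subjective scores
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate(predicted, subjective):
    predicted = np.asarray(predicted, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    srocc, _ = spearmanr(predicted, subjective)   # primary ranking criterion
    plcc, _ = pearsonr(predicted, subjective)     # reported for reference
    rmse = float(np.sqrt(np.mean((predicted - subjective) ** 2)))
    return {"SROCC": srocc, "PLCC": plcc, "RMSE": rmse}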
Amazon will provide cash prizes for the top 3 competition winners (for the two categories, FR and NR).
Cash prize:
First place: $3,000
Second place: $2,000
Third place: $1,000
You can download the training dataset once registered for this grand challenge. Please use this link to register.
The HDR & SDR video quality dataset for this challenge was created by the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin, sponsored by Amazon Prime Video.
Our database contains 54 pristine high-quality source videos. These include 31 open-source videos from the 8K HDR AVT-VQDB-UHD-2-HDR dataset [1], 10 Video on Demand (VoD) videos and 10 Live Sports videos from Amazon Prime Video's internal sources, and 3 anchor videos from the LIVE HDRorSDR database [2]. All videos are in the BT.2020 color gamut and are quantized using the Perceptual Quantizer (PQ) opto-electronic transfer function (OETF). Each video sequence has a duration of approximately 7 seconds and includes HDR10 static metadata.
All HDR10 source contents, except for the VoD videos, were converted to SDR format using the publicly available NBCU lookup tables (LUTs). For the VoD videos, both HDR and SDR versions were expertly created by Amazon Studios. We introduced eight distinct levels of distortion applied to both HDR and SDR formats, in addition to one high-quality reference video. For encoding, we utilized the libx265 encoder in FFmpeg, operating in constant-bitrate mode with single-pass encoding. Each video content comprises 18 variations (9 HDR10 and 9 SDR) following the distorted bitrate ladder shown below, while the 3 anchor contents consist of 14 variations each, following the structure of the LIVE HDRorSDR database [2]. Together with the anchor contents, the database contains a total of 960 videos, equally divided between HDR10 (480 videos) and SDR (480 videos).
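For illustration of the encoding setup described above, the sketch below shows a single-pass, constant-bitrate libx265 encode through FFmpeg; the bitrate, resolution, and HDR10 signalling values are placeholders rather than the exact ladder and parameters used to build the database.

# encode_pvs.py - sketch: single-pass, constant-bitrate libx265 encode of an HDR10 source
import subprocess

def encode_cbr(src, dst, bitrate="10M", width=3840, height=2160):
    # Placeholder command; the actual ladder, scaler, and HDR10 metadata handling differ.
    cmd = ["ffmpeg", "-y", "-i", src,
           "-vf", f"scale={width}:{height}",
           "-c:v", "libx265",
           "-b:v", bitrate, "-minrate", bitrate, "-maxrate", bitrate, "-bufsize", bitrate,  # CBR-style rate control
           "-x265-params", "colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc",      # HDR10 signalling
           "-pix_fmt", "yuv420p10le",
           dst]
    subprocess.run(cmd, check=True)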
Our experiments were conducted using 6 different 65" HDR10-compatible TVs to ensure a diverse evaluation across a range of display technologies. These TVs include the Samsung QN90B QLED (TV1), Samsung S95C OLED (TV2), Samsung CU8000 (TV3), TCL QM8/QM851G QLED (TV4), TCL Q7/Q750G QLED (TV5), and Vizio M6 Series Quantum 2022 (TV6). A detailed comparison of the TVs, including metrics such as peak brightness, BT.2020 color gamut coverage, and other key specifications, is presented in the table below, providing a comprehensive overview of their HDR performance characteristics. Filmmaker Mode was enabled on each TV when displaying the HDR and SDR videos.
A total of 145 participants were recruited from the public to complete the study at the LIVE Lab at the University of Texas at Austin. All participants were required to wear glasses during the study if they normally use them, ensuring consistent visual conditions and accurate representation of their usual viewing experiences. Ultimately, 19 participants completed the study on TV1, 26 on TV2, 21 on TV3, 26 on TV4, 26 on TV5, and 27 on TV6, ensuring comprehensive data collection across all tested devices.
Our study design employed a pairwise comparison (PC) approach to remove potential bias introduced by the TVs or viewing environments and to recover reliable quality scores. We followed the guidelines outlined in ITU-R BT.500-13, in combination with the Active Sampling for Pairwise Comparisons (ASAP) algorithm [3], to efficiently and accurately assess subjective video quality.
We utilized the pwcmp algorithm [4] to perform pairwise scaling, transforming the raw pairwise comparison data into just-objectionable-differences (JODs) for each video. A difference of 1 JOD indicates that 75% of observers selected one condition as better than the other. The pwcmp algorithm employs a maximum likelihood estimation framework to infer a continuous quality scale from the comparison results. This method accounts for the probabilistic nature of human judgments, incorporating inconsistencies and uncertainties in the pairwise data.
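To illustrate the JOD scale, under the Thurstonian model used by pwcmp a JOD difference maps to an expected preference probability through the normal CDF, with the scale chosen so that a 1 JOD difference corresponds to a 75% preference rate; the sketch below assumes this standard parameterization.

# jod_preference.py - sketch: map a JOD difference to the expected preference probability
from statistics import NormalDist

# Scale chosen so that a 1 JOD difference yields a 75% preference rate: Phi(1 / SIGMA) = 0.75.
SIGMA = 1.0 / NormalDist().inv_cdf(0.75)   # ~1.4826

def preference_probability(jod_difference):
    # Probability that the higher-JOD condition is selected as better.
    return NormalDist().cdf(jod_difference / SIGMA)

print(round(preference_probability(1.0), 2))  # 0.75 by construction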
In this ICME Grand Challenge, we use only the 31 open-source contents. 20 of the open-source videos, along with their processed video sequences (PVSs), will be provided for training, while the remaining 11 videos and their PVSs will be reserved for testing. The training set comprises a total of 360 videos (180 HDR and 180 SDR), while the test set contains 198 videos (99 HDR and 99 SDR). The videos at the 3840x2160, 50 Mbps rung of the ladder are used as the reference videos in the FR track.
[1] Dominik Keller, Thomas Goebel, Valentin Siebenkees, Julius Prenzel, and Alexander Raake. AVT-VQDB-UHD-2-HDR: An open 8K HDR source dataset for video quality research. In 2024 16th International Conference on Quality of Multimedia Experience (QoMEX), pages 186–192, 2024.
[2] Joshua P. Ebenezer, Zaixi Shang, Yixu Chen, Yongjun Wu, Hai Wei, Sriram Sethuraman, and Alan C. Bovik. HDR or SDR? A subjective and objective study of scaled and compressed videos. IEEE Transactions on Image Processing, 2024.
[3] Aliaksei Mikhailiuk, Clifford Wilmot, Maria Perez-Ortiz, Dingcheng Yue, and Rafal K. Mantiuk. Active sampling for pairwise comparisons via approximate message passing and information gain maximization. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 2559–2566. IEEE, 2021.
[4] Maria Perez-Ortiz and Rafal K Mantiuk. A practical guide and software for analysing pairwise comparison experiments. arXiv preprint arXiv:1712.03686, 2017.
Copyright Notice
-----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------
Copyright (c) 2025 The University of Texas at Austin
All rights reserved.
Permission is hereby granted, without written agreement and without license or royalty fees, to use, copy, modify, and distribute this database (the videos, the results and the source files) and its documentation for any purpose, provided that the copyright notice in its entirety appear in all copies of this database, and the original source of this database, Laboratory for Image and Video Engineering (LIVE, http://live.ece.utexas.edu) at the University of Texas at Austin (UT Austin, http://www.utexas.edu ), is acknowledged in any publication that reports research using this database.
IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT AUSTIN BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS DATABASE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF TEXAS AT AUSTIN HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
THE UNIVERSITY OF TEXAS AT AUSTIN SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE DATABASE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF TEXAS AT AUSTIN HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
-----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------