Saliency- or semantic-based encoding is a well-established approach in video compression that allocates higher quality to semantically important regions while conserving bits in less critical areas. Recent progress in semantic analysis and segmentation of images and videos has opened new opportunities for encoding based on deeper scene understanding. Models such as Segment Anything and Grounding DINO provide strong foundations for using semantic information to improve video encoding efficiency. Saliency- or semantic-driven asymmetric encoding can enable substantial bitrate savings while maintaining a comparable viewing experience for end users.
Although saliency- or semantic-driven video encoding is widely adopted, optimizing it remains challenging. The asymmetry in encoding must not disrupt natural viewing exploration or introduce visible artifacts. Current state-of-the-art video quality metrics struggle in these scenarios, as most have been trained only on symmetrically encoded videos. Saliency-weighted metrics exist, but they are often limited because they neglect the impact of encoding artifacts on the deployment of visual attention.
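As an illustration of the saliency-weighted family of metrics mentioned above, the following is a minimal sketch of a saliency-weighted PSNR for a single frame. The function name, the list-of-rows frame representation, and the 8-bit peak value are assumptions made for this example only; the sketch is not the metric of any particular published work. The saliency map is treated as fixed, which is exactly the limitation noted above: the weighting does not account for encoding artifacts redirecting visual attention.

```python
import math


def saliency_weighted_psnr(reference, distorted, saliency, peak=255.0):
    """Hypothetical saliency-weighted PSNR for one grayscale frame.

    reference, distorted, saliency: equally sized 2D lists (rows of numbers).
    saliency holds non-negative per-pixel weights; they are normalized here.
    """
    weighted_error = 0.0
    total_weight = 0.0
    for ref_row, dist_row, sal_row in zip(reference, distorted, saliency):
        for r, d, w in zip(ref_row, dist_row, sal_row):
            weighted_error += w * (r - d) ** 2
            total_weight += w
    if total_weight == 0:
        raise ValueError("saliency map must contain at least one positive weight")
    wmse = weighted_error / total_weight  # saliency-weighted mean squared error
    if wmse == 0:
        return math.inf  # no error inside the weighted region
    return 10.0 * math.log10(peak ** 2 / wmse)


# Toy 2x2 frame with a single distorted pixel (bottom-right).
reference = [[100, 100], [100, 100]]
distorted = [[100, 100], [100, 110]]

uniform = [[1, 1], [1, 1]]        # no saliency preference
on_artifact = [[0, 0], [0, 1]]    # attention entirely on the distorted pixel
off_artifact = [[1, 1], [1, 0]]   # attention entirely away from it

psnr_uniform = saliency_weighted_psnr(reference, distorted, uniform)
psnr_on = saliency_weighted_psnr(reference, distorted, on_artifact)
psnr_off = saliency_weighted_psnr(reference, distorted, off_artifact)
```

In this toy case the score drops when the saliency map concentrates on the distorted pixel and becomes infinite when the map ignores it, which shows how strongly such a metric depends on a saliency map that was estimated independently of the artifacts.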
To address the need for video quality metrics (VQMs) that accurately measure the quality of asymmetrically encoded videos, we invite the research community to submit novel or improved VQM models that objectively predict video quality in both full-reference and no-reference use cases. A dataset named Sport-ROI, with human subjective quality scores as ground truth, will be shared to facilitate VQM model training and testing. The challenge will focus on predicting the quality of videos with various degrees of compression, scaling artifacts, and different asymmetric encoding settings (both semantic- and saliency-based encoding). Ideally, the submitted VQM models should provide accurate visual quality prediction for both symmetrically and asymmetrically encoded videos. To facilitate this, the shared dataset will also include symmetrically encoded samples.