Tracks and Evaluation Criteria
Tracks
The ICIP 2023 PCVQA Challenge consists of five tracks, corresponding to different use cases in which quality metrics are typically used:
1) Full-reference, broad-range quality estimation: This track aims to assess the perceptual fidelity of distorted content with respect to the original at any level of distortion. This is the most generic and traditional setup for quality metrics.
2) No-reference, broad-range quality estimation: This track is similar to Track 1, but the proposed metrics do not have access to the original content.
3) Full-reference, high-quality range: This track focuses on metrics for high-end quality, which are desirable in applications such as content production, high-quality streaming, and digital twins.
4) No-reference, high-quality range: This track is similar to Track 3, but metrics can only use the processed point clouds, without the originals.
5) Intra-reference: The metrics should be sensitive to quality differences among different processed versions of the same point cloud content. Metrics in this track are especially suitable for optimization scenarios, e.g., point cloud compression and enhancement, and, more generally, as loss functions in end-to-end point cloud learning pipelines.
Different performance criteria will be used to rank methods in each track (see below). Each team can participate in one or more tracks. A per-track leaderboard will be kept up to date as new submissions are received.
Evaluation Criteria
Because the use cases differ, a different set of evaluation criteria is used for each track. The criteria are briefly introduced below, with an illustrative computation sketch after the reference. No fitting function will be applied before evaluation. No-Reference (NR) and Full-Reference (FR) models will be evaluated separately.
SROCC: Spearman Rank Order Correlation Coefficient
PLCC: Pearson Linear Correlation Coefficient
D/S AUC: Different/Similar analysis quantified by the Area Under the Curve [Krasula, 2016]
B/W CC: Better/Worse Analysis quantified by Correct Classification percentage [Krasula, 2016]
RC: Runtime Complexity
[Krasula, 2016] Krasula, Lukáš, et al. "On the accuracy of objective image and video quality models: New methodology for performance evaluation." 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2016.
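The sketch below is a minimal illustration (not the official evaluation script) of how these criteria could be computed with SciPy and scikit-learn. The arrays `mos` and `pred` are hypothetical placeholders for subjective scores and metric outputs, and a fixed MOS-difference threshold stands in for the statistical-significance analysis used in [Krasula, 2016] to label pairs as different or similar.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr
from sklearn.metrics import roc_auc_score

mos  = np.array([4.2, 3.8, 2.1, 1.5, 3.3])       # hypothetical subjective scores
pred = np.array([0.91, 0.85, 0.40, 0.22, 0.70])  # hypothetical metric outputs

srocc, _ = spearmanr(pred, mos)  # SROCC: rank-order correlation
plcc, _  = pearsonr(pred, mos)   # PLCC: linear correlation (no fitting applied)

# Krasula-style pairwise analysis (simplified): form all pairs of stimuli,
# label each pair "different" (1) or "similar" (0) from the subjective data,
# and use the absolute difference of metric outputs as the decision variable.
i, j = np.triu_indices(len(mos), k=1)
subj_diff = mos[i] - mos[j]
obj_diff  = pred[i] - pred[j]

different = (np.abs(subj_diff) > 0.5).astype(int)    # assumed significance proxy
ds_auc = roc_auc_score(different, np.abs(obj_diff))  # Diff/Sim AUC

# Better/Worse correct classification: among "different" pairs, the fraction
# where the metric agrees with the subjective data on which stimulus is better.
mask = different == 1
bw_cc = np.mean(np.sign(obj_diff[mask]) == np.sign(subj_diff[mask]))

print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}  D/S AUC={ds_auc:.3f}  B/W CC={bw_cc:.3%}")
```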
The use-case-specific criteria combinations are listed below:
Generic Use Case - Full Quality Range (Tracks #1 and #2):
Diff / Sim AUC
Better / Worse CC
SROCC
PLCC
Runtime (only in the final evaluation)
Generic Use Case - High Quality Range (Tracks #3 and #4):
Diff / Sim AUC
Better / Worse CC
SROCC
PLCC
Runtime (only in the final evaluation)
Intra-reference (Track #5 - Compression Recipe Optimization Use Case):
Diff / Sim AUC
Better / Worse CC
Runtime (only in the final evaluation)
Final evaluation: The top 5 submissions in each category (NR and FR) will be shortlisted for the final evaluation on the test set (depending on the total number of submissions, this number might be increased). The shortlisted teams will submit their models following the instructions provided here: [link to the instructions here or via the button below]. In the final evaluation phase, Runtime Complexity will also be taken into account. A ranking system similar to a Borda count will be used to select the best model in each category. Models will be ranked according to the criteria described above; for each criterion, the models ranked [1, 2, 3, 4, 5] will receive [4, 3, 2, 1, 0] points, respectively. Then, for each category (NR and FR), the models will be ranked by their total points.
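As a minimal sketch of this Borda-count-like aggregation, assume each shortlisted model has already been ranked (1 = best) for every criterion; model names and rankings below are purely hypothetical.

```python
POINTS = {1: 4, 2: 3, 3: 2, 4: 1, 5: 0}  # points awarded per ranking position

# Per-criterion rankings of five hypothetical shortlisted models
rankings = {
    "D/S AUC": {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5},
    "B/W CC":  {"A": 2, "B": 1, "C": 3, "D": 5, "E": 4},
    "SROCC":   {"A": 1, "B": 3, "C": 2, "D": 4, "E": 5},
    "PLCC":    {"A": 2, "B": 1, "C": 3, "D": 4, "E": 5},
    "Runtime": {"A": 3, "B": 2, "C": 1, "D": 4, "E": 5},
}

# Accumulate points across criteria
totals = {}
for criterion, ranks in rankings.items():
    for model, rank in ranks.items():
        totals[model] = totals.get(model, 0) + POINTS[rank]

# Final ranking: highest total points first
for model, pts in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {pts} points")
```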
A computer with a GPU will be used to run the code and evaluate the methods. The hardware characteristics of the machine will be made available at the beginning of the challenge.