Phase II

Evaluation Results

Evaluation Criteria for the Test Phase

The final results are presented here, both as rankings and as numerical scores. They are the values of the performance metrics (as described on the Track and Evaluation Criteria page) computed on the Test set, which none of the participants had access to. Specifically, these metrics are:

SROCC: Spearman Rank Order Correlation Coefficient
PLCC: Pearson Linear Correlation Coefficient
D/S AUC: Difference/Similar Analysis quantified by Area Under the Curve [Krasula, 2016]
B/W CC: Better/Worse Analysis quantified by Correct Classification percentage [Krasula, 2016]
RC: Runtime Complexity (i.e., average runtime duration per processed point cloud - on the same PC configuration)
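
As an illustration of the first two metrics, the sketch below computes PLCC and SROCC between subjective scores and a model's predictions using only the Python standard library. It is a minimal example, not the challenge's evaluation code: the function names (average_ranks, plcc, srocc) and the sample data are hypothetical, and any score mapping (for example, a logistic fit before PLCC) that the organizers may have applied is not shown.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def plcc(x, y):
    """Pearson Linear Correlation Coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def srocc(x, y):
    """Spearman Rank Order Correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


# Hypothetical subjective scores and model predictions (illustration only).
mos = [1.2, 2.5, 3.1, 4.0, 4.8]
predicted = [0.10, 0.30, 0.35, 0.70, 0.95]

print("PLCC :", plcc(mos, predicted))
print("SROCC:", srocc(mos, predicted))
```

Because SROCC depends only on rank order, any prediction that preserves the ordering of the subjective scores reaches the maximum SROCC even when its PLCC is lower.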

To obtain rankings for individual metrics, statistical significance tests were applied. The rankings for the different metrics were obtained as follows:

SROCC & PLCC: The 95% confidence intervals were computed using bootstrapping with 1000 iterations, and the extent of the confidence intervals indicated whether the differences between algorithms were statistically significant.
D/S AUC & B/W CC: For these metrics, the significance output of the method is used. Please refer to [Krasula, 2016] for details.
Runtime Complexity: No specific tests were done; the values are ranked by their numerical values.
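
The bootstrapping step for SROCC and PLCC can be sketched as follows. This is a minimal example under stated assumptions, not the organizers' procedure: it uses a percentile bootstrap that resamples stimuli with replacement, it treats two algorithms as significantly different when their intervals do not overlap (one common reading of "the extent of the confidence intervals"), and the names bootstrap_ci and intervals_overlap as well as the data are hypothetical. Only PLCC is shown; SROCC can be passed in the same way as the statistic.

```python
import math
import random


def plcc(x, y):
    """Pearson Linear Correlation Coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def bootstrap_ci(x, y, statistic, iterations=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for statistic(x, y).

    Assumption: (x[i], y[i]) pairs are resampled together with replacement.
    """
    rng = random.Random(seed)
    n = len(x)
    samples = []
    while len(samples) < iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # A correlation is undefined when a resample is constant; redraw.
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        samples.append(statistic(xs, ys))
    samples.sort()
    alpha = (1.0 - level) / 2.0
    lower = samples[math.floor(alpha * (iterations - 1))]
    upper = samples[math.ceil((1.0 - alpha) * (iterations - 1))]
    return lower, upper


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical subjective scores and two hypothetical models' predictions.
data_rng = random.Random(42)
mos = [1.0 + 4.0 * i / 29 for i in range(30)]
model_a = [m + data_rng.gauss(0.0, 0.3) for m in mos]  # low-noise predictor
model_b = [m + data_rng.gauss(0.0, 1.5) for m in mos]  # high-noise predictor

ci_a = bootstrap_ci(mos, model_a, plcc)
ci_b = bootstrap_ci(mos, model_b, plcc)
print("Model A PLCC 95% CI:", ci_a)
print("Model B PLCC 95% CI:", ci_b)
print("Intervals overlap:", intervals_overlap(ci_a, ci_b))
```

Under the non-overlap assumption, algorithms whose intervals overlap would share a rank for that metric, while a clearly separated interval would place an algorithm above or below the others.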

The results are presented below as a table for each track. The teams are ordered according to their final rankings. The scores and rankings for each evaluation criterion are also presented.

[Krasula, 2016] Krasula, Lukáš, et al. "On the accuracy of objective image and video quality models: New methodology for performance evaluation." 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2016.

TRACK #1 - Full Reference Broad-Range Quality Estimation Results

During the development phase, we received submissions from 9 unique teams for Track #1.
4 of these teams successfully submitted their final model for the test phase.