Evaluation criteria
We expect to receive scores close to 1 for genuine comparisons and close to 0 for impostor comparisons.
Once your scores are submitted, we will evaluate the performance using several metrics that are popular in the field of biometrics:
Global Equal Error Rate (EER), used to generate the competition ranking: it is computed by considering the entire distributions of genuine and impostor scores.
False Non-Match Rate (FNMR) at various False Match Rate (FMR) values: FNMR @10%FMR, FNMR @1%FMR, FNMR @0.1%FMR
Area Under the Curve (AUC)
Accuracy
Per-user mean EER: it is computed by considering user-specific distributions of genuine and impostor scores, i.e., a different threshold per user. This yields one EER per user; the mean of all user-specific EERs is then reported.
Per-user mean AUC
Per-user mean accuracy
Per-user mean Rank-1
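For illustration, the global EER and the FNMR at a fixed FMR can be computed from the two score distributions alone. The sketch below is a minimal numpy version under the stated score convention (higher score = more likely genuine); function names and the synthetic score distributions are illustrative, not the official competition implementation.

```python
import numpy as np

def compute_eer(genuine, impostor):
    """Global EER: sweep thresholds over all observed scores and find
    the point where FNMR (genuine rejected) equals FMR (impostors accepted)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    fnmr = np.array([(genuine < t).mean() for t in thresholds])
    fmr = np.array([(impostor >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(fnmr - fmr))
    return (fnmr[idx] + fmr[idx]) / 2

def fnmr_at_fmr(genuine, impostor, target_fmr=0.01):
    """FNMR at the threshold where the FMR equals the target value."""
    t = np.quantile(impostor, 1 - target_fmr)
    return (genuine < t).mean()

# synthetic, well-separated score distributions for demonstration only
rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.1, 1000)
impostor = rng.normal(0.2, 0.1, 1000)
eer = compute_eer(genuine, impostor)
fnmr_1 = fnmr_at_fmr(genuine, impostor, target_fmr=0.01)
```

The per-user mean EER follows the same recipe, applied separately to each user's own genuine and impostor scores before averaging the resulting per-user EERs.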
Some curves will be automatically plotted:
Detection Error Tradeoff (DET)
Receiver Operating Characteristic (ROC)
Genuine and Impostor Distribution Score Histograms
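The ROC curve underlying these plots (and the AUC metric above) can be traced by sweeping a threshold over the scores. A minimal numpy sketch, again with illustrative names and synthetic scores; the DET curve uses the same operating points, but plots FNMR against FMR on normal-deviate axes instead.

```python
import numpy as np

def roc_points(genuine, impostor):
    """ROC operating points: sweep thresholds from high to low over all
    observed scores, recording true-positive and false-positive rates."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))[::-1]
    tpr = np.array([(genuine >= t).mean() for t in thresholds])
    fpr = np.array([(impostor >= t).mean() for t in thresholds])
    # anchor the curve at (0, 0); (1, 1) is reached at the lowest threshold
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.1, 1000)
impostor = rng.normal(0.2, 0.1, 1000)
fpr, tpr = roc_points(genuine, impostor)
auc = auc_trapezoid(fpr, tpr)
```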
Moreover, the biometric fairness will be evaluated according to the following metrics:
Standard deviation (STD) of global accuracy across different demographic groups
Skewed Error Rate (SER) of global accuracy across different demographic groups
Fairness Discrepancy Rate (FDR)
Inequity Rate (IR)
Gini Aggregation Rate for Biometric Equitability (GARBE)
Skewed Impostor Rate with respect to age (SIRa) and gender (SIRg)
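As a simple illustration of the first two fairness metrics, the sketch below computes the STD of per-group accuracy and one common formulation of the SER (the ratio of the worst to the best per-group error rate). The group labels and accuracy values are made up for the example, and this is an assumption about the SER definition, not the official scoring code.

```python
import numpy as np

def fairness_std_ser(group_accuracies):
    """STD of accuracy across demographic groups, and SER as the ratio
    of the highest to the lowest per-group error rate."""
    accs = np.array(list(group_accuracies.values()))
    errs = 1.0 - accs
    std = float(accs.std())
    ser = float(errs.max() / errs.min())
    return std, ser

# hypothetical per-group accuracies for demonstration only
accs = {"group_a": 0.97, "group_b": 0.95, "group_c": 0.93}
std, ser = fairness_std_ser(accs)
```

A perfectly equitable system would give STD = 0 and SER = 1; larger values indicate larger performance gaps between demographic groups.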
A complete explanation of the presented metrics is available on arXiv, while their implementation is available on GitHub.
To check the results in their entirety, click on Detailed Results on the leaderboard (last column).
Below is an example of the detailed results returned by the CodaLab scoring platform (LSIA team, winner of the limited-time KVC):