Evaluation

We use multiple metrics to measure the systems' behaviour, arranged in two main categories: performance and efficiency.

Performance metrics

This type of metric is intended to measure how well systems achieve the proposed task in terms of quality. Several metrics are proposed, with the Levenshtein distance being the main one for the final ranking of systems. Here is a summarized list of the performance metrics that will be calculated for each submission:

Word Error Rate (WER). Measures errors at the word level, including insertions, deletions, and substitutions.
Sentence Error Rate (SER). Measures the percentage of sentences with at least one error.
Levenshtein Distance. Calculates the minimum number of single-character edits (insertions, deletions, substitutions) needed to transform one text into another.
Normalized Edit Distance (NED). Normalizes the Levenshtein Distance by the length of the ground truth text.
BLEU (Bilingual Evaluation Understudy). Measures the overlap of n-grams (e.g., unigrams, bigrams) between the transcribed text and the reference text, focusing on precision.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation). Measures content overlap between the transcribed text and the reference using n-grams (ROUGE-N), longest common subsequences (ROUGE-L), or skip-grams (ROUGE-S).
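
To make the edit-distance-based definitions above concrete, here is a minimal Python sketch of the Levenshtein Distance, NED, WER and SER. It is only an illustration and not necessarily the official scoring code: the function names, the whitespace tokenization used for WER, the exact-match criterion for SER, and the handling of empty references are all assumptions.

```python
from typing import Sequence


def levenshtein(source: Sequence, target: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to transform `source` into `target`."""
    previous = list(range(len(target) + 1))
    for i, s_item in enumerate(source, start=1):
        current = [i]
        for j, t_item in enumerate(target, start=1):
            cost = 0 if s_item == t_item else 1
            current.append(min(
                previous[j] + 1,         # delete s_item
                current[j - 1] + 1,      # insert t_item
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(hypothesis: str, reference: str) -> float:
    """Character-level Levenshtein distance divided by the reference length."""
    if not reference:
        # Assumption: convention chosen here for an empty ground truth.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(hypothesis, reference) / len(reference)


def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by the number of reference words.
    Assumption: words are obtained by splitting on whitespace."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return levenshtein(hyp_words, ref_words) / len(ref_words)


def sentence_error_rate(hypotheses: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of sentences that differ from their reference.
    Assumption: a sentence is correct only if it matches exactly."""
    errors = sum(h != r for h, r in zip(hypotheses, references))
    return 100.0 * errors / len(references)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(word_error_rate("the cat sat", "the cat sat down"))
    print(sentence_error_rate(["a b", "c"], ["a b", "d"]))
```

The same `levenshtein` function serves both the character-level and the word-level metrics because it accepts any sequence, so strings and lists of words are handled identically.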

Efficiency metrics

Efficiency metrics are intended to measure the impact of the system in terms of required resources and environmental footprint. We want to recognize those systems that are able to perform the task with minimal demand for resources. This will allow us, for instance, to identify those technologies that could run on a mobile device or a personal computer, along with those with the lowest carbon footprint. To this end, each submission (each prediction sent to the server) must contain the following information:

Total RAM needed
Total % of CPU usage
Floating Point Operations per Second (FLOPS)
Total time to process (in milliseconds)
CO2 emissions (in kg). For this, the Code Carbon tool will be used.

A notebook with sample code to collect this information is available here.
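
Independently of that notebook, the following Python sketch shows one possible way to collect some of these figures around a single run. It is a hedged illustration, not the required procedure: `measure_efficiency` and `run_system` are hypothetical names, `tracemalloc` only tracks memory allocated by Python (not the total RAM of the process), CPU usage is approximated as CPU time divided by wall-clock time, and FLOPS are not measured here. Emissions tracking is optional and assumes the third-party `codecarbon` package is installed.

```python
import time
import tracemalloc


def measure_efficiency(run_system, *args, track_emissions=False, **kwargs):
    """Run `run_system(*args, **kwargs)` and return (result, report).

    `run_system` is a hypothetical callable standing in for the
    participant's own prediction code.
    """
    tracker = None
    if track_emissions:
        # Third-party dependency, imported only when requested.
        from codecarbon import EmissionsTracker
        tracker = EmissionsTracker()
        tracker.start()

    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()

    result = run_system(*args, **kwargs)

    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # EmissionsTracker.stop() returns the estimated emissions in kg of CO2-eq.
    emissions_kg = tracker.stop() if tracker is not None else None

    report = {
        # Python-level allocations only; an approximation of RAM needed.
        "peak_python_memory_mb": peak_bytes / 1e6,
        # Can exceed 100 when several cores are busy.
        "cpu_usage_percent": (100.0 * cpu_seconds / wall_seconds
                              if wall_seconds > 0 else 0.0),
        "time_ms": wall_seconds * 1000.0,
        "co2_kg": emissions_kg,
    }
    return result, report


if __name__ == "__main__":
    # Toy workload used in place of a real system.
    _, report = measure_efficiency(lambda n: sum(i * i for i in range(n)), 100_000)
    print(report)
```

Wrapping the whole prediction step in a single call keeps all measurements aligned to the same interval, which makes the reported values easier to compare across submissions.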
