When comparing the performance of various regression models, one faces two main challenges:
1) How to correctly aggregate the errors across the full range of the predicted values: does a 0.1 wt.% error at a nominal concentration of 0.2 wt.% have the same impact as a 0.1 wt.% error at a nominal concentration of 20 wt.%? Probably not.
2) How to compare the prediction errors across distinct elements. This question is especially important if the elements' nominal concentration ranges differ widely.
Naturally, while answering these questions, one must take into account the target application of the regression models. However, the LIBS 2022 regression contest was not designed with a practical application in mind but rather to provide a platform for comparing distinct regression methodologies. As such, the target application offers no guidance here.
Considering the points above, the following figures and the accompanying text briefly demonstrate the impact of the choice of evaluation metric on the final results. Note that the results below serve as an extension of the official evaluation metric defined at the launch of the contest; as such, they do not affect the final ranking.
About the evaluation metrics
The two points presented above are addressed at two separate stages of the evaluation: first, the regression model's performance is evaluated separately on each of the four considered elements (Ni, Cr, Mn, Mo) using an error metric (see below). Second, the four element-wise error values are aggregated into a single value--the performance metric--which is then used to compare the contestants' performance, i.e., to provide a ranking. Naturally, the choice of the error metric influences the performance metric.
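As a toy illustration of this two-stage scheme, the following Python sketch computes one error metric (RMSE, chosen arbitrarily) per element and then averages the four values into a single performance score. The concentration data and the averaging step are hypothetical; the actual combination approaches are discussed below.

```python
import numpy as np

# Stage 1: one error value per element; RMSE is used here only as an example.
def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

# Hypothetical nominal and predicted concentrations (wt.%) for each element.
data = {
    "Ni": ([0.2, 1.5, 20.0], [0.3, 1.4, 19.5]),
    "Cr": ([0.5, 10.0, 18.0], [0.6, 9.7, 18.4]),
    "Mn": ([0.3, 0.9, 1.6], [0.35, 0.80, 1.70]),
    "Mo": ([0.1, 0.4, 2.5], [0.12, 0.50, 2.30]),
}
element_errors = {el: rmse(y, y_hat) for el, (y, y_hat) in data.items()}

# Stage 2: aggregate the four element-wise errors into one performance
# metric; a plain mean is assumed here purely for illustration.
performance = float(np.mean(list(element_errors.values())))
print(element_errors, performance)
```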
As such, the following figures consider 6 error metrics:
Root mean squared error (RMSE);
Mean absolute error (MAE);
Root mean square relative error (RMSRE);
Mean absolute relative error (MARE);
Relative root mean square error (RRMSE);
Relative mean absolute error (RMAE).
Details of the individual error metrics--such as their definitions and motivation--are provided below. Subsequently, these 6 error metrics are combined using 6 distinct approaches. Thus, 36 distinct rankings are obtained.
Root mean squared error (RMSE). A standard metric used to evaluate the prediction error of regression models.
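In what follows, assume $y_i$ denotes the nominal concentration of sample $i$, $\hat{y}_i$ the corresponding prediction, and $N$ the number of test samples (this notation is introduced here for illustration). The standard definition then reads:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}.$$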
Mean absolute error (MAE). Another frequently used error metric. Owing to the absolute value replacing the squares, MAE is less sensitive to outliers than RMSE.
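With the same notation:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|.$$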
Root mean square relative error (RMSRE). A modification of the RMSE metric. Dividing the error values by the nominal values places equal weight on each prediction.
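Assuming all nominal values are nonzero:

$$\mathrm{RMSRE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{\hat{y}_i - y_i}{y_i}\right)^2}.$$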
Mean absolute relative error (MARE). A modification of the MAE metric. Dividing the error values by the nominal values places equal weight on each prediction.
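Correspondingly:

$$\mathrm{MARE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right|.$$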
Relative root mean square error (RRMSE). A modification of the RMSE metric. Dividing the sum of squared errors by the sum of squares of the nominal values (before taking the root) aims at improving the comparability of error values obtained from different elements.
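A definition consistent with this description is:

$$\mathrm{RRMSE} = \sqrt{\frac{\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{N} y_i^2}}.$$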
Relative mean absolute error (RMAE). A modification of the MAE metric. Dividing the sum of absolute errors by the sum of the nominal values aims at improving the comparability of error values obtained from different elements.
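With the same notation:

$$\mathrm{RMAE} = \frac{\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|}{\sum_{i=1}^{N} y_i}.$$

For concreteness, the six metrics can be implemented in a few lines of NumPy. The sketch below follows the definitions given above; the function name is illustrative, and the relative metrics assume nonzero nominal values:

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Six error metrics for one element's predictions.

    y_true: nominal concentrations (wt.%); y_pred: predicted concentrations.
    The relative metrics require all nominal values to be nonzero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(err)),
        "RMSRE": np.sqrt(np.mean((err / y_true) ** 2)),
        "MARE": np.mean(np.abs(err / y_true)),
        "RRMSE": np.sqrt(np.sum(err ** 2) / np.sum(y_true ** 2)),
        "RMAE": np.sum(np.abs(err)) / np.sum(y_true),
    }
```

Applying error_metrics to each element and then combining the four resulting values by one of the 6 aggregation approaches yields one of the 36 rankings mentioned above.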