Each participating team will initially have access to the training data only. The unlabelled test data will be released according to the EvalITA 2023 timeline; after the assessment phase, the gold labels for the test data will be released as well.
In the literature there is no consensus on which metrics should be used for Inter-Annotator Agreement (IAA) when dealing with continuous variables, as is the case with the EmoITA dataset. We will adopt the two most common metrics used in former studies on EmoBank: Pearson's r (a measure of linear correlation) and Mean Absolute Error (MAE). Pearson's r will be the main measure for ranking.
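As a concrete illustration of the two evaluation metrics, the sketch below computes Pearson's r and MAE between a hypothetical set of gold and predicted scores for one dimension (the score values are invented for the example and are not from the EmoITA data):

```python
import math

def pearson_r(gold, pred):
    """Pearson correlation coefficient between gold and predicted scores."""
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    std_g = math.sqrt(sum((g - mean_g) ** 2 for g in gold))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (std_g * std_p)

def mean_absolute_error(gold, pred):
    """Mean Absolute Error between gold and predicted scores."""
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)

# Hypothetical Valence annotations (illustrative values only)
gold = [3.0, 4.2, 2.5, 3.8, 1.9]
pred = [2.8, 4.0, 2.9, 3.5, 2.2]

print(f"Pearson's r: {pearson_r(gold, pred):.3f}")
print(f"MAE: {mean_absolute_error(gold, pred):.3f}")
```

Note that the two metrics capture complementary aspects: Pearson's r rewards predictions that follow the ranking and relative spread of the gold scores, while MAE penalizes absolute deviations, so a system can score well on one and poorly on the other.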
For both tasks we will provide baselines computed by fine-tuning a BERT checkpoint.
As a reference, to the best of our knowledge the state of the art for dimensional analysis on EmoBank (the English resource) was obtained by Park et al. (2021), who report a Pearson's r of 0.838 for Valence, 0.573 for Arousal, and 0.536 for Dominance.
S. Park, J. Kim, S. Ye, J. Jeon, H.Y. Park, A. Oh, Dimensional Emotion Detection from Categorical Emotion, in: M.F. Moens, X. Huang, L. Specia, S.W. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2021, pp. 4367-4380.