Results are ranked by F1-score, defined here as the macro-average of the F1-scores for the "FAVOR" and "AGAINST" classes. This F1-score is computed separately for each target, and the overall macro F1-score is then obtained by averaging the per-target scores across all targets.
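The two-level averaging can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the official scoring script: the function names, the `(target, gold, predicted)` record layout, and the extra `"NONE"` label in the example are all hypothetical, and a class with no true positives is assigned an F1 of 0 by convention.

```python
from collections import defaultdict

# Classes that enter the average, as stated in the text.
SCORED_CLASSES = ("FAVOR", "AGAINST")


def f1_for_class(gold, pred, label):
    """F1 of a single class from parallel lists of gold and predicted labels."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # Assumption: undefined precision/recall is scored as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def target_f1(gold, pred):
    """Macro-average of the F1-scores of the scored classes for one target."""
    return sum(f1_for_class(gold, pred, c) for c in SCORED_CLASSES) / len(SCORED_CLASSES)


def overall_macro_f1(records):
    """Average the per-target F1-scores over all targets.

    `records` is an iterable of (target, gold_label, predicted_label) tuples
    (a hypothetical layout chosen for this sketch).
    Returns (overall_score, per_target_scores).
    """
    by_target = defaultdict(lambda: ([], []))
    for target, gold, pred in records:
        by_target[target][0].append(gold)
        by_target[target][1].append(pred)
    if not by_target:
        raise ValueError("no records to score")
    per_target = {t: target_f1(g, p) for t, (g, p) in by_target.items()}
    overall = sum(per_target.values()) / len(per_target)
    return overall, per_target


if __name__ == "__main__":
    # Toy data; "NONE" stands in for any label outside the two scored classes.
    example = [
        ("target_a", "FAVOR", "FAVOR"),
        ("target_a", "AGAINST", "AGAINST"),
        ("target_a", "NONE", "FAVOR"),
        ("target_a", "FAVOR", "AGAINST"),
        ("target_b", "FAVOR", "FAVOR"),
        ("target_b", "AGAINST", "AGAINST"),
    ]
    overall, per_target = overall_macro_f1(example)
    print(per_target, overall)
```

Instances whose gold or predicted label falls outside the two scored classes still matter in this sketch: they count as false positives or false negatives for "FAVOR" and "AGAINST", even though no F1 is computed for the extra label itself. Each target also contributes equally to the overall score, regardless of how many instances it has.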