Objective: Assess argument quality in earnings call transcripts along four linguistic dimensions.
Dataset: Reused from FinArg-1 and FinArg-2.
Task Type: Multi-label classification (one label for each of the four dimensions), evaluated using F1-score.
Four Dimensions:
Specificity
0: Off-topic or irrelevant
1: Somewhat related, but vague or hedged
2: Highly specific and directly responsive
Strength
0: Weak support (e.g., a single doubtful premise)
1: Moderate support (two or more reasonable premises)
2: Strong support (e.g., factual or statistical evidence)
Persuasiveness
0: Unclear or poorly structured
1: Moderately convincing but flawed
2: Clear, well-structured, broadly convincing
Objectivity
0: Subjective or biased
1: Objective, based on verifiable data or logic
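As a sketch of how predictions for this task might be checked and scored, the Python below encodes the four label spaces listed above, rejects records that use an undefined label, and computes a macro-averaged F1 for each dimension. The description does not say how F1 is averaged within a dimension or combined across dimensions, so the macro averaging and the unweighted mean over the four dimensions are assumptions, as are the record layout and the hypothetical function names validate, macro_f1 and score.

```python
from typing import Dict, List

# Label spaces for the four dimensions, taken from the annotation scheme above.
LABEL_SPACE: Dict[str, List[int]] = {
    "specificity": [0, 1, 2],
    "strength": [0, 1, 2],
    "persuasiveness": [0, 1, 2],
    "objectivity": [0, 1],
}


def validate(record: Dict[str, int]) -> None:
    """Raise ValueError if a record lacks a dimension or uses an undefined label."""
    for dim, labels in LABEL_SPACE.items():
        if dim not in record:
            raise ValueError(f"missing dimension: {dim}")
        if record[dim] not in labels:
            raise ValueError(f"{dim}={record[dim]} is not one of {labels}")


def macro_f1(gold: List[int], pred: List[int]) -> float:
    """Unweighted mean of per-class F1 over the classes seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    scores = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        # Each class here occurs in gold or pred, so the denominator is > 0.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def score(gold: List[Dict[str, int]], pred: List[Dict[str, int]]) -> Dict[str, float]:
    """Per-dimension macro-F1 plus their unweighted average (assumed aggregation)."""
    for record in gold + pred:
        validate(record)
    result = {
        dim: macro_f1([g[dim] for g in gold], [p[dim] for p in pred])
        for dim in LABEL_SPACE
    }
    result["average"] = sum(result.values()) / len(LABEL_SPACE)
    return result
```

Restricting each dimension's macro-F1 to the classes that actually occur is intended to mirror the default label selection of scikit-learn's f1_score with average="macro"; an official scorer may handle absent classes differently.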
Objective: Evaluate whether forecast scenarios proposed in analyst reports are likely to come true.
Dataset: Annotated scenarios from FinArg-1, verified via news sources and Google Search.
Labels:
Completely True
Partially True
Completely False
Task Type: 3-class classification, evaluated using F1-score.
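The task statement names F1 but not its averaging. The sketch below (the function name f1_scores is hypothetical) computes both macro-F1, which weights the three classes equally, and micro-F1 from confusion counts. In single-label multiclass data every error is one false positive for the predicted class and one false negative for the gold class, so micro-F1 reduces to plain accuracy; the two numbers can diverge noticeably when the three labels are imbalanced.

```python
from collections import Counter
from typing import List, Tuple

LABELS = ["Completely True", "Partially True", "Completely False"]


def f1_scores(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Return (macro_f1, micro_f1) for the three scenario labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    for y in gold + pred:
        if y not in LABELS:
            raise ValueError(f"unknown label: {y!r}")
    # Confusion counts keyed by (gold label, predicted label).
    cm = Counter(zip(gold, pred))
    per_class = []
    tp_all = fp_all = fn_all = 0
    for c in LABELS:
        tp = cm[(c, c)]
        fp = sum(n for (g, p), n in cm.items() if p == c and g != c)
        fn = sum(n for (g, p), n in cm.items() if g == c and p != c)
        tp_all += tp
        fp_all += fp
        fn_all += fn
        if tp + fp + fn:  # skip classes absent from both gold and pred
            per_class.append(2 * tp / (2 * tp + fp + fn))
    macro = sum(per_class) / len(per_class)
    micro = 2 * tp_all / (2 * tp_all + fp_all + fn_all)
    return macro, micro
```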
Objective: Compare pairs of Chinese social media posts to determine which one demonstrates higher-quality causal reasoning for better decision-making.
Dataset: Built from FinArg-1 and FinArg-2; it contains 574 posts, which form 164,451 pairwise combinations (every unordered pair of distinct posts).
Task Type: Binary classification, evaluated using accuracy (offline evaluation).
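The pair count follows from choosing 2 of 574 posts without regard to order, C(574, 2) = 574 × 573 / 2 = 164,451. The short Python check below confirms the arithmetic and shows how such pairs could be enumerated with itertools.combinations; the post IDs and the helper name unordered_pairs are hypothetical, and how the released pairs are labelled is not described here.

```python
import itertools
import math
from typing import Iterator, List, Tuple

N_POSTS = 574  # number of posts stated for this task


def unordered_pairs(post_ids: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield each unordered pair of distinct posts exactly once."""
    return itertools.combinations(post_ids, 2)


# C(574, 2) = 574 * 573 / 2 matches the 164,451 combinations reported.
assert math.comb(N_POSTS, 2) == N_POSTS * (N_POSTS - 1) // 2 == 164_451

# Hypothetical post IDs, used only to check the enumeration on a small scale.
ids = [f"post_{i}" for i in range(5)]
pairs = list(unordered_pairs(ids))
print(len(pairs))  # 10 == math.comb(5, 2)
```

Counting ordered pairs (A vs. B and B vs. A as separate items) would double the figure, so the reported number implies each pair is considered once.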
Real-Time Evaluation:
During the evaluation phase, each participating team will receive 10 new post pairs per day for 5 consecutive days
Teams must return their predictions within 24 hours
Real-time responses will be used to assess model consistency, latency handling, and generalization to fresh, unseen data
Metric: Average accuracy
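The real-time schedule yields 10 × 5 = 50 pairs per team. "Average accuracy" is not defined further; the sketch below assumes it is the unweighted mean of the five daily accuracies, and it encodes each pair's answer as a hypothetical integer label (for example 0 if the first post shows better reasoning, 1 otherwise). The function names are illustrative.

```python
from typing import List, Sequence

PAIRS_PER_DAY = 10  # from the schedule above
DAYS = 5


def daily_accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Share of one day's pairs whose predicted label matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def average_accuracy(days_gold: List[Sequence[int]], days_pred: List[Sequence[int]]) -> float:
    """Assumed aggregation: unweighted mean of the per-day accuracies."""
    if len(days_gold) != len(days_pred) or not days_gold:
        raise ValueError("need the same, non-zero number of days for gold and pred")
    per_day = [daily_accuracy(g, p) for g, p in zip(days_gold, days_pred)]
    return sum(per_day) / len(per_day)
```

Because every day has the same 10 pairs, the mean of the daily accuracies equals the pooled accuracy over all 50 pairs: 8, 7, 9, 6 and 10 correct answers give a mean daily accuracy of 0.8, the same as 40 out of 50. The two definitions would only differ if the daily counts differed.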
[1] Alaa Alhamzeh. 2023. Financial Argument Quality Assessment in Earnings Conference Calls. In International Conference on Database and Expert Systems Applications, pp. 65-81.
[2] Chin-Yi Lin, Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. 2024. Argument-Based Sentiment Analysis on Forward-Looking Statements. In Findings of the Association for Computational Linguistics: ACL 2024.
[3] Chung-Chi Chen, Hiroya Takamura, Ichiro Kobayashi, and Yusuke Miyao. 2024. Professionalism-Aware Pre-Finetuning for Profitability Ranking. In Proceedings of The 33rd ACM International Conference on Information and Knowledge Management (CIKM'24).
[4] Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. 2021. Evaluating the Rationales of Amateur Investors. In Proceedings of The Web Conference 2021 (WWW'21).
The annotated dataset is licensed under the Creative Commons Attribution-Non-Commercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.