Task Definition

1. Earnings Call Subtask [1]

Objective: Assess the argument quality in earnings call transcripts based on four linguistic dimensions.
Dataset: Reused from FinArg-1 and FinArg-2.
Task Type: Multi-label classification (4 labels), evaluated using F1-score.
Four Dimensions:
- Specificity
  - 0: Off-topic or irrelevant
  - 1: Somewhat related, vague or hedged
  - 2: Highly specific and directly responsive
- Strength
  - 0: Weak support (e.g., single doubtful premise)
  - 1: Moderate support (2+ reasonable premises)
  - 2: Strong support (e.g., factual/statistical evidence)
- Persuasiveness
  - 0: Unclear or poorly structured
  - 1: Moderately convincing but flawed
  - 2: Clear, well-structured, broadly convincing
- Objectivity
  - 0: Subjective or biased
  - 1: Objective, based on verifiable data or logic
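
The scoring setup above can be sketched as follows. This is a minimal illustration, not the official scorer: the task states only that F1-score is used, so macro averaging, scoring each dimension separately, and the names `VALID_SCORES`, `macro_f1`, and `score_by_dimension` are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

# Valid score ranges per dimension, taken from the rubric above.
VALID_SCORES: Dict[str, range] = {
    "specificity": range(3),     # 0, 1, 2
    "strength": range(3),        # 0, 1, 2
    "persuasiveness": range(3),  # 0, 1, 2
    "objectivity": range(2),     # 0, 1
}


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over classes seen in gold or pred.

    Macro averaging is an assumption; the task only says "F1-score".
    """
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and the same length")
    classes = sorted(set(gold) | set(pred))
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def score_by_dimension(
    gold: List[Dict[str, int]], pred: List[Dict[str, int]]
) -> Dict[str, float]:
    """Compute one macro-F1 per dimension (hypothetical helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of items")
    results: Dict[str, float] = {}
    for dim, valid in VALID_SCORES.items():
        g = [row[dim] for row in gold]
        p = [row[dim] for row in pred]
        # Reject scores outside the rubric, e.g. objectivity = 2.
        if any(v not in valid for v in g + p):
            raise ValueError(f"out-of-range score for dimension {dim!r}")
        results[dim] = macro_f1(g, p)
    return results
```

Checking predictions against `VALID_SCORES` before scoring catches a common submission error: Objectivity is binary, while the other three dimensions use a three-point scale.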

2. Analyst Report Subtask [2,3]

Objective: Evaluate whether forecast scenarios proposed in analyst reports are likely to come true.
Dataset: Annotated scenarios from FinArg-1, verified via news sources and Google Search.
Labels:
- Completely True
- Partially True
- Completely False
Task Type: 3-class classification, evaluated using F1-score.

3. Social Media Subtask [2,4,5]

Objective: Compare pairs of Chinese social media posts to determine which one demonstrates higher-quality causal reasoning for better decision-making.
Dataset: Posts are provided with maximum possible profit (MPP) and maximum loss (ML) labels.
Offline Evaluation:
1. Given two posts, select the one that would lead to the higher MPP (binary classification), evaluated using accuracy
Real-Time Evaluation:
1. During the evaluation phase, each participating team will receive 10 new post pairs per day for 5 consecutive days
2. Teams must return their predictions within 24 hours
3. Real-time responses will be used to assess model consistency, latency handling, and generalization on fresh unseen data
4. Metric: Average accuracy
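
Both evaluation modes reduce to pairwise accuracy, which can be sketched as below. This is an illustrative sketch under stated assumptions, not the organizers' scorer: encoding each prediction as the index (0 or 1) of the chosen post, averaging per-day accuracies for the real-time phase, and the function names `pairwise_accuracy` and `average_daily_accuracy` are all hypothetical.

```python
from typing import Sequence


def pairwise_accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of pairs where the chosen post (index 0 or 1) matches gold.

    Gold is assumed to be the index of the post with the higher MPP.
    """
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


def average_daily_accuracy(
    daily_gold: Sequence[Sequence[int]], daily_pred: Sequence[Sequence[int]]
) -> float:
    """Mean of per-day accuracies (assumed reading of "average accuracy")."""
    if len(daily_gold) != len(daily_pred) or len(daily_gold) == 0:
        raise ValueError("need the same non-zero number of days in both inputs")
    per_day = [pairwise_accuracy(g, p) for g, p in zip(daily_gold, daily_pred)]
    return sum(per_day) / len(per_day)
```

Because every day has the same number of pairs (10), averaging per-day accuracies gives the same value as pooling all 50 pairs, so the two readings of "average accuracy" coincide here.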

[1] Alaa Alhamzeh. Financial Argument Quality Assessment in Earnings Conference Calls. In International Conference on Database and Expert Systems Applications, pp. 65-81. 2023.

[2] Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen, Hiroya Takamura, Ichiro Kobayashi, and Yusuke Miyao. 2025. Enhancing Investment Opinion Ranking through Argument-Based Sentiment Analysis. In Proceedings of the International Joint Conference on Natural Language Processing & Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL-2025).

[3] Chin-Yi Lin, Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. 2024. Argument-Based Sentiment Analysis on Forward-Looking Statements. In Findings of the Association for Computational Linguistics: ACL 2024.

[4] Chung-Chi Chen, Hiroya Takamura, Ichiro Kobayashi, and Yusuke Miyao. 2024. Professionalism-Aware Pre-Finetuning for Profitability Ranking. In Proceedings of The 33rd ACM International Conference on Information and Knowledge Management (CIKM'24).

[5] Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. 2021. Evaluating the Rationales of Amateur Investors. In Proceedings of The Web Conference 2021 (WWW'21).

License

The annotated dataset is licensed under the Creative Commons Attribution-Non-Commercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
