FinNum - Data
Data
Test set: FinNum_test.json
Training set: FinNum_training.json
Development set: FinNum_dev.json
README for rebuilding the dataset: How to rebuild FinNum dataset.pdf
Code for rebuilding the dataset: rebuild_FinNum.py
Please cite the following paper when referring to the FinNum dataset in academic publications and papers.
Chung-Chi Chen, Hen-Hsen Huang, Yow-Ting Shiue, and Hsin-Hsi Chen. 2018. Numeral Understanding in Financial Tweets for Fine-grained Crowd-based Forecasting. In Proceedings of the 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2018), Santiago, Chile, pages 136-143.
Data Format
In the provided dataset, each record contains "idx" (the index of the tweet), "id" (the ID of the tweet), "target_num" (the target numeral), "category" (annotated result), and "subcategory" (annotated result). The tweet text itself is not included; participants should rebuild it via the Stocktwits API. Note that three categories (Indicator, Quantity, and Product/Version Number) do not have subcategories, so for these three categories the category and subcategory values are the same.
Example:
[{
'idx': 7791,
'id': 100382304,
'target_num': ['10'],
'category': ['Monetary'],
'subcategory': ['forecast'],
'tweet': '$FLKS Cantor Fitzgerald reiterates Hold rating, $10 PT => https://stocknews.com/news/flks-cantor-fitzgerald-reiterates-hold-rating-10-pt/ #FlexPharma #Biotechnology #BioTech #Bullish #Stock #FLKS'
},
{
'idx': 5720,
'id': 102960935,
'target_num': ['28', '2017'],
'category': ['Temporal', 'Temporal'],
'subcategory': ['date', 'date'],
'tweet': '$BTL BTL - Upward momentum Long from $8.70 or $8.05 . \n* Trade Criteria * \nDate First Found- November 28, 2017\nPattern/'
}]
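As the example shows, "target_num", "category", and "subcategory" appear to be parallel lists, with one entry per annotated numeral in the tweet. The following is a minimal Python sketch of how the records could be flattened into one row per numeral. It assumes the released files are valid JSON that `json.load` can read (the example above is printed with Python-style single quotes) and that the three lists always have equal length; the helper names `load_records` and `iter_annotations` are hypothetical and are not part of rebuild_FinNum.py.

```python
import json


def load_records(path):
    """Read a FinNum file, assuming it is a JSON array of records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_annotations(records):
    """Yield one (idx, id, numeral, category, subcategory) tuple per target numeral."""
    for rec in records:
        nums = rec["target_num"]
        cats = rec["category"]
        subs = rec["subcategory"]
        # Assumption: the three lists are aligned position by position.
        if not (len(nums) == len(cats) == len(subs)):
            raise ValueError(f"misaligned annotation lists in record idx={rec['idx']}")
        for num, cat, sub in zip(nums, cats, subs):
            yield rec["idx"], rec["id"], num, cat, sub


# The two records from the example above, without the rebuilt "tweet" field.
sample = [
    {"idx": 7791, "id": 100382304, "target_num": ["10"],
     "category": ["Monetary"], "subcategory": ["forecast"]},
    {"idx": 5720, "id": 102960935, "target_num": ["28", "2017"],
     "category": ["Temporal", "Temporal"], "subcategory": ["date", "date"]},
]

rows = list(iter_annotations(sample))
for row in rows:
    print(row)
```

With a downloaded file, the same loop could be run as `iter_annotations(load_records("FinNum_training.json"))`, provided the file matches the assumptions above.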
License
The annotated dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.