FinNum - Data


Test set: FinNum_test.json

Training set: FinNum_training.json

Development set: FinNum_dev.json
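A minimal loading sketch for the three splits follows. This page does not say how records are laid out inside the .json files, so the function is an assumption: it accepts either a single JSON array of records or JSON Lines (one record per line). The name load_finnum is hypothetical.

```python
import json
from pathlib import Path


def load_finnum(path):
    """Load one FinNum split into a list of dicts (hypothetical helper).

    Assumption: the file is either a single JSON array of records or
    JSON Lines (one record per line); both layouts are handled here.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        return json.loads(text)  # whole file is one JSON array
    # Otherwise treat it as JSON Lines: parse each non-empty line on its own.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Usage, assuming the files sit in the working directory:
# train = load_finnum("FinNum_training.json")
# dev = load_finnum("FinNum_dev.json")
# test = load_finnum("FinNum_test.json")
```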

ReadMe for rebuilding the dataset: How to rebuild FinNum dataset.pdf

Code for rebuilding the dataset:

Please cite the following paper when referring to the FinNum dataset in academic publications.

Chung-Chi Chen, Hen-Hsen Huang, Yow-Ting Shiue, and Hsin-Hsi Chen. 2018. Numeral Understanding in Financial Tweets for Fine-grained Crowd-based Forecasting. In Proceedings of the 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2018), Santiago, Chile, pages 136-143.

Data Format

The provided dataset gives participants "idx" (the index of the tweet), "id" (the id of the tweet), "target_num" (the target numerals), "category" (the annotated category), and "subcategory" (the annotated subcategory). The last three fields are lists with one entry per target numeral. The tweet text itself is not included; participants must rebuild it via the Stocktwits API. Note that three categories (Indicator, Quantity, and Product/Version number) have no subcategory, so for these three categories the subcategory field simply repeats the category.
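The rebuild step and the category/subcategory rule above can be sketched as follows. This is not the official rebuild code: fetch_text is a hypothetical callable that maps a tweet id to its text (for example a wrapper around whatever Stocktwits access you have) and returns None when a message can no longer be retrieved. The exact label strings in NO_SUBCATEGORY are also an assumption based on the category names given above.

```python
from typing import Callable, Optional

# Assumed label strings for the categories that have no subcategory.
NO_SUBCATEGORY = {"Indicator", "Quantity", "Product/Version number"}


def rebuild(records, fetch_text: Callable[[int], Optional[str]]):
    """Attach tweet text to each record (hypothetical helper).

    Returns (rebuilt_records, missing_ids); records whose text could not
    be fetched are left out of the first list and their ids reported.
    """
    rebuilt, missing = [], []
    for rec in records:
        text = fetch_text(rec["id"])
        if text is None:
            missing.append(rec["id"])
        else:
            rebuilt.append({**rec, "tweet": text})  # copy; input is not mutated
    return rebuilt, missing


def check_record(rec):
    """Return a list of consistency problems found in one annotated record."""
    problems = []
    n = len(rec["target_num"])
    if not (len(rec["category"]) == len(rec["subcategory"]) == n):
        problems.append("target_num/category/subcategory lengths differ")
    for cat, sub in zip(rec["category"], rec["subcategory"]):
        # For categories without subcategories, the subcategory repeats the category.
        if cat in NO_SUBCATEGORY and sub != cat:
            problems.append(f"{cat!r} should repeat its category, got {sub!r}")
    return problems
```

Reporting missing ids instead of raising keeps the rebuild usable when some messages have been deleted since annotation.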



{'idx': 7791,
 'id': 100382304,
 'target_num': ['10'],
 'category': ['Monetary'],
 'subcategory': ['forecast'],
 'tweet': '$FLKS Cantor Fitzgerald reiterates Hold rating, $10 PT => #FlexPharma #Biotechnology #BioTech #Bullish #Stock #FLKS'}



{'idx': 5720,
 'id': 102960935,
 'target_num': ['28', '2017'],
 'category': ['Temporal', 'Temporal'],
 'subcategory': ['date', 'date'],
 'tweet': '$BTL BTL - Upward momentum Long from $8.70 or $8.05 . \n* Trade Criteria * \nDate First Found- November 28, 2017\nPattern/'}
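As the second example shows, one record can annotate several numerals, with the i-th entries of target_num, category and subcategory belonging together. The sketch below flattens a rebuilt record into one row per numeral and checks that each target numeral actually occurs in the tweet. The row layout and the helper names explode and numeral_in_tweet are hypothetical choices, not part of the dataset. The matching rule (no digit directly before or after the numeral) is also an assumption.

```python
import re


def explode(rec):
    """Yield one row per target numeral of a rebuilt record (hypothetical helper)."""
    for num, cat, sub in zip(rec["target_num"], rec["category"], rec["subcategory"]):
        yield {"id": rec["id"], "tweet": rec["tweet"], "target_num": num,
               "category": cat, "subcategory": sub}


def numeral_in_tweet(tweet, num):
    """True if num occurs in tweet as a standalone digit run.

    Assumption: a match must not be directly preceded or followed by
    another digit, so '28' matches 'November 28,' but not '2028'.
    """
    return re.search(rf"(?<!\d){re.escape(num)}(?!\d)", tweet) is not None
```

A row whose numeral is not found usually means the rebuilt text differs from the annotated version, so such rows are worth inspecting before training.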



The annotated dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.