This is the website for the EMNLP 2022 Paper: When FLUE Meets FLANG: Benchmarks and Large Pretrained Language Model for Financial Domain
Pre-trained language models have shown impressive performance on a variety of tasks and domains. Previous research on financial language models usually employs a generic training scheme to train standard model architectures, without completely leveraging the richness of the financial data. We propose a novel domain-specific Financial LANGuage model (FLANG) which uses financial keywords and phrases for better masking, together with a span boundary objective and an in-filling objective. Additionally, the evaluation benchmarks in the field have been limited. To this end, we contribute the Financial Language Understanding Evaluation (FLUE), an open-source comprehensive suite of benchmarks for the financial domain. These include new benchmarks across 5 NLP tasks in the financial domain as well as common benchmarks used in previous research. Experiments on these benchmarks suggest that our model outperforms those in the prior literature on a variety of NLP tasks.
FLANG is a set of large language models for Financial LANGuage tasks. These models use domain-specific pre-training with preferential masking to build more robust representations for the domain. The models in the set are:
FLANG-BERT: https://huggingface.co/SALT-NLP/FLANG-BERT
FLANG-SpanBERT: https://huggingface.co/SALT-NLP/FLANG-SpanBERT
FLANG-DistilBERT: https://huggingface.co/SALT-NLP/FLANG-DistilBERT
FLANG-Roberta: https://huggingface.co/SALT-NLP/FLANG-Roberta
FLANG-ELECTRA: https://huggingface.co/SALT-NLP/FLANG-ELECTRA
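All five checkpoints above are hosted on the Hugging Face Hub, so they can be loaded with the standard transformers auto-classes. A minimal sketch (assuming the `transformers` library is installed; `load_flang` and the `FLANG_MODELS` mapping are helper names introduced here for illustration, not part of the released code):

```python
# Hub IDs of the released FLANG checkpoints (from the list above).
FLANG_MODELS = {
    "bert": "SALT-NLP/FLANG-BERT",
    "spanbert": "SALT-NLP/FLANG-SpanBERT",
    "distilbert": "SALT-NLP/FLANG-DistilBERT",
    "roberta": "SALT-NLP/FLANG-Roberta",
    "electra": "SALT-NLP/FLANG-ELECTRA",
}

def load_flang(variant: str = "bert"):
    """Download a FLANG variant and return its (tokenizer, model) pair."""
    # Lazy import so the mapping above is usable without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    model_id = FLANG_MODELS[variant]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model
```

For downstream FLUE tasks you would typically swap `AutoModel` for a task head such as `AutoModelForSequenceClassification` or `AutoModelForTokenClassification` and fine-tune on the corresponding dataset.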
FLUE (Financial Language Understanding Evaluation) is a comprehensive and heterogeneous benchmark built from 5 diverse financial domain-specific datasets. The tasks are described below:
Financial Sentiment Analysis
Financial PhraseBank (Classification): https://huggingface.co/datasets/financial_phrasebank
FiQA 2018 Task-1 (Regression): https://huggingface.co/datasets/SALT-NLP/FLUE-FiQA
News Headline Classification: https://www.kaggle.com/datasets/daittan/gold-commodity-news-and-dimensions
Named Entity Recognition: https://paperswithcode.com/dataset/fin
Structure Boundary Detection: https://sites.google.com/nlg.csie.ntu.edu.tw/finweb2021/shared-task-finsbd-3
Question Answering: https://huggingface.co/datasets/SALT-NLP/FLUE-FiQA
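The FLUE datasets hosted on the Hugging Face Hub can be fetched with the `datasets` library; the News Headline Classification and NER datasets are distributed elsewhere (Kaggle and the FIN dataset page linked above) and must be downloaded separately. A minimal sketch (assuming `datasets` is installed; `load_flue` and the `FLUE_DATASETS` mapping are illustrative helpers, not part of the released code):

```python
# Hub-hosted FLUE datasets (from the task list above).
FLUE_DATASETS = {
    "sentiment_classification": "financial_phrasebank",
    "sentiment_regression": "SALT-NLP/FLUE-FiQA",
    "question_answering": "SALT-NLP/FLUE-FiQA",
}

def load_flue(task: str):
    """Download and return the Hub-hosted FLUE dataset for a task."""
    # Lazy import so the mapping above is usable without `datasets` installed.
    from datasets import load_dataset

    name = FLUE_DATASETS[task]
    if name == "financial_phrasebank":
        # Financial PhraseBank requires choosing an annotator-agreement
        # configuration, e.g. "sentences_allagree" (all annotators agree).
        return load_dataset(name, "sentences_allagree")
    return load_dataset(name)
```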
The code can be found here: https://github.com/SALT-NLP/FLANG
Please cite the work with the following citation:

@inproceedings{shah-etal-2022-flang,
    author = {Shah, Raj Sanjay and Chawla, Kunal and Eidnani, Dheeraj and Shah, Agam and Du, Wendi and Chava, Sudheer and Raman, Natraj and Smiley, Charese and Chen, Jiaao and Yang, Diyi},
    title = {When FLUE Meets FLANG: Benchmarks and Large Pretrained Language Model for Financial Domain},
    booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year = {2022},
    publisher = {Association for Computational Linguistics}
}

Please contact Raj Sanjay Shah (rajsanjayshah@gatech.edu), Sudheer Chava (schava6@gatech.edu), or Diyi Yang (diyiy@stanford.edu) with any issues or questions.