Vision
The Archimedes LawFinTech Hub bridges finance and law through AI and Large Language Models (LLMs), fostering collaboration and innovation. By developing multilingual, open-source models, we aim to enhance accessibility, inclusivity, and transparency, empowering users worldwide to tackle complex challenges and drive equitable growth in both fields.
Opportunities
Large Language Models (LLMs) are transforming finance and law by enhancing efficiency, accuracy, and adaptability in complex tasks like compliance monitoring, fraud detection, and risk management. Their ability to analyze vast legal and financial data, integrate multimodal inputs, and adapt through transfer learning offers groundbreaking opportunities. By fostering interdisciplinary collaboration, the Archimedes LawFinTech Hub aims to develop innovative, legally compliant frameworks that address technical and regulatory challenges, creating a more cohesive and efficient system for these interconnected fields.
Themes of Research
1. Domain-Specific LLMs
Develop advanced LLMs to interpret numbers, tables, and specialized knowledge in finance and law. These models support tasks like financial analysis, risk assessment, stock prediction, and legal compliance by processing structured data and integrating insights for informed decision-making.
2. Legal and Financial Integration
Train LLMs on legal texts and financial data to bridge the gap between law and finance. These models enable nuanced legal language understanding, regulatory compliance, fraud detection, and the analysis of financial transactions with precision and context.
3. Task Alignment and Customization
Enhance LLM capabilities with fine-tuning, domain-specific tasks, and long-context handling. Applications include financial risk analysis, trading, stock prediction, contract review, claims analysis, and more, ensuring tailored, context-aware solutions for interdisciplinary needs.
4. Multilingual and Multitask Models
Develop multilingual LLMs to address global legal and financial challenges. Focus on Greek, English, Chinese, Spanish, and Japanese to enhance accessibility, local market insights, and cultural understanding, ensuring inclusivity and global impact.
5. Agent-Based Frameworks
Create agent-based systems with multi-modal inputs (text, images, numerical data) for collaboration on complex tasks. Applications include market analysis, visual evidence evaluation, and hierarchical task management in finance and law, offering a holistic, efficient approach.
Challenges to focus on
1. Data Availability and Quality
Accessing and curating high-quality, domain-specific datasets in finance and law remains challenging. Ensuring data accuracy, diversity, and completeness is critical for reliable LLM training and performance.
2. Multilingual and Cultural Barriers
Adapting LLMs to different languages and cultural contexts is complex. Developing multilingual models that address regional nuances, legal systems, and financial markets is essential for global applicability.
3. Ethical and Legal Concerns
Balancing innovation with compliance involves addressing ethical issues like data privacy, bias, and legal accountability. Transparent, trustworthy models are crucial for real-world deployment.
4. Computational Demands
Training and fine-tuning LLMs require extensive computational resources. Efficiently managing these demands while maintaining model accuracy and scalability is a significant challenge.
5. Interdisciplinary Integration
Bridging the knowledge gap between finance, law, and AI demands interdisciplinary collaboration. Aligning expertise across domains is vital to create robust, context-aware solutions.
Research Environment
Our research theme at Archimedes seeks to work with select international academic and industrial partners to harness opportunities in law and FinTech. To that end, we are creating a Hub for a truly interdisciplinary, strategic approach to law and FinTech, offering AI-based solutions for interpretable forecasting, trading, stock prediction, decision making, and risk assessment. The Hub will also provide the infrastructure to address issues of security, privacy, and data protection.
International efforts we are currently involved in
SIG-FinTech (part of ACL) https://sigfintech.github.io/ https://sigfintech.github.io/finnlp.html. We are co-organising the financial misinformation detection task (Archimedes) https://coling2025fmd.thefin.ai/
The FinAI https://www.thefin.ai/
Forthcoming Workshop
The Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
In conjunction with COLING-2025, January 19-20 2025, Abu Dhabi, UAE
https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-fnp-llmfinlegal/home
See shared tasks https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-fnp-llmfinlegal/shared-tasks
Related publications
Araci, D. (2019). FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063.
Xie, Q., Han, W., Zhang, X., Lai, Y., Peng, M., Lopez-Lira, A., & Huang, J. (2023). PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance. NeurIPS https://neurips.cc/virtual/2023/poster/73431
Yuxin Wang, Duanyu Feng, Yongfu Dai, Zhengyu Chen, Jimin Huang, Sophia Ananiadou, Qianqian Xie, Hao Wang (2024) HARMONIC: Harnessing LLMs for Tabular Data Synthesis and Privacy Protection. NeurIPS https://neurips.cc/virtual/2024/poster/97571
Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang, Zhiyuan Yao, Haohang Li, Yangyang Yu, Gang Hu, Huang Jiajia, Xiaoyang Liu, Alejandro Lopez-Lira, Benyou Wang, Yanzhao Lai, Hao Wang, Min Peng, Sophia Ananiadou, Jimin Huang (2024) FinBen: An Holistic Financial Benchmark for Large Language Models, NeurIPS https://neurips.cc/virtual/2024/poster/97525
Raj Shah, Kunal Chawla, Dheeraj Eidnani, Agam Shah, Wendi Du, Sudheer Chava, Natraj Raman, Charese Smiley, Jiaao Chen, and Diyi Yang. 2022. When FLUE Meets FLANG: Benchmarks and Large Pretrained Language Model for Financial Domain. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2322–2335, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Zhiwei Liu, Xin Zhang, Kailai Yang, Qianqian Xie, Jimin Huang, Sophia Ananiadou (2024) FMDLlama: Financial Misinformation Detection based on Large Language Models. (https://www.arxiv.org/abs/2409.16452)
Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, Gideon Mann (2023) BloombergGPT: A Large Language Model for Finance https://arxiv.org/abs/2303.17564
Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yupeng Cao, Zhi Chen, Jordan W. Suchow, Rong Liu, Zhenyu Cui, Zhaozhuo Xu, Denghui Zhang, Koduvayur Subbalakshmi, Guojun Xiong, Yueru He, Jimin Huang, Dong Li, Qianqian Xie (2024) FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making, https://arxiv.org/abs/2407.06567
Xiao Zhang, Ruoyu Xiang, Chenhan Yuan, Duanyu Feng, Weiguang Han, Alejandro Lopez-Lira, Xiao-Yang Liu, Sophia Ananiadou, Min Peng, Jimin Huang, Qianqian Xie (2024) Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English, KDD’24 https://dl.acm.org/doi/10.1145/3637528.3671554
Wang, N., Yang, H., & Wang, C. D. (2023). FinGPT: Instruction tuning benchmark for open-source large language models in financial datasets. arXiv preprint arXiv:2310.04793.
Aman Rangapur, Haoran Wang, Ling Jian, Kai Shu (2023) Fin-Fact: A Benchmark Dataset for Multimodal Financial Fact Checking and Explanation Generation, https://arxiv.org/abs/2309.08793
Y. Dai, D. Feng, J. Huang, H. Jia, Q. Xie, Y. Zhang, W. Han, W. Tian, and H. Wang. (2025) Laiw: A Chinese legal large language models benchmark (a technical report). arXiv preprint arXiv:2310.05620, 2023. (COLING 2025)
Huang, Jiajia, Haoran Zhu, Chao Xu, Tianming Zhan, Qianqian Xie and Jimin Huang (2024) “AuditWen: An Open-Source Large Language Model for Audit.” ArXiv abs/2410.10873 (2024). CCL
N.C. Gkoumas, G.N. Leledakis, E.G. Pyrgiotakis and I. Androutsopoulos, "Bank Competition, Loan Portfolio Concentration and Stock Price Crash Risk: the Role of Tone Ambiguity". British Journal of Management, 2024.
A.G. Katsafados, G.N. Leledakis, E.G. Pyrgiotakis, I. Androutsopoulos and Manos Fergadiotis, "Machine Learning in Bank Merger Prediction: A Text-based Approach". European Journal of Operational Research, 312(2): 783-797, 2023.
A.G. Katsafados, G.N. Leledakis, E.G. Pyrgiotakis, I. Androutsopoulos, I. Chalkidis and M. Fergadiotis, "Textual Information and IPO Underpricing: A Machine Learning Approach". Journal of Financial Data Science, 5(2):100-135, 2023.
A. Katsafados, I. Androutsopoulos, I. Chalkidis, E. Fergadiotis, G. Leledakis and E. Pyrgiotakis, "Using Textual Analysis to Identify Merger Participants: Evidence from the U.S. Banking Industry". Finance Research Letters, 42:101949, 2021.
O.S. Chlapanis, D. Galanis and Ion Androutsopoulos, "LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights". Proceedings of the Workshop on Natural Legal Language Processing (NLLP 2024) of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), Miami, Florida, USA, 2024.
O. Chlapanis, I. Androutsopoulos and D. Galanis, "Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure". Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval 2024) of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Mexico City, Mexico, pp. 1607-1622, 2024.
L. Loukas, M. Fergadiotis, I. Chalkidis, E. Spyropoulou, P. Malakasiotis, I. Androutsopoulos and G. Paliouras, "FiNER: Financial Numeric Entity Recognition for XBRL Tagging". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Dublin, Ireland, 2022.
I. Chalkidis, A. Jana, D. Hartung, M. Bommarito, I. Androutsopoulos, D.M. Katz and N. Aletras, "LexGLUE: A Benchmark Dataset for Legal Language Understanding in English". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Dublin, Ireland, 2022.
D. Mamakas, P. Tsotsi, I. Androutsopoulos and I. Chalkidis, "Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer". Proceedings of the 4th Workshop on Natural Legal Language Processing (NLLP 2022) of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Abu Dhabi, United Arab Emirates, 2022.
S. Xenouleas, A. Tsoukara, G. Panagiotakis, I. Chalkidis and I. Androutsopoulos, "Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification". Proceedings of the 12th Hellenic Conference on Artificial Intelligence (SETN 2022), Corfu, Greece, 2022.
L. Loukas, M. Fergadiotis, I. Androutsopoulos and P. Malakasiotis, "EDGAR-CORPUS: Billions of Tokens Make The World Go Round". Proceedings of the Economics and Natural Language Processing Workshop (EcoNLP 2021) of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), on-line and Punta Cana, Dominican Republic, 2021.
I. Chalkidis, M. Fergadiotis and I. Androutsopoulos, "MultiEURLEX -- A Multi-lingual and Multi-label Legal Document Classification Dataset for Zero-shot Cross-lingual Transfer". Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), on-line and Punta Cana, Dominican Republic, 2021.
I. Chalkidis, M. Fergadiotis, D. Tsarapatsanis, N. Aletras, I. Androutsopoulos and P. Malakasiotis, "Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases". Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021), held on-line, 2021.
I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras and I. Androutsopoulos, "LEGAL-BERT: The Muppets straight out of Law School". Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), held on-line, 2020.
I. Chalkidis, M. Fergadiotis, P. Malakasiotis and I. Androutsopoulos, "Neural Contract Element Extraction Revisited". Proceedings of the Document Intelligence Workshop of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 2019.
I. Chalkidis, I. Androutsopoulos and N. Aletras, "Neural Legal Judgment Prediction in English". Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, pp. 4317-4323, 2019.
I. Chalkidis, M. Fergadiotis, P. Malakasiotis and I. Androutsopoulos, "Large-Scale Multi-Label Text Classification on EU Legislation". Proceedings of the 57th Annual Conference of the Association for Computational Linguistics (ACL 2019), Florence, Italy, pp. 6314-6322, 2019.
I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras and I. Androutsopoulos, "Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation". Proceedings of the Workshop on Natural Legal Language Processing (NLLP 2019) of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019), Minneapolis, USA, pp. 78-87, 2019.
I. Chalkidis, I. Androutsopoulos and A. Michos, "Obligation and Prohibition Extraction Using Hierarchical RNNs". Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, pp. 254-259 (short papers), 2018.
I. Chalkidis and I. Androutsopoulos, "A Deep Learning Approach to Contract Element Extraction". Proceedings of the 30th International Conference on Legal Knowledge and Information Systems (JURIX 2017), Luxembourg, pp. 155-164, 2017.
I. Chalkidis, I. Androutsopoulos and A. Michos, "Extracting Contract Elements". Proceedings of the 16th International Conference on Artificial Intelligence and Law (ICAIL 2017), London, UK, pp. 19-28, 2017.
Boulieris, P., Pavlopoulos, J., Xenos, A., & Vassalos, V. (2023). Fraud detection with natural language processing. Machine Learning, 1-22. https://doi.org/10.1007/s10994-023-06354-5
Passali, T., Gidiotis, A., Chatzikyriakidis, E., & Tsoumakas, G. (2021). Towards Human-Centered Summarization: A Case Study on Financial News. Bridging Human-Computer Interaction and Natural Language Processing, HCINLP 2021 - Proceedings of the 1st Workshop.
Avramelou, L., Passalis, N., Tsoumakas, G., & Tefas, A. (2023). Domain-Specific Large Language Model Finetuning using a Model Assistant for Financial Text Summarization. 2023 IEEE Symposium Series on Computational Intelligence (SSCI), 381–386. https://doi.org/10.1109/SSCI52147.2023.10371906
Team
Sophia Ananiadou is Professor in Computer Science at The University of Manchester and lead researcher in Archimedes R.C. She is Director of the UK National Centre for Text Mining and holds prominent roles such as Deputy Director of the Institute of Data Science and AI (Manchester), Turing Fellow (2018-2023), ELLIS fellow, and Distinguished Research Fellow at the AI Research Centre (AIST Japan). Her research focuses on NLP, LLMs and leveraging AI to understand and utilise language knowledge, especially in specialised areas like finance and biomedicine. She has pioneered NLP tasks such as automatic term recognition, information extraction (event extraction), semantic search, emotion detection, and summarisation. Sophia has also served as Senior Area Chair and Program Committee Member at major conferences such as ACL, NAACL, EMNLP, LREC, COLING, and IJCAI. She has organised numerous workshops and shared tasks in fintech and biomedicine at ACL, EMNLP, NAACL, IJCAI and COLING, and she is one of the instigators of SIG-FinTech.
Pantelis John (PJ) Beaghton is Professor of Practice in Computing at Imperial College London, Security Science Fellow at the Institute for Security Science and Technology, and co-lead of the Imperial College Network of Excellence in Financial Technology (FinTech). His research combines AI/ML with financial industry expertise to address critical security challenges in global financial markets. He focuses on two main areas: analyzing contagion patterns of disruptive shocks across electronic securities and derivative markets, and developing forensic techniques for detecting suspicious transaction patterns linked to financial crime. Prior to academia, he held prominent industry positions including managing a global statistical arbitrage hedge fund and serving as Managing Director and senior quantitative trader at Salomon Brothers. Through his research and consulting work with both private and public sectors, he bridges the gap between academic innovation and practical applications in financial security.
Ion Androutsopoulos is Professor of Artificial Intelligence in the Department of Informatics, Athens University of Economics and Business (AUEB), and head of AUEB's Natural Language Processing (NLP) Group. He holds a Diploma (MEng) in Electrical Engineering from the National Technical University of Athens (1991), an MSc in Information Technology/Knowledge Based Systems from the University of Edinburgh (1992), and a PhD in Artificial Intelligence from the University of Edinburgh (1996). Before joining AUEB in 2002, he worked as a research scientist in what is now the Centre for Language Technology of Macquarie University in Sydney (1996-97), and the Institute of Informatics and Telecommunications of NCSR "Demokritos" (1998-2002). His current research interests include: biomedical question answering; natural language generation from medical images; text classification, including filtering toxic content; information extraction and opinion mining, including legal text analytics and sentiment analysis; deep learning for NLP; NLP in the digital humanities.
Grigorios Tsoumakas received a degree in Computer Science from the Aristotle University of Thessaloniki (AUTH), Greece, in 1999, an MSc in Artificial Intelligence from the University of Edinburgh, United Kingdom, in 2000 and a PhD in Computer Science from AUTH in 2005. He has been a Professor of Machine Learning and Knowledge Discovery at the School of Informatics of AUTH since 2024, where he previously served as Associate Professor (2020-2024), Assistant Professor (2013-2020) and Lecturer (2007-2013). Since 2024, he has also served as an Affiliate Researcher at Archimedes/RC Athena, Greece. In addition, he is co-founder and chief scientific officer at Medoid AI, a spin-off company of AUTH established in 2019, developing custom AI solutions based on cutting-edge Machine Learning technology. Dr. Tsoumakas is a senior member of ACM and IEEE. His research expertise focuses on supervised learning (ensemble methods, multi-target prediction, interpretability) and natural language processing (semantic indexing, key phrase extraction, summarization). He has published more than 150 research papers and, according to Google Scholar, has more than 19,000 citations and an h-index of 52. His honors include the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 10-Year Test of Time Award in 2017 and the Marco Ramoni best paper award at the 19th International Conference on Artificial Intelligence in Medicine (AIME 2021).