Mining and Learning in the Legal Domain
The 3rd International Workshop on Mining and Learning in the Legal Domain (MLLD-2023)
In conjunction with the 32nd ACM International Conference on Information and Knowledge Management (CIKM-2023)
University of Birmingham and Eastside Rooms, UK
Sunday 22nd October 2023.
Important Dates
Paper submission deadline: August 18th, 2023 extended to September 1st, 2023 (AoE)
Paper acceptance notification: September 15th, 2023 extended to September 17th, 2023
Paper final version due: October 1st, 2023
Workshop date: October 22nd, 2023
Abstract
The increasing accessibility of legal corpora and databases creates opportunities to develop data-driven techniques and advanced tools that can facilitate a variety of tasks in the legal domain, such as legal search and research, legal document review and summarization, legal contract drafting, and legal outcome prediction. Compared with other application domains, the legal domain is characterized by the huge scale of natural language text data, the high complexity of specialist knowledge, and the critical importance of ethical considerations. The MLLD workshop aims to bring together researchers and practitioners to share the latest research findings and innovative approaches in employing data mining, machine learning, information retrieval, and knowledge management techniques to transform the legal sector. Building on the success of the previous editions, the third edition of the MLLD workshop will emphasize the exploration of new research opportunities brought about by recent rapid advances in Large Language Models and Generative AI. We encourage submissions that intersect computer science and law, from both academia and industry, embodying the interdisciplinary spirit of CIKM.
Topics
We encourage submissions on novel mining- and learning-based solutions for various aspects of legal data analysis, covering documents such as legislation, litigation, court cases, contracts, patents, NDAs, and bylaws. Topics of interest include, but are not limited to:
Applications of Large Language Models (LLMs) and Generative AI in the legal domain
Prompt engineering and automated prompting for legal NLP tasks
LLMs for legal contract drafting
Legal assistance using conversational AI
Risks and limitations of LLMs in the legal domain
Applications of data mining techniques in the legal domain
Classifying, clustering, and identifying anomalies in big corpora of legal records
Legal analytics
Citation analysis for case law
Applications of machine learning and NLP techniques for legal textual data
Information extraction, information retrieval, question answering and entity extraction/resolution for legal document reviews
Summarization of legal documents
eDiscovery in legal research
Case outcome prediction
Legal language modelling and legal document embedding and representation
Recommender systems for legal applications
Topic modeling on large collections of legal documents
Training data for the legal domain
Acquisition, representation, indexing, storage, and management of legal data
Automatic annotation and human-in-the-loop learning
Data augmentation techniques for legal data
Semi-supervised and transfer learning, domain adaptation, distant supervision
Ethical issues in mining legal data
Privacy and GDPR in legal analytics
Bias and trust in the applications of data mining
Transparency in legal data mining
Emerging topics in the intersection of AI and law
Digital lawyers and legal machines
Smart contracts
Future of law practice in the era of Generative AI
Submission
All submissions must be in English, in PDF format, and in ACM two-column format (sigconf). The ACM LaTeX template is available from the ACM website and the Overleaf online editor.
To enable double-blind reviewing, authors are required to take all reasonable measures to conceal their identity. The anonymous option of the acmart class must be used. Furthermore, ACM copyright and permission information should be removed by using the nonacm option. Therefore, the first line of your main LaTeX document should be as follows.
\documentclass[sigconf,review,anonymous,nonacm]{acmart}
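For reference, a minimal anonymized submission skeleton might look like the following; the placeholder title, author block, section text, and the references.bib file name are illustrative only and not part of the workshop's requirements.

% Minimal sketch of an anonymized MLLD submission (illustrative only).
\documentclass[sigconf,review,anonymous,nonacm]{acmart}
\begin{document}
\title{Your Paper Title}  % placeholder title
% Author details are still needed for the class to compile,
% but the "anonymous" option hides them in the generated PDF.
\author{Anonymous Author(s)}
\affiliation{%
  \institution{Anonymous Institution}
  \city{City}
  \country{Country}}
\begin{abstract}
A short summary of the contribution.
\end{abstract}
\maketitle
\section{Introduction}
Body text goes here.
\bibliographystyle{ACM-Reference-Format}
% \bibliography{references}  % uncomment and point this at your own .bib file
\end{document}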
To facilitate the exchange of ideas, this year we adopt a policy similar to that of ICTIR'23, which allows submissions of any length from 2 to 9 pages, plus unrestricted space for references. Authors are expected to submit a paper whose length reflects what is needed for the content of the work, i.e., page length should be commensurate with the size of the contribution. Reviewers will assess whether the contribution is appropriate for the given length. Consequently, there is no longer a distinction between long and short papers, nor any need to condense or pad medium-length ones. Longer papers will likely be allocated more presentation time during the workshop.
As in the previous editions of MLLD, each paper will be reviewed by at least 3 reviewers from the Program Committee.
We will produce non-archival proceedings for this workshop on arXiv.org, similar to IPA'20. Authors can therefore refine their accepted papers and submit them to formal conferences or journals after the workshop.
Submissions should be made electronically via EasyChair:
https://easychair.org/conferences/?conf=mlld2023
Attendance
The CIKM-2023 conference will be held in person in Birmingham, UK. Therefore, it is expected that most (if not all) of the authors will present their accepted papers in person at this workshop. Some invited speakers and/or participants may have the flexibility to attend online.
Registration
The registration for the workshop is done through the main conference CIKM-2023.
CIKM will open registration in a few weeks. If you would like to express your interest in the workshop and be notified when registration opens, please email Alina Petrova.
Programme Committee
Arian Askari, Leiden University, Netherlands
Pan Du, Thomson Reuters Labs, Canada
Shang Gao, Casetext, USA
Shoaib Jameel, University of Southampton, UK
Evangelos Kanoulas, University of Amsterdam, Netherlands
Dave Lewis, Redgrave Data, USA
Haiming Liu, University of Southampton, UK
Yiqun Liu, Tsinghua University, China
Miguel Martinez, Law Business Research, UK
Isabelle Moulinier, Thomson Reuters Labs, USA
Aileen Nielsen, Harvard University, USA
Joel Niklaus, Stanford University, USA
Milda Norkute, Thomson Reuters Labs, Switzerland
Douglas Oard, University of Maryland, USA
Jaromir Savelka, Carnegie Mellon University, USA
Frank Schilder, Thomson Reuters Labs, USA
Shohreh Shaghaghian, Amazon, Canada
Dietrich Trautmann, Thomson Reuters Labs, Switzerland
Xiaoling Wang, East China Normal University, China
Gineke Wiggers, Wolters Kluwer, Netherlands
Josef Valvoda, University of Cambridge, UK
Jun Xu, Renmin University, China
Fattane Zarrinkalam, University of Guelph, Canada
Keynote Talk
Title:
Ensuring Reliability in Legal LLM Applications
Abstract:
The use of large language models (LLMs) has exploded over the past year, especially since OpenAI introduced ChatGPT in November 2022; however, ensuring accuracy and reliability in LLM-generated outputs remains a challenge, especially in knowledge-intensive domains such as law. In this talk, we will present some of the methods that we use to ensure reliability in CoCounsel, Casetext's GPT-4-based legal AI assistant, touching upon topics including retrieval-augmented generation for legal research, methods for reducing hallucinations, managing the cost vs. reliability trade-off, evaluating LLMs in the legal context, and generating synthetic data from GPT-4.
Bios:
Javed Qadrud-Din is the Director of Research & Development at Casetext, where he builds early systems to push the boundaries of legal technology, including Casetext's deep learning-based semantic search system and, more recently, Casetext's GPT-4-based products. Prior to Casetext, Javed was a machine learning engineer at Meta and held engineering and product roles at IBM. He has been working in machine learning for the past decade, but, before that, he worked as a lawyer for startup companies at the law firm Fenwick & West. He holds a JD from Harvard Law School and a BA from Cornell University.
Martin Gajek is the Head of Machine Learning at Casetext. His team researches and develops low-latency contextual information retrieval systems and language generation systems, including LLMs. These technologies form the backbone of Casetext's flagship product, CoCounsel. Martin holds a PhD in Applied Physics from Sorbonne University (UPMC) and held postdoctoral positions at UC Berkeley and IBM Research. Prior to joining Casetext, he was involved in R&D for semiconductor hardware, specifically optimizing memory architectures for deep learning accelerators.
Shang Gao is a senior machine learning researcher at Casetext, where he designs, develops, and deploys solutions for legal and transactional language understanding, generative question answering, and knowledge retrieval. His recent work includes the development of CoCounsel, Casetext's AI legal assistant based on OpenAI's GPT-4, and demonstrating that GPT-4 can pass all portions of the Uniform Bar Exam. Prior to Casetext, Shang was a research scientist at Oak Ridge National Laboratory, where he led a research team building clinical NLP solutions for the National Cancer Institute. Shang has a PhD in Data Science from the University of Tennessee.
Organizers
Masoud Makrehchi, Thomson Reuters Labs & OntarioTech University, Canada
Dell Zhang, Thomson Reuters Labs, UK
Alina Petrova, Thomson Reuters Labs, UK
John Armour, University of Oxford, UK
Contact
If you have any questions regarding this workshop, please email mlld23@easychair.org.