LegalEval: Understanding Legal Texts
SemEval-2023 Task 6
Motivation
In populous countries (e.g., India), the number of pending legal cases has grown exponentially. Due to the nature of the legal domain, it may not be possible to automate the entire judicial pipeline; nevertheless, many intermediate tasks can be automated to augment legal practitioners and hence expedite the system. However, legal texts differ from the commonly occurring texts typically used to train NLP models, which makes it difficult to apply existing NLP models and techniques directly and calls for the development of legal domain-specific techniques. We are proposing three shared sub-tasks that will act as building blocks for developing legal AI applications.
Tasks Overview
We are proposing three shared sub-tasks:
(A) Rhetorical Roles (RR): Structuring unstructured legal documents into semantically coherent units
(B) Legal Named Entity Recognition (L-NER): Identifying relevant entities in a legal document
(C) Court Judgement Prediction with Explanation (CJPE): Predicting the outcome of a case along with an explanation
Getting Started
To familiarize yourself with the task, we suggest:
Register for the task by filling out this form: https://forms.gle/uoeRrqjcYLnae5Ls6
Also follow us on Twitter @LegalEval for the latest updates.
Read through this page to understand the sub-tasks and their settings.
Decide on the sub-task(s) and Setting(s) you intend to participate in. You can participate in any one (or more) of the sub-tasks.
Step through the Google Colab Notebooks with the Baselines for each sub-task and setting you wish to participate in so you understand the requirements.
Start working on your model(s) for the chosen sub-task(s) and setting(s).
Once the evaluation starts in January 2023, submit your results.
Important (Tentative) Dates
Training data ready: 1 September 2022
Evaluation starts: 10 January 2023
Evaluation ends: 31 January 2023
System paper submission due: February 2023
Task paper submission due: February 2023
Notification to authors: March 2023
Camera-ready due: April 2023
SemEval workshop Summer 2023 (co-located with a major NLP conference)
Sub-task A: Rhetorical Roles Prediction (RR)
Relevance of Rhetorical Roles Prediction:
The purpose of creating a rhetorical role corpus is to enable automated understanding of legal documents by segmenting them into topically coherent units. This segmentation is a fundamental building block for many legal AI applications, such as judgment summarization, judgment outcome prediction, and precedent search.
Task Overview:
Given that legal documents are long and unstructured, we propose a task for automatically segmenting legal judgment documents into semantically coherent text segments, with each segment assigned a label such as Preamble, Fact, Ratio, Arguments, etc. These labels are referred to as Rhetorical Roles (RR). Concretely, we propose the task of Rhetorical Role Prediction: segment a given legal document by predicting the rhetorical role label for each sentence. The detailed definitions of the rhetorical roles, along with the datasets, are outlined here. This is a sequential sentence classification task: each sentence receives exactly one label from multiple classes. Also check out the following papers for more details: paper1 and paper2. The baseline model can be found at this link.
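To make the expected input/output shape concrete, here is a minimal sketch of a per-sentence predictor. It is a trivial majority-class baseline, not the official baseline model linked above; the function name and the label strings (`"FAC"`, `"ARG"`) are illustrative assumptions, not the task's canonical label set.

```python
from collections import Counter

def majority_baseline(train_labels, test_sentences):
    """Predict one rhetorical-role label per sentence.

    As the simplest possible baseline, every test sentence receives
    the most frequent label seen in the training labels.
    """
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common for _ in test_sentences]

# Hypothetical usage: labels here are placeholders, not the official tag set.
preds = majority_baseline(["FAC", "FAC", "ARG"], ["Sentence 1.", "Sentence 2."])
```

Any real system would replace this with a sentence encoder plus a sequence model, but the contract is the same: one label per input sentence, in order.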
Evaluations:
The rhetorical roles task will be evaluated using a weighted F1 score on hidden test data. To preserve the integrity of the results, we do not release the test data publicly; instead, we require you to submit your model so that we can run it on the test data for you. Evaluation is hosted on CodaLab; please refer to the CodaLab page for more details.
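For reference, a support-weighted F1 can be computed as follows. This is a stdlib sketch of the metric, not the official scorer (which runs on CodaLab); it is equivalent to scikit-learn's `f1_score(..., average="weighted")`.

```python
def weighted_f1(gold, pred):
    """Support-weighted F1: per-class F1 scores averaged,
    each weighted by that class's share of the gold labels."""
    total = len(gold)
    score = 0.0
    for lab in set(gold):
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        support = sum(1 for g in gold if g == lab)
        score += (support / total) * f1
    return score
```

Note that weighting by support makes frequent rhetorical roles dominate the score, so per-class performance on rare roles is worth tracking separately during development.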
Sub-task B: Legal Named Entities Extraction (L-NER)
Task Relevance:
Just as NER is important in the general NLP domain, it is equally important in the legal domain. For example, Legal NER is the first step in extracting relevant entities for information extraction and retrieval tasks.
Task Overview:
Legal documents contain peculiar entities such as the names of the petitioner, respondent, court, statute, provision, precedents, etc. These entity types are not recognized by standard Named Entity Recognizers, hence the need to develop a Legal NER system.
A list of the legal named entities covered is given here. A court judgment can be split into two sections: the Preamble, which contains the names of the parties, court, lawyers, etc., and the judgment text, which starts after the Preamble. The datasets for the preamble and the judgment text are provided separately.
The image below shows a typical judgment with some of the entities marked. The baseline model and dataset can be found here, along with the paper (accepted at the NLLP 2022 Workshop at EMNLP).
Please note that the post-processing mentioned in the Git repository is not part of this task. For evaluation purposes, the sentences that were annotated using only sentence-level context will be used as ground truth.
Note: Dev data has been released!
Evaluations:
Please refer to the CodaLab page for more details.
For Legal NER, we will be using the standard F1 score metric (Segura-Bedmar et al., 2013).
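Entity-level F1 is typically computed over (start, end, label) spans. The sketch below assumes strict matching, where a predicted entity counts only if its boundaries and label both match a gold entity exactly; this is an illustrative implementation, not the official scorer, and the exact matching rules used in evaluation are those on the CodaLab page.

```python
def entity_f1(gold_spans, pred_spans):
    """Strict-match entity-level F1.

    Each span is a (start, end, label) tuple; a prediction is a true
    positive only if an identical tuple appears in the gold set.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Hypothetical spans: offsets and labels here are made up for illustration.
score = entity_f1(
    {(0, 5, "COURT"), (10, 15, "STATUTE")},
    {(0, 5, "COURT"), (20, 25, "JUDGE")},
)
```

Under strict matching, a span with the right label but off-by-one boundaries scores zero, which is why careful tokenization and offset handling matter for this sub-task.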
Sub-task C: Court Judgement Prediction with Explanation (CJPE)
Task Relevance:
The volume of legal cases has been growing exponentially in many countries; a system that augments judges would be of immense help and would expedite case resolution. Moreover, if the system provides an explanation for its decision, it would help a judge make an informed decision about the final outcome of the case.
Task Overview:
We propose the task of Legal Judgement Prediction with Explanation: given a legal judgment document, the task involves automatically predicting the outcome of the case (binary: accepted or denied) and providing an explanation for the prediction. The explanations take the form of relevant sentences in the document that contribute to the decision. Check out this paper and GitHub repo for more details about the baseline models. The task will be made available on the CodaLab platform soon.
We plan to divide the task into two sub-tasks: (1) Court Judgment Prediction (2) Explanations for the prediction.
We have taken steps to remove bias from the dataset and to address the ethical concerns associated with the task.
Evaluations:
Please refer to the CodaLab page for more details.
The evaluation for judgment prediction (binary classification) will use the standard F1 score metric; for the explanation sub-task, we will use BLEU, METEOR, and ROUGE scores (Malik et al., 2021b) to evaluate the machine explanations against the gold annotations.
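As an illustration of the explanation metrics, here is a stdlib sketch of ROUGE-1 F1 (unigram overlap between a machine explanation and a gold explanation). It uses naive whitespace tokenization and no stemming, so it will not exactly reproduce the numbers from standard ROUGE packages; it only shows the shape of the computation.

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Overlap is the clipped count of shared unigrams (the multiset
    intersection), matching how ROUGE-1 counts n-gram matches.
    """
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if not overlap:
        return 0.0
    prec = overlap / sum(cand.values())
    rec = overlap / sum(ref.values())
    return 2 * prec * rec / (prec + rec)

# Hypothetical gold and machine explanations for illustration only.
score = rouge1_f1("the court accepted the appeal", "the appeal was accepted")
```

BLEU and METEOR follow the same pattern of comparing the machine explanation against gold annotations, but with different n-gram weighting, brevity penalties, and (for METEOR) stemming and synonym matching.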
The sample data for CJPE can be found here.
Contact and Task Organizers
For any queries please contact us at: legalaieval@gmail.com
Organizers:
Smita Gupta
Abhinav Joshi
Sai Kiran Tanikella