LegalEval: Understanding Legal Texts


SemEval-2023 Task 6

Motivation

In populous countries (e.g., India), the number of pending legal cases has grown exponentially. Due to the nature of the legal domain, it may not be possible to automate the entire judicial pipeline; nevertheless, many intermediate tasks can be automated to augment legal practitioners and hence expedite the system. However, legal texts differ from the commonly occurring texts typically used to train NLP models, which makes it difficult to apply existing NLP models and techniques directly and calls for the development of legal domain-specific techniques. We are proposing three shared sub-tasks that will act as building blocks for developing legal AI applications.

Tasks Overview

We are proposing three shared sub-tasks:

Sub-task A: Rhetorical Roles Prediction (RR)
Sub-task B: Legal Named Entities Extraction (L-NER)
Sub-task C: Court Judgement Prediction with Explanation (CJPE)

Getting Started

To familiarize yourself with this task, we suggest:

Important (Tentative) Dates

Sub-task A: Rhetorical Roles Prediction (RR)

Relevance of Rhetorical Roles Prediction:

The purpose of creating a rhetorical role corpus is to enable automated understanding of legal documents by segmenting them into topically coherent units. This segmentation is a fundamental building block for many legal AI applications such as judgment summarization, judgment outcome prediction, precedent search, etc.

Task Overview:

Given that legal documents are long and unstructured, we propose a task of automatically segmenting legal judgment documents into semantically coherent text segments, where each segment is assigned a label such as Preamble, Fact, Ratio, Arguments, etc. These labels are referred to as Rhetorical Roles (RR). Concretely, we propose the task of Rhetorical Role Prediction: segment a given legal document by predicting the rhetorical role label for each sentence. The detailed definitions of the rhetorical roles, along with the datasets, are outlined here. This is a sequential sentence classification task: multi-class classification with a single label per sentence. Also check out the following papers for more details: paper1 and paper2. The baseline model can be found at this link.
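For illustration, below is a minimal per-sentence classification sketch in scikit-learn. The toy sentences and the exact label names are assumptions, and unlike sequence models that exploit cross-sentence context, this sketch classifies each sentence independently:

```python
# Minimal per-sentence rhetorical role classifier (illustrative only).
# Assumes training data as (sentence, label) pairs; the real dataset is a
# collection of documents with annotated sentence spans, and stronger models
# use cross-sentence context, which this TF-IDF sketch deliberately ignores.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy examples; see the task page for the actual label set.
train_sentences = [
    "IN THE SUPREME COURT OF INDIA",          # Preamble
    "The appellant filed a suit in 2010.",    # Fact
    "Hence, the appeal is allowed.",          # Ratio
]
train_labels = ["PREAMBLE", "FAC", "RATIO"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(train_sentences, train_labels)

# Predict one rhetorical role label per sentence of a new document.
doc_sentences = ["The respondent denied the claim.", "The appeal is dismissed."]
print(model.predict(doc_sentences))
```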

Evaluations:

The rhetorical roles task will be evaluated on hidden test data; the evaluation metric is micro F1. To preserve the integrity of test results, we do not release the test data publicly. Instead, we require you to submit your model so that we can run it on the test data for you. We use CodaLab for test data evaluation; please refer to the CodaLab page for more details.
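For local validation, per-sentence F1 variants can be reproduced with scikit-learn (a sketch with toy labels; the CodaLab scorer remains authoritative):

```python
# Sketch: per-sentence F1 for rhetorical role predictions (toy labels).
from sklearn.metrics import f1_score

gold = ["PREAMBLE", "FAC", "FAC", "RATIO"]  # one gold label per sentence
pred = ["PREAMBLE", "FAC", "RATIO", "RATIO"]

print("micro F1:", f1_score(gold, pred, average="micro"))
print("weighted F1:", f1_score(gold, pred, average="weighted"))
```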

Sub-task B: Legal Named Entities Extraction (L-NER)

Task Relevance:

Just as NER is a core task in the general NLP domain, Legal NER is equally important in the legal domain. For example, Legal NER is the first step in extracting the relevant entities for information extraction and retrieval tasks.


Task Overview:
Legal documents contain domain-specific entities such as the names of the petitioner, respondent, court, statute, provision, precedents, etc. These entity types are not recognized by standard Named Entity Recognizers, hence the need to develop a Legal NER system.
A list of the legal named entities covered is given here. A court judgment can be split into two sections: the first is the Preamble, which contains the names of the parties, court, lawyers, etc.; the judgment text starts after the Preamble. The datasets for the preamble and the judgment text are provided separately.

The image below shows a typical judgment and some of the entities in it. The baseline model and dataset can be found here, along with the paper (accepted at the NLLP 2022 Workshop at EMNLP).
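As a sketch of what such a system looks like in use, the snippet below tags judgment text with a pretrained spaCy pipeline; the model name en_legal_ner_trf is an assumption based on the baseline repo and must be installed separately before it can be loaded:

```python
# Sketch: tag legal entities in judgment text with a pretrained spaCy pipeline.
# The model name below is an assumption taken from the baseline repo; install
# it first (e.g., from the wheel shipped with the repo) or spacy.load will fail.
import spacy

nlp = spacy.load("en_legal_ner_trf")
text = ("The appeal was heard by the Supreme Court of India under "
        "Section 302 of the Indian Penal Code.")
for ent in nlp(text).ents:
    # Expected labels include COURT, STATUTE, PROVISION, PETITIONER, etc.
    print(ent.text, "->", ent.label_)
```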

Please note that the post-processing mentioned in the GitHub repo is not part of this task. For evaluation purposes, the sentences that were annotated using only sentence-level context will be used as ground truth.

Note: Dev data has been released!

Evaluations:

Please refer to the CodaLab page for more details.

For Legal NER, we will use the standard F1 score metric (Segura-Bedmar et al., 2013).
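For intuition, under the strict matching criterion in that scheme an entity counts as correct only when both its span boundaries and its type match exactly. A minimal sketch of such entity-level F1 follows (an assumption about the matching variant; the official CodaLab scorer is authoritative):

```python
# Sketch: strict entity-level F1 (exact span boundaries and entity type must
# match), in the spirit of Segura-Bedmar et al. (2013).
def entity_f1(gold, pred):
    """gold/pred: sets of (start, end, label) tuples for a document."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {(0, 25, "COURT"), (32, 43, "STATUTE")}
pred = {(0, 25, "COURT"), (30, 43, "STATUTE")}  # boundary off by two: no credit
print(entity_f1(gold, pred))  # 0.5
```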



Sub-task C: Court Judgement Prediction with Explanation (CJPE)

Task Relevance:


The volume of legal cases has been growing exponentially in many countries; a system that augments judges in the process would be of immense help and would expedite proceedings. Moreover, if the system provided an explanation for its decision, it would help a judge make an informed decision about the final outcome of the case.


Task Overview:

We propose the task of Legal Judgement Prediction with Explanation: given a legal judgment document, the task involves automatically predicting the outcome of the case (binary: accepted or denied) and also providing an explanation for the prediction. The explanations take the form of relevant sentences in the document that contribute to the decision. Check out this paper and the GitHub repo for more details about the baseline models. The task will be made available on the CodaLab platform soon.
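To make the prediction-plus-explanation setup concrete, here is a toy sketch that predicts a binary outcome and ranks sentences by a simple occlusion score (how much removing each sentence lowers the predicted probability). The data and the approach are illustrative assumptions, not the official baseline, which uses hierarchical transformer models:

```python
# Toy sketch: binary judgment prediction plus sentence-level explanations via
# occlusion, i.e., rank each sentence by how much deleting it lowers the
# predicted probability of the outcome. Illustrative only; see the linked
# paper and repo for the actual baselines.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_docs = ["the appeal is allowed with costs",
              "the petition is dismissed",
              "conviction set aside and appeal allowed",
              "appeal dismissed as devoid of merit"]
train_labels = [1, 0, 1, 0]  # 1 = accepted, 0 = denied

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_docs, train_labels)

sentences = ["The trial court erred in law.", "The appeal is allowed."]
full_prob = clf.predict_proba([" ".join(sentences)])[0][1]
print("predicted outcome:", "accepted" if full_prob >= 0.5 else "denied")

# Explanation: sentences whose removal most reduces the accepted probability.
scores = []
for i, sent in enumerate(sentences):
    ablated = " ".join(s for j, s in enumerate(sentences) if j != i)
    scores.append((full_prob - clf.predict_proba([ablated])[0][1], sent))
for drop, sent in sorted(scores, reverse=True):
    print(f"{drop:+.3f}  {sent}")
```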

We plan to divide the task into two sub-tasks: (1) Court Judgment Prediction and (2) Explanation for the Prediction.
We have taken steps to remove bias in the dataset and to address the ethical concerns associated with the task.

Evaluations:

Please refer to the CodaLab page for more details.


The evaluation for judgment prediction (binary classification) will be done using the standard F1 score metric; for the explanation sub-task, we will use BLEU, METEOR, and ROUGE scores (Malik et al., 2021b) to evaluate the machine explanations against the gold annotations.
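A sketch of how such overlap scores can be computed locally with the rouge-score and NLTK packages; tokenization and aggregation may differ from the official scorer, and METEOR is available analogously via nltk.translate.meteor_score:

```python
# Sketch: score a machine explanation against the gold explanation with
# ROUGE (rouge-score package) and BLEU (NLTK). Toy strings for illustration.
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

gold = "The appeal is allowed because the trial court misapplied Section 302."
machine = "The appeal is allowed since Section 302 was misapplied."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print(scorer.score(gold, machine))

smooth = SmoothingFunction().method1
print(sentence_bleu([gold.split()], machine.split(), smoothing_function=smooth))
```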


The sample data for CJPE can be found here.



Contact and Task Organizers


For any queries please contact us at: legalaieval@gmail.com 

Organizers: