Artificial Intelligence for Legal Assistance

Dataset and Evaluation

NOTICE: The training data for both tasks have been released and can be downloaded from the links provided below. Details for decryption of the data have been emailed to registered participants. Please let us know if you face any issues.

Data for Task 1:

The training data consists of 60 case documents. In each document, every sentence is labelled with one of the 7 categories mentioned here.

The test data will consist of additional documents which will be used for evaluation.

Training data: The training data for Task 1 can be found at: https://drive.google.com/file/d/1SPvqlR5DCSkZZfY7_XYxR-pVK9nzBE4n/view?usp=sharing

Test Data: TBA

Data for Task 2:

The training data for Task 2 will consist of 500 document-summary pairs. The documents are judgements delivered by the Supreme Court of India. Each judgement is accompanied by a summary written by a legal expert. We will provide pre-processed and sentence-tokenized versions of both judgements and summaries. For each sentence in the judgement text we will provide a noisy label (~75% accurate) indicating whether or not the sentence is "summary-worthy". Each judgement sentence and each summary sentence will additionally be labelled with one of the seven rhetorical roles mentioned in Task 1. Both the "summary-worthy" labels and the rhetorical roles are assigned automatically and are noisy. Participants should take this into account while training their models.

The test data will consist of additional judgements with manually assigned labels, which will be used for evaluation.

Training data: The training data for Task 2 can be found at: https://drive.google.com/file/d/1UQ9BzrbqqFkilBgCTdK0N6nsC6W63GyJ/view?usp=sharing

Test Data: TBA

Evaluation plan:

Task 1

Standard classification metrics (Precision, Recall and F1-Score) will be used for evaluation.
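
As a rough illustration of how these metrics could be computed for a multi-class sentence-labelling task, here is a minimal sketch. The averaging scheme is not stated on this page, so macro-averaging is an assumption made for the example, and the label names (ROLE_A, ROLE_B, ROLE_C) and function names are hypothetical placeholders, not the actual category names or the official scorer.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)} for every label in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F1 over labels (assumed scheme)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical sentence labels for one short document.
gold = ["ROLE_A", "ROLE_A", "ROLE_B", "ROLE_C", "ROLE_B"]
pred = ["ROLE_A", "ROLE_B", "ROLE_B", "ROLE_C", "ROLE_A"]

scores = per_class_prf(gold, pred)
macro = macro_average(scores)
print(scores)
print(macro)
```

Macro-averaging gives every category equal weight regardless of how many sentences carry it; a micro or weighted average would behave differently when the categories are imbalanced.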

Task 2

Task 2a - Standard classification metrics (Precision, Recall and F1-Score) will be used for evaluation.
Task 2b - ROUGE scores will be used for evaluation.
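
For participants unfamiliar with ROUGE, the sketch below shows the n-gram overlap idea behind ROUGE-N. Which ROUGE variants and which tokenization will be used is not specified on this page, so the choice of ROUGE-1 and ROUGE-2, the lowercased whitespace tokenization, the function names and the example sentences are all assumptions for illustration only. This is not the official scorer.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Return (precision, recall, f1) from clipped n-gram overlap."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical reference summary and system summary.
reference = "the appeal is dismissed with costs"
candidate = "the appeal is allowed"

rouge1 = rouge_n(reference, candidate, n=1)
rouge2 = rouge_n(reference, candidate, n=2)
print(rouge1)
print(rouge2)
```

Recall measures how much of the expert-written summary is covered by the system summary, while precision penalizes content in the system summary that the reference does not contain.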