We developed the first benchmark for Indian law in our work "IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning" (ACL '24). The benchmark consists of 8 tasks: open-domain NLP tasks such as Named Entity Recognition (NER), Machine Translation and Summarization, and legal tasks such as Legal Statute Identification (LSI), Prior Case Retrieval (PCR), Court Judgment Prediction and Explanation (CJPE), Bail Prediction and Rhetorical Role Labeling (RRL).
Our benchmark is multi-lingual and covers case documents across different courts and case types in India. We have created a framework both for using the benchmark and for evaluating models via a public leaderboard on HuggingFace.
These models were pre-trained over a large corpus of Indian legal documents as part of our work "Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law" (ICAIL '23). They are based on the BertForPreTraining model class from HuggingFace.
InLegalBERT is obtained by further pre-training the existing LegalBERT (Chalkidis et al., 2020), while InCaseLawBERT is obtained by further pre-training the existing CaseLawBERT (Zheng et al., 2021) for 300K steps. CustomInLawBERT is trained from scratch for 700K steps using a customized vocabulary.
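As a rough illustration of how such checkpoints are typically used, the sketch below loads one of the three models through the HuggingFace transformers Auto classes and extracts a sentence vector. The hub identifiers under the law-ai namespace are assumptions that are not stated in this text, and embed is a hypothetical helper, so check the model cards before relying on either.

```python
# Assumed HuggingFace Hub identifiers; these are not given in the text above.
ASSUMED_HUB_IDS = {
    "InLegalBERT": "law-ai/InLegalBERT",
    "InCaseLawBERT": "law-ai/InCaseLawBERT",
    "CustomInLawBERT": "law-ai/CustomInLawBERT",
}


def embed(text, model_name="InLegalBERT"):
    """Return the [CLS] vector for `text` (hypothetical helper).

    Requires `transformers` and `torch`, and downloads the checkpoint
    on first use.
    """
    # Imported lazily so the module can be inspected without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    hub_id = ASSUMED_HUB_IDS[model_name]
    tokenizer = AutoTokenizer.from_pretrained(hub_id)
    model = AutoModel.from_pretrained(hub_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Position 0 of the last hidden state corresponds to the [CLS] token.
    return outputs.last_hidden_state[0, 0]
```

Calling embed on a sentence from a judgment would return a single vector that can feed a downstream classifier. For task-specific use, fine-tuning the encoder is usually preferable to using frozen [CLS] vectors.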
We have developed datasets and models for the task of identifying the relevant/violated charges or statutes given the facts of a legal situation, as part of our prior works "LeSICiN: A Heterogeneous Graph-Based Approach for Automatic Legal Statute Identification from Indian Legal Documents" (AAAI '22) and "Automatic Charge Identification from Facts: A Few Sentence-Level Charge Annotations is All You Need" (COLING '20).
For identifying criminal statutes in India, we release the Indian Legal Statute Identification Dataset (ILSI), which contains 66K samples and 100 target Sections of the Indian Penal Code.
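Since one fact description can attract several Sections at once, statute identification is commonly framed as multi-label classification. The sketch below uses a toy label space and made-up predictions to show one way of encoding gold and predicted Sections as multi-hot vectors and scoring them with macro-F1. The function names and the choice of macro-F1 are illustrative assumptions, not a description of the released evaluation code.

```python
def multi_hot(sections, label_space):
    """Encode a set of Section labels as a 0/1 vector over `label_space`."""
    return [1 if label in sections else 0 for label in label_space]


def macro_f1(gold, pred):
    """Macro-averaged F1 over the label columns of two multi-hot matrices."""
    num_labels = len(gold[0])
    scores = []
    for j in range(num_labels):
        tp = sum(1 for g, p in zip(gold, pred) if g[j] == 1 and p[j] == 1)
        fp = sum(1 for g, p in zip(gold, pred) if g[j] == 0 and p[j] == 1)
        fn = sum(1 for g, p in zip(gold, pred) if g[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # A label with no gold and no predicted instances scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_labels


# Toy label space for illustration; ILSI itself has 100 target Sections.
LABELS = ["Section 302", "Section 307", "Section 420"]
gold = [
    multi_hot({"Section 302", "Section 307"}, LABELS),
    multi_hot({"Section 420"}, LABELS),
]
pred = [
    multi_hot({"Section 302"}, LABELS),
    multi_hot({"Section 420", "Section 307"}, LABELS),
]
score = macro_f1(gold, pred)
```

Macro averaging weights every Section equally, which matters when some Sections are much rarer than others. Micro-F1 would instead be dominated by the frequent ones.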
We have developed datasets and experimented with different baseline models for the task of segmenting court case documents into rhetorical roles such as facts, issues, arguments, ratio, ruling, etc., as part of our prior works "DeepRhole: deep learning for rhetorical role labeling of sentences in legal case documents" (Artif. Intell. Law, '21) and "Identification of Rhetorical Roles of Sentences in Indian Legal Judgments" (JURIX '19).
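Rhetorical role labeling is typically carried out at sentence level, after which consecutive sentences sharing a role can be merged into segments. A minimal sketch of that merging step follows, using invented sentences and labels. The to_segments helper is hypothetical and does not come from the cited works.

```python
from itertools import groupby


def to_segments(sentences, roles):
    """Merge consecutive sentences that share a rhetorical role."""
    if len(sentences) != len(roles):
        raise ValueError("sentences and roles must have the same length")
    segments = []
    pairs = zip(roles, sentences)
    # groupby only merges adjacent items, so a role that recurs later
    # in the document starts a new segment.
    for role, group in groupby(pairs, key=lambda pair: pair[0]):
        text = " ".join(sentence for _, sentence in group)
        segments.append((role, text))
    return segments


# Invented example; the labels would normally come from a trained model.
sentences = [
    "The appellant was arrested on the night of the incident.",
    "A charge sheet was filed two months later.",
    "The question is whether the confession was voluntary.",
    "The appeal is dismissed.",
]
roles = ["facts", "facts", "issues", "ruling"]
segments = to_segments(sentences, roles)
```

Here the two leading "facts" sentences collapse into one segment, giving three segments in document order.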