Resources

Pre-trained Models for Indian Law

These models were pre-trained on a large corpus of Indian legal documents as part of our work "Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law" (ICAIL '23). They are based on the BertForPreTraining model from the Hugging Face Transformers library.

InLegalBERT is obtained by further pre-training the existing LegalBERT (Chalkidis et al., 2020) on this corpus, while InCaseLawBERT is obtained by further pre-training the existing CaseLawBERT (Zheng et al., 2021), in each case for 300K steps. CustomInLawBERT is trained from scratch for 700K steps using a customized vocabulary.
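
The checkpoints can be used like any standard BERT encoder. Below is a minimal sketch of loading one of them for feature extraction with the Transformers library; the Hub identifier law-ai/InLegalBERT is an assumption, so substitute whichever model name and source you actually use.

```python
# Minimal sketch: loading InLegalBERT to embed an Indian legal text.
# "law-ai/InLegalBERT" is an assumed Hugging Face Hub identifier; swap in
# InCaseLawBERT or the custom-vocabulary model as needed.
from transformers import AutoTokenizer, AutoModel

model_name = "law-ai/InLegalBERT"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "The appellant was convicted under Section 302 of the Indian Penal Code."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
outputs = model(**inputs)

# outputs.last_hidden_state: (batch, seq_len, hidden) contextual embeddings
print(outputs.last_hidden_state.shape)
```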

Identifying Charges and Statutes in Indian Law

We have developed datasets and models for the task of identifying the relevant/violated charges or statutes given the facts of a legal situation, as part of our prior works "LeSICiN: A Heterogeneous Graph-Based Approach for Automatic Legal Statute Identification from Indian Legal Documents" (AAAI '22) and "Automatic Charge Identification from Facts: A Few Sentence-Level Charge Annotations is All You Need" (COLING '20). 

For identifying criminal statutes in India, we release the Indian Legal Statute Identification Dataset (ILSI), which contains 66K samples and 100 target Sections of the Indian Penal Code.
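
To illustrate how the ILSI task can be framed, the hedged sketch below treats statute identification as multi-label classification over the 100 target IPC Sections, with an independent sigmoid output per Section. The encoder name and decision threshold are assumptions, and this is not the LeSICiN model itself, which (as its title indicates) additionally uses a heterogeneous graph over documents and statutes.

```python
# Hypothetical framing of the ILSI task: multi-label classification of case
# facts over 100 target IPC Sections. Model name and threshold are assumptions;
# see the ILSI dataset and paper for the actual splits and label set.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "law-ai/InLegalBERT"  # assumed Hub identifier; any BERT-style encoder works
num_sections = 100                 # ILSI targets 100 IPC Sections

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_sections,
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

facts = "The accused entered the house at night and removed jewellery from the almirah."
inputs = tokenizer(facts, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, 100)
probs = torch.sigmoid(logits)                  # independent probability per Section
predicted = (probs > 0.5).nonzero(as_tuple=True)[1].tolist()
print(predicted)  # indices of predicted Sections (arbitrary until fine-tuned on ILSI)
```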

Semantic Segmentation of Court Documents

We have developed datasets and experimented with different baseline models for the task of segmenting court case documents into rhetorical roles such as facts, issues, arguments, ratio, and ruling, as part of our prior works "DeepRhole: deep learning for rhetorical role labeling of sentences in legal case documents" (Artif. Intell. Law '21) and "Identification of Rhetorical Roles of Sentences in Indian Legal Judgments" (JURIX '19).
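
As a rough illustration of the task setup, the sketch below assigns each sentence of a judgment one label from a fixed role set. The role list and model name are illustrative assumptions, and this per-sentence classifier is not the DeepRhole or JURIX '19 architecture, which model the sequence of sentences rather than treating them independently.

```python
# Minimal sketch of sentence-level rhetorical role labeling: one role per
# sentence. Role inventory and model name are assumptions; the classification
# head here is untrained, so predictions are arbitrary until fine-tuning.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ROLES = ["Facts", "Issues", "Arguments", "Ratio", "Ruling", "Other"]  # illustrative subset

model_name = "law-ai/InLegalBERT"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=len(ROLES))

sentences = [
    "The appellant was employed as a clerk in the respondent bank.",
    "The only issue is whether the dismissal violated principles of natural justice.",
    "We therefore set aside the order of the High Court.",
]
inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits              # (num_sentences, num_roles)
for sent, idx in zip(sentences, logits.argmax(dim=-1).tolist()):
    print(f"{ROLES[idx]:<10} {sent}")
```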