Datasets, Code, and Language Models

InLegalBERT 

InLegalBERT is a BERT-based language model for the legal domain, introduced in the paper "Pre-training Transformers on Indian Legal Text". The model and its tokenizer files are available on Hugging Face at this link [click here].
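For readers who want to try the model, the following is a minimal sketch of loading it with the Hugging Face transformers library and encoding one sentence. The identifier "law-ai/InLegalBERT", the helper name embed, and the example sentence are assumptions made for illustration and are not taken from this page; please confirm the identifier on the linked model page. Running it requires the transformers and torch packages and a network connection for the first download.

```python
# Assumed Hugging Face model identifier; confirm it on the linked model page.
MODEL_ID = "law-ai/InLegalBERT"


def embed(text, model_id=MODEL_ID):
    """Return last-layer hidden states for `text` (shape: 1 x tokens x hidden size).

    `embed` is a hypothetical helper written for this sketch.
    """
    # Imported lazily so the file can be imported without these packages installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    # Truncate to BERT's usual 512-token input limit.
    encoded = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        output = model(**encoded)
    return output.last_hidden_state


if __name__ == "__main__":
    states = embed("The appellant filed a writ petition before the High Court.")
    print(states.shape)
```

The returned tensor holds one vector per token; how to pool these into a document representation depends on the downstream task.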

D2V-BiGRU-CRF

A supervised catchphrase extraction technique, described in our paper "A Sequence Labeling Model for Catchphrase Identification from Legal Case Documents". I wrote the Python implementation, which is available on GitHub at this link [click here].

PSLegal

An unsupervised catchphrase extraction system, described in our paper "Automatic Catchphrase Identification from Legal Court Case Documents". I wrote the Python implementation, which is available on GitHub at this link [click here].

FIRE 2017 IRLeD track

The track had two tasks: a legal precedence retrieval task and a legal catchphrase extraction task. Details of both tasks and their datasets are given on the track website, linked here [click here].