2020 - Now: Senior Ph.D. Research Scholar, Deptt of CSE, IIT Kharagpur [PMRF Scholar]
[Working under the supervision of Dr. Pawan Goyal and Dr. Saptarshi Ghosh]
2018 - 2020: M.Tech in CSE, IIT Kharagpur
[Graduated with 9.90 CGPA, 1st Rank] [Qualified through GATE 2018, GATE Score 862]
2014 - 2018: B.Tech in CSE, IIEST Shibpur
[Graduated with 9.21 CGPA] [Qualified through WBJEE 2014, Rank 414]
2014: Higher Secondary Examinations, Don Bosco School, Bandel
[Passed ISC 2014 with 97.4%]
2012: Secondary Examinations, Don Bosco School, Bandel
[Passed ICSE 2012 with 96%]
2021: Institute Silver Medal
[For securing Departmental 1st Rank in M.Tech CSE] [IIT Kharagpur]
2018 - 2020: Best Departmental Thesis
[For M.Tech Project 2018] [Deptt of CSE, IIT Kharagpur]
2019: Best Paper Award
[For the paper "Identification of Rhetorical Roles of Sentences in Indian Legal Documents", Bhattacharya et al., 2019] [JURIX 2019]
2014: Mamraj Agarwal Rashtriya Puraskar
[For excellence in Higher Secondary Examinations] [Mamraj Agarwal Foundation]
2012: Gyan Jyoti Samman
[For excellence in Secondary Examinations] [Sikar Nagarik Zila Parishad]
2021 - Current: Legal Statute Identification
The task of Legal Statute Identification (LSI) involves automatically identifying the applicable laws/statutes given the description of a legal situation. We developed ILSI, a large-scale LSI dataset for Indian criminal law, along with a graph-based approach to solve the task. Subsequently, we studied the efficacy of some popular LSI approaches (including ours) in terms of qualitative metrics, such as performance on confusing or rare statutes. We are currently extending the task to a multilingual setting (Hindi and English documents). We are also conducting a separate study that uses layman descriptions as the input instead of court judgments.
2022 - Ongoing: Pre-training and Instruction Tuning for Indian Law
We created and released InLegalBERT, the first BERT-based pre-trained model for Indian law. Pre-trained on a corpus of 5+ million judgment documents from the Supreme Court and several High Courts of India, this model outperforms other BERT-based models on several legal downstream tasks over both Indian and European documents. Currently, we are creating an instruction-tuning dataset for Indian law comprising several tasks, and fine-tuning instruction-based LLMs for the Indian legal domain.
2024: Benchmark for Indian Law
We created IL-TUR, the first large-scale benchmark for Indian law. Our benchmark consists of several tasks such as Legal Statute Identification, Prior Case Retrieval, Court Judgment Prediction and Explanation, Bail Prediction, Semantic Segmentation of Court Judgments, Summarization, Translation, and Named Entity Recognition. We benchmarked generic architectures like BERT, task-specific SOTA architectures, and the latest LLMs on each task. We have also created a public leaderboard for these tasks that can be used by the research community.
2019 - 2020: Automatic Identification of Crimes in Indian Legal Documents
[Developing a DL model to identify the possible crimes given the evidence in a legal case in India, comparing with standard baselines and state-of-the-art methods, and improving performance significantly across models by also identifying crimes at the sentence level on a small, manually annotated dataset] [M.Tech Project]
2017 - 2018: Musical Instrument Identification from Audio
[Extracting features such as pitch and MFCC from audio, employing shallow models for the scalar features and CNNs for the MFCC graphs, to identify the most predominant musical instrument at every time frame] [B.Tech Project]
2016 - 2017: Analyzing Movie Recommendation Networks
[Crawling data from movie recommendation platforms such as IMDb and Google Play Movies, performing rudimentary analysis of data such as genre distribution and popularity ratings, and applying more efficient graph-based techniques such as random walks to discover relationships between movie attributes (common cast, director, genre(s), recommendations)] [Winter and Summer (extended) Project]
2020: Extractive Summarization on Legal Documents via Reinforcement Learning
[Using the REINFORCE algorithm to incorporate the ROUGE score in the loss function of standard neural models for extractive summarization] [Reinforcement Learning Course Project]
2020: Measuring Semantic Similarity between Legal Judgment Documents
[Deep Learning Course Project]
2019: Implementing Dynamic Pairwise Attention Model (Wang et al., SIGIR '18) on Indian Supreme Court Judgment Documents
[Matching the evidence text of a legal case with the written Acts/Sections in Indian law by implementing a state-of-the-art model developed on Chinese legal data] [Natural Language Processing Course Project]
2019: Image Tagging in Wikipedia Articles
[Using advanced techniques in multi-modal retrieval to choose the most relevant image for a Wikipedia article (full or partial) from a few candidate images] [Information Retrieval Course Project]
2019: Malware Identification using IccTA
[Using tools such as Soot and IccTA to analyze and classify malware applications based on the function call graph] [Complex Networks Course Project]
2019: Improving Diversity in Recommendation Systems using Interaction Metrics
[Developing a new interaction metric between users and items, and a recommendation policy that improves this metric globally] [Social Computing Course Project]