Elevated to IEEE Senior Member (the highest grade for which members can apply) in 2023 for continued research contributions.
Awarded Huawei President’s Team Award 2022 (Consumer Cloud Services).
Awarded Huawei President's Gold Medal 2020 (Individual Contributor).
Selected as finalist for Innovation of the Year, Ireland Data Science Award 2018.
Invited Young Researcher at Heidelberg Laureate Forum 2018.
Doctoral dissertation selected for special mention in the IEEE Intelligent Informatics Bulletin.
Google European Doctoral Fellowship in Natural Language Processing, 2014-2017.
Science Accomplishment Award at IBM Research India (2013).
Academic Excellence Award, Indian Institute of Technology (IIT) India, 2009.
All-India Rank 1 (99.98 percentile) in Graduate Aptitude Test in Engineering (GATE), 2008.
Offered a GATE scholarship by the Ministry of HRD, India, for postgraduate studies.
University Gold Medal, Jadavpur University India, 2008.
Ranked among the top 0.75% of students in West Bengal Joint Entrance Examination (WB-JEE), 2004.
Selected for the NIIT-IT Talent Resource Pool in 2004.
All-India Rank 97 in National Level Science Talent Search Examination (NSTSE), 2003.
@Eaton R&D:
Pattern Mining for AI solutions in "Future Manufacturing Units":
Designed an algorithmic framework coupling image processing and machine learning algorithms to diagnose and predict errors in additive manufacturing units. It enabled early detection and preemption of structural faults with a 95% F1-score. We also explored data mining approaches to detect patterns and propose "remedial action recommendations" for organizational productivity, for example in the supply chain. The work led to a publication at IEEE Big Data 2019 and several patents.
@Bell Labs:
Automated Knowledge Hierarchy Maintenance:
Developed novel machine learning algorithms to update existing knowledge graphs with new emerging concepts extracted from unstructured text. Using embedding-based feature creation, the framework provided state-of-the-art results in accurately populating the hierarchical structure. Additionally, a new mathematical measure was proposed for comparing the structural and logical integrity of directed acyclic graphs. The findings were published in SIGIR 2018 and IRJ 2019.
Text Analysis for Fine-Grained Understanding:
We investigated how fine-grained natural language understanding would enable better and more informative topic labeling of documents to improve downstream retrieval of related and diverse information. Further, we proposed a novel measure based on entity-sentiment coupling to categorize human opinion utterances and detect different viewpoints expressed in blogs and social communication channels. The findings were published in IEEE Big Data 2018 and ICWSM 2020.
Machine Learning for Computationally Hard Problems:
Explored the use of machine learning algorithms for practical solutions to theoretically hard (NP-hard) computational problems. We found that graph structural features enable a learning model to efficiently detect maximum-sized cliques in large graphs using a pruning strategy, achieving an accuracy of 90% with 300x speed-ups in detecting the largest communities (maximum cliques) in Web-scale social network graphs. The approaches were published in AAAI 2019, IJCAI 2019, and CIKM 2019.
@Google:
Bridging Nominal Anaphora Resolution:
Proposed an algorithm to extract the implicit semantic relationship between two nominals present in an input text. Using corpus-based statistics and scores based on feature vectors, the best connecting nouns exhibiting an antecedent-anaphora relation were predicted. An initial working framework was developed in C++, exhibiting promising results with ~65% accuracy. Current work involves complete framework development and test-bed results.
@IBM:
Stream De-Duplication:
Novel algorithms for efficient identification of duplicate elements in data streams with a low memory footprint and real-time characteristics were proposed and published in EDBT 2012 and VLDB 2013. Theoretical and practical analysis reported near-optimal (~0.1%) error rates on large data streams with significant run-time speed-up.
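For illustration only, the general problem can be sketched with a plain Bloom filter, which flags probable duplicates in a stream using a fixed amount of memory. This is a textbook baseline, not the algorithms published in EDBT 2012 or VLDB 2013; the class name, bit-array size, and hashing scheme below are assumptions chosen for the example.

```python
import hashlib


class BloomDeduplicator:
    """Fixed-memory duplicate detector for a stream (standard Bloom filter).

    Hypothetical helper for illustration; sizes and hash choice are assumptions.
    """

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions from two halves of one digest (double hashing).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def seen_before(self, item):
        """Return True if item is probably a duplicate, then record it."""
        positions = self._positions(item)
        duplicate = all(self.bits[p >> 3] & (1 << (p & 7)) for p in positions)
        for p in positions:
            self.bits[p >> 3] |= 1 << (p & 7)
        return duplicate


def deduplicate(stream, **kwargs):
    """Yield each element that the filter has (probably) not seen before."""
    dedup = BloomDeduplicator(**kwargs)
    for item in stream:
        if not dedup.seen_before(item):
            yield item
```

A Bloom filter never misses a true duplicate but may occasionally report a new element as a duplicate; the false-positive rate depends on the bit-array size, the number of hash functions, and the number of distinct elements seen.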
Spatio-Temporal Analytics:
The aim of this project was to develop a unified spatio-temporal database querying framework by integrating trajectory mining and predictive query reporting with IBM Informix Spatial DB. Enhanced indexing structures and optimized frequent pattern mining methods were employed to make the system efficient and robust. The project was developed in C++ and Java.
Wireless Network Analytics:
Designed and implemented an intelligent caching protocol coupled with frequent user behavior mining for IBM's Video-on-Demand services, to improve user experience (viewing latency) and reduce the network bandwidth requirement. We obtained an improvement of ~35% in the number of cache hits compared to a simple LRU policy and reported a ~20% improvement in user experience based on a Quality-of-Experience (QoE) measure. The project was implemented entirely in C and C++. This work was published in MDM 2015 and SPBDA Workshop (ICDCN) 2015.
Load Balancing:
Proposed a new algorithm for efficient load balancing (balls-into-bins problem) reducing the known theoretical complexity from O(log log n) to O(1) for the sequential scenario, thus providing an optimal placement strategy. Similar results were proven for the parallel, multi-dimensional, and weighted cases, resulting in IBM technical reports.
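As background on the balls-into-bins setting, the short simulation below places balls sequentially, sending each ball to the least-loaded of a few randomly chosen bins (the classical "power of two choices" strategy when two bins are sampled). It is only a sketch of the well-known baseline, not the algorithm proposed in the technical reports; the function name and parameters are assumptions.

```python
import random


def place_balls(num_balls, num_bins, choices, seed=0):
    """Sequentially place balls; each goes to the least-loaded of `choices` random bins.

    Returns the list of final bin loads. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    loads = [0] * num_bins
    for _ in range(num_balls):
        # Sample candidate bins uniformly at random (with replacement).
        candidates = [rng.randrange(num_bins) for _ in range(choices)]
        # Greedy rule: pick the candidate that currently holds the fewest balls.
        target = min(candidates, key=loads.__getitem__)
        loads[target] += 1
    return loads


if __name__ == "__main__":
    n = 10_000
    print("max load, one choice: ", max(place_balls(n, n, choices=1)))
    print("max load, two choices:", max(place_balls(n, n, choices=2)))
```

With n balls and n bins, a single random choice typically yields a noticeably higher maximum load than two choices, which is the gap that placement strategies in this line of work aim to close further.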
Cloud Computing Services:
Developed and implemented efficient VM placement and migration algorithms to improve energy management and reduce migration costs in data centers from an infrastructure-as-a-service (IaaS) perspective. Various components of the architecture such as TSAM, TPM, NIM, etc., were integrated using Java-based workflows. Designed a part of the User Interface using JSP. This cloud offering led to an operational cost saving of ~25% for IBM's client-side data centers. Parts of the work were published in CLOUD 2012, NOMS 2012 and CNSM 2011. It also led to the filing of a patent.
@Doctoral:
Developed efficient and scalable algorithms for Entity-Mention Co-Reference Resolution and Linking across documents in large-scale input corpora for Knowledge Harvesting. Advancements have been published in TACL 2015 and EMNLP 2015.
Proposed a machine learning based language model to detect credibility of online information pertaining to entities and relationships. The work was presented at ECML-PKDD 2016.
Designed an encoding technique for efficient storage of Knowledge Graphs as RDF triples for improved query performance, which was published at IJCAI 2016.
My doctoral research lies at the intersection of Text Mining and Natural Language Processing.
@Masters:
Proposed a new compressed indexing scheme for string databases that supported the entire gamut of queries from exact search to prefix, suffix and substring search. It inherently supported parallel access, making it efficient and faster than competing structures. This work was subsequently published at COMAD 2010.
Developed a Scheme Interpreter using Haskell.
Developed an interpreter for the Scheme language using the functional programming language Haskell, providing a unique hands-on experience in the functional programming paradigm.
Implemented and analyzed various data-partitioning algorithms for database join queries on a distributed architecture (in Java).
Implemented hash, range, and round-robin partitioning and compared the performance of the techniques across different datasets and queries, proposing usage scenarios for each technique.
Surveyed and presented clustering algorithms for data-streaming environments.
Performed a literature survey of the existing techniques for clustering in data streams and provided insights into the working and complexity of the methods.
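The three partitioning schemes can be sketched in a few lines. The example below is written in Python for brevity rather than the Java used in the project, runs on a single machine, and does not reproduce the original implementation; the function names, the row format (dictionaries), and the use of CRC32 as the hash are assumptions.

```python
import bisect
import zlib


def hash_partition(rows, key, num_parts):
    """Assign each row to a partition by hashing its key; equal keys co-locate."""
    parts = [[] for _ in range(num_parts)]
    for row in rows:
        # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % num_parts
        parts[idx].append(row)
    return parts


def range_partition(rows, key, boundaries):
    """Assign rows by key range; `boundaries` is a sorted list of split points."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row[key])].append(row)
    return parts


def round_robin_partition(rows, num_parts):
    """Deal rows out in turn, ignoring keys; partition sizes differ by at most one."""
    parts = [[] for _ in range(num_parts)]
    for i, row in enumerate(rows):
        parts[i % num_parts].append(row)
    return parts
```

Roughly speaking, hash partitioning suits equi-joins because matching keys land in the same partition, range partitioning suits range predicates but can be skewed by uneven key distributions, and round-robin gives even sizes but no key locality.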
Parallelized the 188.ammp SPEC CPU 2000 benchmark to obtain a speedup of 1.30 (advanced compiler optimizations).
Performed dependency analysis of the benchmark and hot-spot profiling of the code using Intel VTune. Applied final optimizations using GCC optimization levels and parallelized the benchmark using OpenMP constructs.
Presented Google File System (GFS) as part of a term presentation.
Provided detailed insights into the working of the Google File System (GFS), its various data structures, indexing schemes and the overall system architecture.
Implemented in C a parallel version of Bidirectional LU decomposition on an SMP architecture over GF(p), p prime.
Implemented Bidirectional LU decomposition and parallelized the code using OpenMP constructs. Used the GMP library to handle large numbers over GF(p).
@Bachelors:
“MobileAuth”
– Completed as a Bachelor's project under the guidance of Dr. S. Chattopadhyay and Mr. S. Sinha (Interra Systems, Kolkata). It provided a set of tools to enable GNU/Linux users to log in to their systems using mobile devices as the authenticating medium. It was developed using C++ (for the libraries) and J2ME (for mobile interfaces) on the Linux platform. The project was registered with www.sourceforge.net.