Home
I am currently pursuing an MS in Data Science at the University at Buffalo, NY, USA.
I pursued graduate studies in Computer Science and Engineering at Jadavpur University, Kolkata.
Previously, I received a B. Tech in Computer Science and Engineering from VNR, VJIT, Hyderabad.
My current research includes implementing software to detect and prevent malicious network activity in real time.
Current Research
YouTube Streaming Analytics using Kafka & PySpark
Created a 5-node Hadoop cluster to manage the distributed data
Set up a Kafka pub/sub model to stream YouTube video metadata from the Search API and Data API
Processed raw data from Kafka using Spark Structured Streaming and wrote the complex JSON structure in PySpark
Developed Python modules to consume the YouTube APIs and trigger the jobs automatically every 15 minutes.
Flattened the complex JSON data and inserted it into databases such as Cassandra and Hive to analyze trends
Developed a dashboard of trending YouTube videos from a month of data
Environment: PySpark, YouTube API, Apache Kafka, Apache Cassandra, Hadoop, Hive
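As a rough illustration of the flattening step above, here is a minimal sketch in plain Python rather than the PySpark used in the project. The helper name `flatten_record`, the underscore separator, and the sample record are assumptions made for illustration; they are not taken from the project code or from the actual YouTube API response.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining key names with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical sample, shaped loosely like a video metadata record.
sample = {
    "id": "abc123",
    "snippet": {"title": "Demo", "channelTitle": "Example"},
    "statistics": {"viewCount": "10"},
}
flat_sample = flatten_record(sample)
```

A flat record like `flat_sample` maps directly onto the columns of a Cassandra or Hive table, which is the reason for flattening before the insert.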
Unified Framework for Welfare Schemes for Rural India (App URL)
Designed and developed the database schema for welfare schemes in PostgreSQL.
Generated a random dataset of citizen details and normalized the tables.
Wrote complex SQL queries and functions in PostgreSQL.
Developed an interactive web app using EJS templating and Semantic UI
Mitigated challenges in database and UI design
Developed an API to create admin credentials.
Citizens are uniquely identified by a random alphanumeric ID
New citizens are created dynamically, and unique IDs are generated on the fly.
Developed an interactive, high-level dashboard using PostgreSQL functions
Created a trigger on DML activity (insert/update/delete)
It is currently deployed on fly.io.
The database is hosted on AWS RDS.
Environment: EJS, Express.js, Sequelize, PostgreSQL
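The random alphanumeric citizen IDs mentioned above could be generated along these lines. This is a minimal Python sketch, not the project's Express.js/PostgreSQL code: the function name `new_citizen_id`, the 10-character length, the uppercase-plus-digits alphabet, and the in-memory set used for the uniqueness check are all assumptions.

```python
import secrets
import string

# Assumed alphabet: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


def new_citizen_id(existing_ids: set, length: int = 10) -> str:
    """Return a random alphanumeric ID that is not already in `existing_ids`."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing_ids:
            # Record the ID so later calls cannot return it again.
            existing_ids.add(candidate)
            return candidate


issued = set()
ids = [new_citizen_id(issued) for _ in range(100)]
```

In a deployed system the uniqueness check would more likely rest on a UNIQUE constraint in the database, with a retry on conflict, rather than on an in-memory set.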
Predicting Mortality Rate based on Comprehensive Features of Intensive Care Unit Patients
Classified the mortality of patients during ICU stays.
The dataset was obtained from MIT after successfully completing the Collaborative Institutional Training Initiative (CITI) training and gaining access to MIMIC-IV, which is maintained by the PhysioNet research group at MIT.
Wrote a complex SQL query on Google BigQuery to obtain the aggregated features.
Developed a prediction algorithm using XGBoost and Random Forest models to predict ICU patients’ mortality rate.
Environment: Google BigQuery SQL, Python, Flask
Streaming Tweets and Sentiment Analysis using NLP
Extracted tweets from the Twitter stream using a Kafka producer
Preprocessed the tweets using a Kafka consumer
Created a streaming application that reads from a Kafka topic and writes to a Kafka topic.
Processed tweet data from the Kafka topic with PySpark Structured Streaming, then inserted it into Cassandra
Implemented sentiment analysis using the Valence Aware Dictionary and sEntiment Reasoner (VADER) to classify tweets as Positive, Negative, or Neutral
Implemented a dynamic dashboard to visualize tweet sentiment
Extended this system to multiple producers and consumers.
Environment: PySpark, Apache Kafka, Apache Cassandra, NLTK
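The Positive / Negative / Neutral labeling above can be sketched as a threshold on VADER's compound score. The ±0.05 cutoff is the commonly recommended convention for VADER, and it is an assumption that the project used the same value; the function name `label_from_compound` and the sample scores are also illustrative.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Hypothetical compound scores. With NLTK they would come from
# SentimentIntensityAnalyzer().polarity_scores(text)["compound"].
labels = [label_from_compound(score) for score in (0.62, -0.41, 0.0)]
```

Keeping the labeling as a small pure function means the same rule can be applied inside a streaming job and checked separately from the Kafka and Spark setup.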