August 2023 - May 2025
University of Illinois at Urbana-Champaign
Master of Science - Information Management (Data Science and Analytics)
GPA: 3.93 / 4.0
June 2018 - June 2022
Amrita University
Bachelor of Technology - Electronics and Communication Engineering
GPA: 3.35 / 4.0
May 2024 - Present
Data Engineering Intern
Engineered 10+ complex ETL pipelines using AWS Glue, S3, and RDS to integrate and conform data, standardizing tractor sales data across multiple regions in Europe.
Used PySpark to improve data processing efficiency by breaking down Alteryx workflows into multiple jobs, and implemented comprehensive unit tests to verify functionality and uphold data quality at every stage.
Improved infrastructure provisioning by adopting an IaC approach with CloudFormation and YAML, improving scalability and automation, reducing deployment time by 30%, and minimizing manual setup errors.
Developed and orchestrated workflows using Glue Workflow for monthly data processing and reporting, utilizing SNS for automated notifications and CloudWatch for monitoring, reducing manual intervention.
Conducted data discovery and created data lineage maps to facilitate workflow migration. Collaborated with stakeholders to document data flows and business requirements, ensuring a seamless transition through UAT.
Sept 2022 - July 2023
Data Engineer
Enhanced production data warehouse reliability by using Snowflake, dbt, Dagster, and Grafana to monitor and troubleshoot issues, overseeing 200+ metric dependencies and maintaining continuous data flow.
Improved performance by optimizing SQL queries and tuning indexes, resulting in a 20% reduction in query execution time. Reduced Tableau scorecard delay by 40% by collaborating with downstream and upstream teams.
Spearheaded a proof of concept (POC) for detecting fraudulent insurance claims using Machine Learning, showcasing innovation for future production integration. Demonstrated potential for improving data quality.
Mar 2022 - July 2022
Data Engineer Intern
Designed and implemented a scalable, fault-tolerant data pipeline for datasets with 4M+ records, using Lambda and Glue for ETL transformations, Redshift as the data warehouse, and S3 as the data source.
Executed complex data processing and analysis tasks in Databricks on datasets exceeding 100GB, leveraging distributed computing and Delta Lake optimizations to enhance performance and efficiency.
Jan 2022 - Feb 2022
Data Science and Analytics Intern
Developed a real-time yoga assistant using Azure, TensorFlow, and MediaPipe to accurately detect and classify 30+ yoga poses, integrating a voice assistant feature for real-time guidance and pose correction.
Captured user progress through 40+ metrics and managed user data in PostgreSQL. Used Tableau to create a progress tracker and presented the POC at the monthly incubator meeting.
Sept 2020 - Jan 2021
Data Science Intern
Spearheaded the life cycle of three categories of a warehouse project by scraping 10K+ images per category using Selenium, annotating images using labelImg, and training various CNN models to achieve the best results.
Ensembled and deployed the best-performing models to production, enhancing the operational efficiency of the warehouse.
May 2022 - June 2022
Web Intern
Developed high-quality websites for small businesses and IT firms, meeting or exceeding client requirements and consistently delivering on deadline.
Maintained proactive client engagement throughout the project lifecycle, ensuring client satisfaction and successful project delivery within specified timelines.
Development of a Decentralized Web App (Dapp) for covid-19 using AI, IEEE Xplore
Insights from Microscopic Images