KARTIK THAKUR
SENIOR DATA ENGINEER AT LENDINGCLUB BANK
I am a passionate Data Engineer contributing to teams to deliver results based on data-driven insights rather than assumptions.
Experienced in collaborating with cross-functional teams to gather requirements and deliver innovative data solutions.
Results-oriented with 7+ years of experience in designing, implementing, and optimizing robust data pipelines.
Skilled in using a variety of programming languages, database technologies, and cloud platforms to address complex data challenges.
SKILLS
ACADEMIC BACKGROUND
Relevant Coursework
Machine Learning
Artificial Intelligence
Data Mining
Software Engineering
Database Management
Advanced Programming Concepts
Algorithms and Data Structures
Theory of Computation
GPU Programming
Relevant Coursework
Data Structures
Operating Systems
Computer Organization
Programming using C
Computer Networks
Career Overview
SENIOR DATA ENGINEER, REMOTE USA
DEC 2021 - PRESENT
- Designing and building data pipelines that integrate directly with LendingClub’s cross-functional teams, opening the door to new products and features.
- Successfully migrated on-premises data infrastructure to the cloud, resulting in ~40% cost savings and improved scalability.
- Spearheaded the development of a scalable data pipeline handling 3 terabytes of data daily, reducing processing time by ~20%.
- Identified and implemented internal process improvements (automating manual processes, optimizing data delivery, reducing cloud cost, and redesigning infrastructure for greater scalability) that reduced costs by ~30%.
- Mentored offshore resources, fostering a culture of knowledge sharing and continuous improvement.
DATA PLATFORM ENGINEER, REMOTE USA
JUNE 2020 - DEC 2021
- Collaborated with senior engineers in translating raw, technical data into actionable insights that influence major objectives.
- Assisted in engineering end-to-end data pipelines using Spark and Airflow, improving data processing efficiency by 40%.
- Supported a data lake on the Hadoop ecosystem for data analytics and for capturing business insights from historical data.
- Exposed and delivered aggregated/customer-centric data through reports, visualization products, and RESTful APIs.
- Collaborated with the analytics team to design and implement data warehouses, enabling faster and more accurate reporting.
DATA ML ENGINEER, DARPA-JPL, Albany, New York
DEC 2018 - MARCH 2020
- Spearheaded building features and machine learning classifiers from data to predict users' writing, interaction, and composition habits by applying data analysis, social engineering, and Natural Language Processing techniques.
- Developed and implemented data cleaning processes, resulting in a 10% improvement in data accuracy and completeness.
- Coordinated with research teams at multiple locations to understand research requirements and to design and architect reliable, real-time distributed data pipelines, improving performance metrics by ~10-12%.
- Improved predictive modeling by ~20% compared to traditional figures by analyzing real-time email responses through REST APIs.
- Achieved ~15% better returns versus historic performance by building and deploying Docker containers and Kubernetes clusters to break up monolithic apps into microservices, improving developer workflow, increasing scalability, and optimizing speed.
DATA ENGINEER, ACCENTURE/MICROSOFT, India
JUNE 2015 - JUNE 2018
- Worked with Microsoft (client) in enterprise commerce on large, critical web-based and database applications.
- Managed scalable, cloud-hosted solutions to store data in support of our underlying systems and client needs using the AWS stack (EC2, S3, Athena, Postgres).
- Provided bug resolution for high-priority data infrastructure issues and reduced errors by up to ~95%.
- Automated regression and end-to-end test cases for data pipelines and data migration with Selenium and Coded UI, eliminating 500 hours/year of manual work.
Projects
GDELT Data Analysis: Data Pipelines, Big Data, EC2, S3, Amazon EMR, Spark
Designed an end-to-end streaming architecture to extract, transform, and load 2.15 TB of the GDELT dataset using Big Data tools, and created a time series analysis to predict growth rate, protests, peace index, etc. Translated raw, technical data into actionable insights by performing data analysis. Visualized the data on an AWS EC2 instance using Matplotlib and Seaborn.
Obstacle Detection for Self-Driven AI Cars (Q-Learning, Python, ANN, PyTorch)
Created an autonomous Artificial Intelligence agent using reinforcement learning that detects both static and dynamic obstacles while driving and keeps improving based on the rewards and penalties from the environment.
Food Image Recognition (Computer Vision, CNN, TensorFlow, Python, Food101 dataset)
Collaborated with a developer to create a food image recognition system that classifies 101 categories of food and predicts the nutritional value of the classified food image using a Convolutional Neural Network and TensorFlow.
Email Structural Analysis (Textual Analysis, RNN, LSTM, Python, Quagga dataset)
Built an RNN model with word embedding features to detect email zones such as greetings, body, and signature. Performance was further improved by adding a set of pre-defined rules that consider the stylometric elements of words.
Twitter Sentiment Analysis (PoS Tagging, Bag of Words, Twitter textual data, SVM)
Collaborated with a classmate to perform textual analysis on Twitter data and experimented with several NLP features, such as part-of-speech tagging and bag of words, to create a sentiment analysis component using Support Vector Machines.
Categorical Spam Detection (TF-IDF, TensorFlow, Python, NLTK, Neural Networks)
Applied text mining and NLP techniques to build TF-IDF features from the APWG and Yelp datasets to classify text into categories such as Spam, Phishing, Malware, Propaganda, and Reconnaissance.
Live Scores Update Application (Sports Feed API, Java, JavaScript, Facebook API, Bootstrap)
Implemented an application that uses the Sports Feed API to fetch live basketball updates, the daily schedule, the daily scorecard, team rankings, wins, and losses. Hosted the application on AWS EC2 and designed a test harness to evaluate availability.
Online Events/Movies Booking Application (Java, Spring Boot, Thymeleaf, CSS, HTML)
Developed a web application using a 3-tier distributed architecture with user and admin modules, listings and bookings for events and concerts, payment and billing modules, views of past and current bookings, ratings, and reviews.
Certifications
Microsoft Certified Professional (Microsoft Stack Expert): by Microsoft
Python, Java Programming and Problem Solving: by HackerRank
Tableau Data Scientists, Analyst, Author and Consumer: by Tableau
Developing ASP.NET MVC Web Applications, Programming in C#: by Microsoft
Dashboards, Tracking code, Campaign Tracking for Google Analytics: by Google
Advanced SQL, Docker and Data Visualization for Data Scientists: by LinkedIn CPE
COMMUNITY INVOLVEMENT
Volunteer
Nov 2013 - Nov 2017
Volunteered to help children gain basic knowledge of elementary subjects like Mathematics and Science.
Volunteer
Jan 2014 - Jan 2018
Occasionally organized fun activities like painting and drawing to help children develop their strengths and passions.