Kate Njoki Mbugua
Nairobi, Kenya
Email: knjoki.mbugua@gmail.com
Programming Languages
Python (Scripting), SQL (Querying)
Data Engineering
Docker, dbt, Airflow, Terraform
Building and deploying ML dashboards
Looker, Dash & Streamlit
Data Science & Machine Learning
Supervised and Unsupervised ML models, Time Series Analysis, A/B testing
Math & Stats
Probability, Statistics, Linear Algebra
Cloud Platforms
Google Cloud Platform, AWS
About me
I am a Data Engineer with expertise in automating data querying from diverse sources, developing scalable ETL data pipelines, creating data apps and visualisation interfaces, and building ML models. I am committed to automating workflows and turning complex data into insightful, interactive tools for my teams.
Having worked in diverse settings, from a biotechnology company evaluating the performance of multiple nitrogen treatments to a logistics technology startup managing inventory, I bring adaptability and an analytical mindset.
I have led an end-to-end analytics engineering project that built a crop production data pipeline, covering infrastructure provisioning, containerisation, orchestration, transformation, and visualisation.
Education
- Strathmore University (2017 - 2021)
Bachelor of Business Science (Actuarial Science)
(Second Class Honours - Upper Division)
Training & Certification
AWS Certified Cloud Practitioner May 2023
Data Structures & Algorithms (Udemy) March 2022
Machine Learning & Data Engineering specialization (10Academy) October 2021
Machine Learning course offered by Stanford University (Coursera) June 2021
Work Experience
Contributed to migrating a dockerised pipeline to AWS EC2 instances for cloud-based processing, integrating AWS SNS for email notifications and CloudWatch Logs for debugging.
Built Python APIs and visualisation packages that helped the data science team automate querying data from the company's data warehouse, AWS S3 buckets, and other publicly available datasets.
Built data apps and visualisation interfaces with Python Dash to automate the generation of insightful, interactive visuals for the team.
Evaluated the spectral response of nitrogen treatments versus the control at 6 growth stages across the growing season, and determined whether the difference between the two groups was statistically significant.
Used linear mixed-effects models to evaluate how effectively the Canopy Cover and Canopy Height metrics measure nitrogen in plants.
Worked with the operations, sales, and marketing teams to keep detailed records of inventory changes and updated Sendy Supply's inventory accordingly.
Managed in-app inventory, published reports of store counts and ensured product information remained accurate.
Used Tableau to create dynamic visualisations analysing product volume, identifying high- and low-earning products, detecting loss leaders, and uncovering purchasing trends such as product associations.
Volunteer Experience
Global Shaper, Global Shapers Nairobi Hub (March 2023 - Present)
I work with fellow hub members to plan and execute events such as the Upscale Artists Seminar and Meet the Leader sessions, and I support the coordination of community projects.
Ahmadiyya Basic School, Cape Coast, Ghana: Tutor (February - March 2019)
Volunteered for 6 weeks, teaching mathematics and basic computer skills to students aged 11 to 14.
AIESEC in Strathmore University (March 2018 - October 2019)
Incoming Global Volunteer Opportunity Manager for the SHOFCO-Women project
Projects
Developed a scalable data pipeline that extracts, tweaks, loads, and transforms Kenya's 2017 and 2018 county-level crop production data from the Kilimo Data Portal, using Terraform, Docker, Airflow, Google Cloud Platform (GCP), BigQuery, and dbt.
Used Looker Studio to visualise the data.
Predicted time series sales data using Random Forest and Decision Tree regression models. The Random Forest model outperformed the Decision Tree model, achieving an R-squared of 0.90 and an RMSE of ~912.
Check out the deployed dashboard here, which lets users explore features, run the models, and view feature importance for each model.
Used A/B testing to determine whether two ads run by an advertising company, a dummy ad and a Smart Ad, produced a significant difference in brand awareness lift. First performed classical p-value-based A/B testing, then applied machine learning algorithms (Logistic Regression, Decision Tree Classifier, and XGBoost Classifier).
Check out the deployed dashboard here!
Worked in a team of 9 to transcribe Swahili speech to text, using a CNN + bidirectional RNN architecture to model the audio files.
Created a Python class that loads files from different folders, cleans the transcriptions, and stores them in a single DataFrame for easy manipulation. Trained the model on an AWS instance.
Used time series models such as ARIMA and SARIMA to forecast cabbage and kale prices in Nairobi, Kenya. Checked the data for stationarity and seasonality and modelled it accordingly.