Kate Njoki Mbugua
Nairobi, Kenya
Email: knjoki.mbugua@gmail.com
Programming Languages
Python (Scripting), SQL (Querying)
Data Engineering
dbt, Rivery, Terraform, Docker, Airflow
Building and deploying dashboards
Looker, Metabase, Dash & Streamlit
Data Science & Machine Learning
Supervised and Unsupervised ML models, Time Series Analysis, A/B testing
Math & Stats
Probability, Statistics, Linear Algebra
Cloud Platforms
Snowflake, Google Cloud, AWS
About me
With over 4 years of experience, I have built expertise in developing scalable ETL data pipelines, managing infrastructure, and creating data apps and visualisation interfaces. I am committed to automating workflows and simplifying complex data so that teams can explore it interactively and draw insights from it.
Having worked in diverse settings, from a biotechnology company evaluating the performance of multiple nitrogen treatments to a logistics technology startup managing inventory, I bring adaptability and an analytical mindset.
I have spearheaded end-to-end analytics engineering projects, the most recent being a crop production data pipeline.
Work Experience
Built automated ETL pipelines integrating Shopify and Snowflake using Rivery, improving data availability for supply chain analytics.
Integrated Zoho Inventory data with Snowflake via REST API in Rivery, enabling real-time inventory tracking.
Refactored Terraform codebase by modularising databases, roles and managed accounts, enhancing scalability.
Created external Snowflake shares and managed accounts in Terraform to securely provide data to third parties.
Developed dbt models to extract, clean and transform malaria data from unstructured export records, ensuring data reliability.
Managed Snowflake role-based access control to enhance security and enforce the principle of least privilege.
Designed and developed Metabase and Looker dashboards for internal teams and external stakeholders, enhancing data-driven decision-making and reporting.
Contributed to the migration of a dockerised pipeline to AWS EC2 instances for cloud-based processing, integrating AWS SNS for email notifications and CloudWatch Logs for debugging purposes.
Built Python APIs and visualisation packages that let the data science team automate querying data from the company’s data warehouse, its AWS S3 bucket and other publicly available datasets.
Built data apps and visualisation interfaces using Python Dash to automate the process of generating insightful and interactive visuals for the team.
Evaluated the spectral response of nitrogen treatments versus the control at 6 growth stages across the growing season, and determined whether there was a statistically significant difference between the two groups.
Evaluated how effective Canopy Cover and Canopy Height metrics are at measuring nitrogen in plants using linear mixed effect models.
Worked with the operations, sales and marketing teams to keep detailed records of inventory changes and reflect them in Sendy Supply’s inventory.
Managed in-app inventory, published reports of store counts and ensured product information remained accurate.
Used Tableau to create dynamic visualisations for analysing product volume, identifying high- and low-earning products, detecting loss leaders and uncovering purchasing trends, including product associations.
Education
Training & Certification
AWS Certified Cloud Practitioner May 2023
Data Structures & Algorithms (Udemy) March 2022
Machine Learning & Data Engineering specialization (10Academy) October 2021
Machine Learning course offered by Stanford University (Coursera) June 2021
Volunteer Experience
Global Shaper, Global Shapers Nairobi Hub (March 2023 - Present)
I volunteer with fellow hub members to plan and execute events like the Upscale Artists Seminar and Meet the Leader sessions, and to support the coordination of community projects.
Ahmadiyya Basic School | Cape Coast, Ghana: Tutor (Feb - March 2019)
Volunteered for 6 weeks, teaching mathematics and basic computer skills to students aged 11 to 14.
AIESEC in Strathmore University (March 2018 - October 2019)
Incoming Global Volunteer Opportunity Manager for the SHOFCO-Women project
Projects
Developed a scalable data pipeline that extracts, lightly pre-processes, loads and transforms Kenya's 2017 and 2018 crop production data by county from the Kilimo Data Portal, using Terraform, Docker, Airflow, Google Cloud Platform (GCP), BigQuery and dbt.
Used Looker Studio to visualise the data.
Predicted time series sales data using Random Forest Regression and Decision Tree Regression models. The Random Forest model outperformed the Decision Tree model, with an R-squared value of 0.90 and an RMSE of ~912.
Check out the deployed dashboard here, which lets a user explore features, run the models and view feature importance for each model.
Used A/B testing to check whether two ads run by an advertising company, a dummy ad and a Smart Ad, resulted in a significant lift in brand awareness. I first performed classical A/B p-value testing, then applied machine learning algorithms, i.e. Logistic Regression, Decision Tree Classifier and XGBoost Classifier.
Check out the deployed dashboard here!
Worked in a group of 9 on this project. We used CNN + Bidirectional RNN architectures to model the audio files and transcribe Swahili speech to text.
Created a Python class that loads the files from different folders, cleans the transcriptions and stores them in one DataFrame for easy manipulation. Trained the model on an AWS instance.
Used time series models such as ARIMA and SARIMA to predict the prices of cabbages and kale in Nairobi, Kenya. Checked the data for stationarity and seasonality and modelled it accordingly.