Kate Njoki Mbugua

Nairobi, Kenya

Programming Languages

Python (Scripting), SQL (Querying)

Data Engineering

Docker, dbt, Airflow, Terraform

Building and deploying ML dashboards

Looker, Dash & Streamlit

Data Science & Machine Learning

Supervised and Unsupervised ML models, Time Series Analysis, A/B testing

Math & Stats

Probability, statistics, linear algebra

Cloud Platforms

Google Cloud Platform, AWS

About me

Education

Bachelor of Business Science (Actuarial Science)

(Second Class Honours - Upper Division)

Training & Certification

Work Experience

Volunteer Experience

Volunteer with fellow hub members to plan and execute events such as the Upscale Artists Seminar and Meet the Leader sessions, and to support the coordination of community projects.

Volunteered for 6 weeks, teaching mathematics and basic computer skills to 11- to 14-year-old students.

Incoming Global Volunteer Opportunity Manager for the SHOFCO-Women project 

Projects

Developed a scalable data pipeline that extracts, loads and transforms Kenya's 2017 and 2018 crop production data by county from the Kilimo Data Portal, using Terraform, Docker, Airflow, Google Cloud Platform (GCP), BigQuery and dbt.

Used Looker Studio to visualise the data.
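For illustration, a minimal sketch of how such a pipeline could be orchestrated in Airflow, assuming a hypothetical crop_production_elt DAG that loads a CSV extract from Cloud Storage into BigQuery and then runs the dbt models; the bucket, dataset, path and project names are placeholders, not the project's actual identifiers.

```python
# Hypothetical Airflow DAG sketch: load crop-production CSVs into BigQuery, then run dbt.
# Bucket, dataset and dbt project paths are placeholders, not the real project's names.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="crop_production_elt",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@monthly",
    catchup=False,
) as dag:
    # Load the raw extract from a GCS bucket into a BigQuery staging table.
    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_crop_data",
        bucket="crop-data-bucket",                      # placeholder bucket
        source_objects=["raw/crop_production_*.csv"],   # placeholder path
        destination_project_dataset_table="analytics.raw_crop_production",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
    )

    # Transform the staged data with dbt models inside BigQuery.
    run_dbt = BashOperator(
        task_id="run_dbt_models",
        bash_command="cd /opt/dbt/crop_project && dbt run",
    )

    load_raw >> run_dbt
```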

Forecast time-series sales data using Random Forest Regression and Decision Tree Regression models. The Random Forest model outperformed the Decision Tree model with an R-squared value of 0.90 and an RMSE of ~912.

Check out the deployed dashboard here, which lets a user explore the features, run the models and view feature importance for each model.
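A minimal sketch of the model comparison, assuming a scikit-learn workflow; the file name, target column and hyperparameters are placeholders rather than the project's exact setup.

```python
# Hypothetical sketch: compare Decision Tree and Random Forest regressors on sales data.
# File name, column names and the split are placeholders, not the project's exact setup.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

df = pd.read_csv("sales.csv")     # placeholder file
X = df.drop(columns=["Sales"])
y = df["Sales"]

# shuffle=False keeps the time order, so the test set is the most recent period.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

for name, model in [
    ("Decision Tree", DecisionTreeRegressor(random_state=42)),
    ("Random Forest", RandomForestRegressor(n_estimators=200, random_state=42)),
]:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    print(f"{name}: R^2 = {r2_score(y_test, preds):.2f}, RMSE = {rmse:.0f}")
```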

Used A/B testing to check whether two ads run by an advertising company, a dummy ad and a Smart Ad, produced a significant lift in brand awareness. First performed classical A/B p-value testing, then applied machine learning algorithms: Logistic Regression, Decision Tree Classifier and XGBoost Classifier.

Check out the deployed dashboard here!
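A minimal sketch of the classical A/B step, using a two-proportion z-test from statsmodels; the success and exposure counts are illustrative placeholders, not the campaign's actual figures.

```python
# Hypothetical sketch of the classical A/B step: two-proportion z-test on brand awareness.
# Counts are illustrative placeholders, not the campaign's actual figures.
from statsmodels.stats.proportion import proportions_ztest

# Users who reported brand awareness ("yes" responses) per ad variant.
successes = [310, 420]       # [dummy ad, Smart Ad] -- placeholder counts
observations = [4000, 4000]  # users exposed to each variant -- placeholder counts

z_stat, p_value = proportions_ztest(count=successes, nobs=observations)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

# Reject the null of equal awareness rates at the 5% level if p < 0.05.
if p_value < 0.05:
    print("Awareness rates differ significantly between the two ads.")
else:
    print("No statistically significant difference detected.")
```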

Worked in a group of 9 on this project. We used a CNN + bidirectional RNN architecture to model the audio files and transcribe Swahili speech to text.

Created a Python class that loads the files from different folders, cleans the transcriptions and stores them in one DataFrame for easy manipulation. Trained the model on an AWS instance.
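A minimal sketch of what such a loader class could look like, assuming one plain-text transcription per audio file; the folder layout and cleaning rules are assumptions, not the project's exact code.

```python
# Hypothetical loader class for the Swahili speech corpus: gathers transcription files
# from several folders, cleans them, and collects them in a single DataFrame.
import re
from pathlib import Path

import pandas as pd


class TranscriptionLoader:
    def __init__(self, root_dirs):
        self.root_dirs = [Path(d) for d in root_dirs]

    @staticmethod
    def _clean(text: str) -> str:
        # Lowercase and strip punctuation/extra whitespace from a transcription line.
        text = text.lower()
        text = re.sub(r"[^a-z'\s]", " ", text)
        return re.sub(r"\s+", " ", text).strip()

    def load(self) -> pd.DataFrame:
        records = []
        for root in self.root_dirs:
            for path in root.glob("*.txt"):  # one transcription per .txt file (assumed)
                raw = path.read_text(encoding="utf-8")
                records.append({"audio_id": path.stem, "transcription": self._clean(raw)})
        return pd.DataFrame(records)


# Usage: df = TranscriptionLoader(["data/train", "data/test"]).load()
```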

Used time series models such as ARIMA and SARIMA to predict the prices of cabbages and kale in Nairobi, Kenya. Checked for stationarity and seasonality in the data and modelled it accordingly for accurate results.
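A minimal sketch of the stationarity check and SARIMA fit with statsmodels; the file name, price column and (p, d, q)(P, D, Q, s) orders are placeholders, not the project's chosen values.

```python
# Hypothetical sketch: ADF stationarity check, then a seasonal ARIMA fit and forecast.
# File, column and model orders are placeholders, not the project's chosen values.
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.statespace.sarimax import SARIMAX

prices = pd.read_csv("cabbage_prices.csv", index_col="date", parse_dates=True)["price"]

# Augmented Dickey-Fuller test: p < 0.05 suggests the series is stationary.
adf_stat, p_value, *_ = adfuller(prices.dropna())
print(f"ADF statistic = {adf_stat:.2f}, p = {p_value:.4f}")

# Seasonal ARIMA with a yearly cycle on monthly data (s=12) as an illustrative order.
model = SARIMAX(prices, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)
forecast = result.forecast(steps=12)  # forecast the next 12 months of prices
print(forecast)
```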