Kate Njoki Mbugua

Nairobi, Kenya

Programming Languages

Python (Scripting), SQL (Querying)

Data Engineering

Docker, dbt, Airflow, Terraform

Building and deploying ML dashboards

Looker, Dash & Streamlit

Data Science & Machine Learning

Supervised and Unsupervised ML models, Time Series Analysis, A/B testing

Math & Stats

Probability, statistics, linear algebra

Cloud Platforms

Google Cloud Platform, AWS

About me

Education

Bachelor of Business Science (Actuarial Science)

(Second Class Honours - Upper Division)

Training & Certification

Work Experience

Volunteer Experience

Volunteer with fellow hub members to plan and execute events such as the Upscale Artists Seminar and Meet the Leader sessions, and to support the coordination of community projects.

Volunteered for 6 weeks, teaching mathematics and basic computer skills to 11- to 14-year-old students.

Incoming Global Volunteer Opportunity Manager for the SHOFCO-Women project 

Projects

Developed a scalable data pipeline that extracts, loads and transforms Kenya's 2017 and 2018 crop production data by county from the Kilimo Data Portal, using Terraform, Docker, Airflow, Google Cloud Platform (GCP), BigQuery and dbt.

Used Looker Studio to visualise the data.
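For illustration, a minimal sketch of how such a pipeline could be orchestrated in Airflow, assuming a hypothetical crop_production_elt DAG that loads a CSV extract from Cloud Storage into BigQuery and then runs the dbt models; the bucket, dataset, path and project names are placeholders, not the project's actual identifiers.

```python
# Hypothetical Airflow DAG sketch: load crop-production CSVs into BigQuery, then run dbt.
# Bucket, dataset and dbt project paths are placeholders, not the real project's names.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="crop_production_elt",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@monthly",
    catchup=False,
) as dag:
    # Load the raw extract from a GCS bucket into a BigQuery staging table.
    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_crop_data",
        bucket="crop-data-bucket",                      # placeholder bucket
        source_objects=["raw/crop_production_*.csv"],   # placeholder path
        destination_project_dataset_table="analytics.raw_crop_production",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
    )

    # Transform the staged data with dbt models inside BigQuery.
    run_dbt = BashOperator(
        task_id="run_dbt_models",
        bash_command="cd /opt/dbt/crop_project && dbt run",
    )

    load_raw >> run_dbt
```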

Forecast time-series sales data using Random Forest Regression and Decision Tree Regression models. The Random Forest model outperformed the Decision Tree model with an R-squared value of 0.90 and an RMSE of ~912.

Check out the deployed dashboard here, which lets a user explore the features, run the models and view feature importance for each model.
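A minimal sketch of the model comparison, assuming a scikit-learn workflow; the file name, target column and hyperparameters are placeholders rather than the project's exact setup.

```python
# Hypothetical sketch: compare Decision Tree and Random Forest regressors on sales data.
# File name, column names and the split are placeholders, not the project's exact setup.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

df = pd.read_csv("sales.csv")     # placeholder file
X = df.drop(columns=["Sales"])
y = df["Sales"]

# shuffle=False keeps the time order, so the test set is the most recent period.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

for name, model in [
    ("Decision Tree", DecisionTreeRegressor(random_state=42)),
    ("Random Forest", RandomForestRegressor(n_estimators=200, random_state=42)),
]:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    print(f"{name}: R^2 = {r2_score(y_test, preds):.2f}, RMSE = {rmse:.0f}")
```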

Used A/B testing to check whether two ads run by an advertising company, a dummy ad and a Smart Ad, produced a significant lift in brand awareness. First performed classical A/B p-value testing, then applied machine learning algorithms: Logistic Regression, Decision Tree Classifier and XGBoost Classifier.

Check out the deployed dashboard here!
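A minimal sketch of the classical A/B step, using a two-proportion z-test from statsmodels; the success and exposure counts are illustrative placeholders, not the campaign's actual figures.

```python
# Hypothetical sketch of the classical A/B step: two-proportion z-test on brand awareness.
# Counts are illustrative placeholders, not the campaign's actual figures.
from statsmodels.stats.proportion import proportions_ztest

# Users who reported brand awareness ("yes" responses) per ad variant.
successes = [310, 420]       # [dummy ad, Smart Ad] -- placeholder counts
observations = [4000, 4000]  # users exposed to each variant -- placeholder counts

z_stat, p_value = proportions_ztest(count=successes, nobs=observations)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

# Reject the null of equal awareness rates at the 5% level if p < 0.05.
if p_value < 0.05:
    print("Awareness rates differ significantly between the two ads.")
else:
    print("No statistically significant difference detected.")
```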

Worked in a group of 9 on this project. We used a CNN + bidirectional RNN architecture to model the audio files and transcribe Swahili speech to text.

Created a Python class that loads the files from different folders, cleans the transcriptions and stores them in one DataFrame for easy manipulation. Trained the model on an AWS instance.
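A minimal sketch of what such a loader class could look like, assuming one plain-text transcription per audio file; the folder layout and cleaning rules are assumptions, not the project's exact code.

```python
# Hypothetical loader class for the Swahili speech corpus: gathers transcription files
# from several folders, cleans them, and collects them in a single DataFrame.
import re
from pathlib import Path

import pandas as pd


class TranscriptionLoader:
    def __init__(self, root_dirs):
        self.root_dirs = [Path(d) for d in root_dirs]

    @staticmethod
    def _clean(text: str) -> str:
        # Lowercase and strip punctuation/extra whitespace from a transcription line.
        text = text.lower()
        text = re.sub(r"[^a-z'\s]", " ", text)
        return re.sub(r"\s+", " ", text).strip()

    def load(self) -> pd.DataFrame:
        records = []
        for root in self.root_dirs:
            for path in root.glob("*.txt"):  # one transcription per .txt file (assumed)
                raw = path.read_text(encoding="utf-8")
                records.append({"audio_id": path.stem, "transcription": self._clean(raw)})
        return pd.DataFrame(records)


# Usage: df = TranscriptionLoader(["data/train", "data/test"]).load()
```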

Used time series models such as ARIMA and SARIMA to predict the prices of cabbages and kale in Nairobi, Kenya. Checked for stationarity and seasonality in the data and modelled it accordingly for accurate results.
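A minimal sketch of the stationarity check and SARIMA fit with statsmodels; the file name, price column and (p, d, q)(P, D, Q, s) orders are placeholders, not the project's chosen values.

```python
# Hypothetical sketch: ADF stationarity check, then a seasonal ARIMA fit and forecast.
# File, column and model orders are placeholders, not the project's chosen values.
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.statespace.sarimax import SARIMAX

prices = pd.read_csv("cabbage_prices.csv", index_col="date", parse_dates=True)["price"]

# Augmented Dickey-Fuller test: p < 0.05 suggests the series is stationary.
adf_stat, p_value, *_ = adfuller(prices.dropna())
print(f"ADF statistic = {adf_stat:.2f}, p = {p_value:.4f}")

# Seasonal ARIMA with a yearly cycle on monthly data (s=12) as an illustrative order.
model = SARIMAX(prices, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)
forecast = result.forecast(steps=12)  # forecast the next 12 months of prices
print(forecast)
```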