DATA ENGINEER
DAISY OKACHA
Daisy Khaabi Okacha
Nairobi, Kenya
“Big data is at the foundation of all of the megatrends that are happening today, from social to mobile to the cloud to gaming.” — Chris Lynch
TOOLS & TECHNOLOGIES
Programming Skills
Python, SQL, C++
HTML5, CSS
JavaScript (learning)
Math & Statistics
Linear Algebra
Statistics and Probability
Operating Systems
Windows
Unix/Linux
Technologies
Docker
Heroku
AWS
Software Engineering
TensorFlow
GitHub
Git
FastAPI
Python Flask
CI/CD
GitHub Actions
DVC
MLflow
CML
ETL Tools
Kafka
Spark
Airflow
dbt
Web & Visualization Tools
Streamlit
Power BI
Tableau
Redash
ABOUT ME
Daisy is a Junior Data Engineer with a Computer Science background. She has knowledge of object-oriented programming (OOP), SQL, Python, and relational and non-relational databases. She has experience in ETL, data visualization, preprocessing, data and feature engineering, and developing end-to-end data pipelines. She has worked with both structured and unstructured data.
When Daisy is not working, she enjoys plants and nature, travelling and reading books.
EDUCATION
10 Academy (Data, ML & Web 3 Engineering Training)
May 2022 - October 2022
Worked on 12 real-world projects, 4 of which were Data Engineering projects.
I used tools such as Kafka, Superset, Redash, and Airflow to build scalable data pipelines.
BSc in Computer Science
September 2017 - September 2021
Relevant courses taken:
Machine Learning
Software Engineering
Knowledge Discovery and Data Mining
Linear Algebra
Probability and Statistics
Website Development and Applications
WORK EXPERIENCE
EQUITY BANK August 2021 - October 2021
Maintained and tested predictive models using machine learning algorithms, Python, SQL, Power BI, and Oracle.
Implemented a capstone project using the k-means clustering algorithm to segment the bank's customer base.
SAVANNAH ANALYTICS (Kenyan eHealth startup) June 2021 - August 2021
Fetched and cleaned the relevant data from the BigQuery data warehouse.
Redesigned the call center dashboard in Tableau to reflect the relevant call center KPIs.
ACTIVITIES AND TRAININGS
Data Engineering track on DataCamp September 2022 - Present
KAMILIMU Nairobi, Kenya
A mentorship program for computer science students Jan 2020 - Jun 2021
I was equipped with public speaking, professional communication, financial literacy, and ICT skills in data science and research.
I received one-on-one mentorship from Jacklyne Betty, a professional in data science, machine learning, and software engineering.
I implemented and presented an individual user research project on the Competency-Based Curriculum and a group project on the shortage of student housing in Kenya.
I participated in scholarship, mock job, and innovation competitions.
Microsoft Learn Student Ambassadors September 2019 - September 2021
Ran a community of over 200 student members and hosted 26 in-person events, inviting speakers to teach students cloud computing concepts with Azure to complement classroom knowledge.
As a result, some of the students were able to deploy and run their school and personal projects in the cloud.
I continue to support tech communities by organizing events such as the IndabaX hackathon, volunteering, and mentoring.
PROJECTS
I used the Extract, Load, Transform (ELT) approach with dbt to set up a PostgreSQL data warehouse. Airflow orchestrated the pipelines and Redash visualized the data.
Later, I migrated the data warehouse to MySQL and used Superset for visualization.
I worked with pNEUMA data. The pNEUMA project captured all traffic during the morning rush hour in the business district of Athens over 5 days.
Collected data from an API provided by USGS_3DEP (United States Geological Survey 3D Elevation Program).
I created a Python module that domain experts and data scientists can use to fetch, visualise, and transform publicly available satellite and LIDAR data from the API.
Output can include a graphical display of the returned elevation files as either a 3D rendered plot or a heatmap.
Analyzed the effect of new store openings and promotions on consumer behavior.
Predicted daily sales in various stores up to 6 weeks ahead of time using a scikit-learn pipeline.
Built a Streamlit dashboard to communicate results to employees.
Combined machine learning with causal inference using the CausalNex library.
I worked with the Wisconsin breast cancer data to infer the key factors that may influence breast cancer.