Ethani Caphace


Junior Machine Learning Engineer

Bachelor of Eng. in Electronics and Telecommunication

Dar Es Salaam Institute of Technology (2016-2020)

Email: ethancaphace@gmail.com

Dodoma, Tanzania


Python, C/C++, & Go

Object-Oriented and Functional Programming

MySQL, SQLite, & PostgreSQL

SQL programming

TensorFlow, PyTorch, Scikit-learn, Kafka, Spark, DBT, and Airflow

Machine Learning Libraries, Tools, and Algorithms


Mathematics & Statistics

Probability, Algebra, Calculus, and Statistics


Django, Flask, Superset, and Streamlit Frameworks

Building ML dashboards, APIs, and web applications

Telecommunications

Networking, RF Transmission, Telephone systems, and Mobile Communication

About me

  • An ambitious, team-oriented Junior Machine Learning Engineer, eager to learn and to apply Machine Learning knowledge and skills to everyday problems, with hands-on practice in the software development life cycle.

  • Deep knowledge in applying Machine Learning algorithms and libraries. Worked on ML and Data Engineering projects focused on Natural Language Processing, Data Mining (topic modeling and sentiment analysis), A/B hypothesis testing, sales prediction, developing Python packages, and building an end-to-end data collection pipeline using Kafka, Spark, Airflow, and the Flask web framework. Practiced in the software development life cycle.


Education


  • 10 Academy (July 2021-present)

Data and Machine Learning Engineering

- Intensive hands-on training and experience in solving real-world, industrial problems using Data Engineering and ML Engineering approaches, which involve:

i. Setting up the project's codebase and version control (Git, DVC, and MLflow),

ii. Performing Exploratory Data Analysis, Feature Extraction, Pre-processing, etc.,

iii. Building data pipelines for ETL using Kafka and Spark, and scheduling tasks using Airflow,

iv. Developing, testing, and maintaining ML models using different algorithms,

v. Performing CI/CD using Travis CI, and comparing models using MLflow,

vi. Dockerization, dashboard presentation and visualization (Streamlit, Redash), and deployment on different platforms, including Heroku, Streamlit, AWS, etc.


  • Dar Es Salaam Institute of Technology (2016-2020)

Bachelor of Engineering in Electronics and Telecommunication

- Graduated with a 4.2 GPA out of 5.

- Embedded and IoT Systems, Networking, RF Broadcasting and Transmission, Telephone Systems, and Cellular and Mobile Communications.

Work Experience

(October 2020 to date)

  • Developing full-stack web applications using the Django framework.

  • Designed and developed an IoT platform using the open-source DeviceHive.

  • Researching and developing ML solutions for Interactive Voice Recognition and Optical Character Recognition (OCR) projects using deep neural networks.

  • Designing and implementing Embedded and IoT systems.

Some of My Projects

Pharmaceutical Sales prediction across multiple stores. Github Link

A regression problem whose main aim is an end-to-end product that delivers sales predictions across multiple stores of a pharmaceutical company. The performance of three regression models is explored: Linear Regression, XGBoost, and Random Forest. The Random Forest regressor emerges as the best performer, with a Mean Squared Error of 0.056. Streamlit is used for model deployment and visualization.

Africa Covid-19 Twitter Data Analysis - Topic Modelling & Sentiment Analysis Github Link, Link2

This project aimed to discover what topics are discussed on Twitter concerning the Covid-19 pandemic in Africa, and to figure out how people feel toward these topics or the Covid-19 pandemic in general.

Telecom-User-Data-Analysis. Github Link

Uses telecommunication data to perform exploratory data analysis, together with customer overview, user engagement, experience, and satisfaction analyses.


Python_USGS_LIDAR Github Link

A Python module that interfaces with USGS 3DEP, intended for data scientists to fetch, visualize, and transform publicly available satellite and LiDAR data.
