Desmond Onam
Lead Data Scientist | Expert Machine Learning Engineer and Data Engineer | GenAI | Passionate Mentor
CX Engineer - ML Data Engineering
Email: desmondonam@gmail.com
About me.
I am Desmond Onam, a passionate Generative AI Engineer and Machine Learning Data Expert, dedicated to transforming complex data challenges into actionable insights and innovative solutions. With a Bachelor’s degree in Mathematics and Computer Science from Jomo Kenyatta University of Agriculture and Technology and a diverse portfolio of certifications and projects, I excel in predictive modelling, data mining, and machine learning algorithm development.
My expertise includes handling structured, semi-structured, and unstructured data using Python, R, Spark, and SQL. As an award-winning tutor in Machine Learning, Data Engineering and Web3, I bring a unique combination of technical acumen and mentorship skills to every project.
Whether developing robust data pipelines, designing cutting-edge AI solutions, or delivering impactful training programs, I am driven by the power of data to solve real-world problems and enable business success. Let's create, innovate, and elevate together.
Work Experience
Led the architecture of AI-driven disaster management systems, implemented scalable data pipelines using cloud-native technologies and serverless computing platforms, and optimized data models for real-time analytics. Designed data governance frameworks and integrated edge computing solutions with centralized data platforms.
Key Contributions:
▪ Developed a machine learning-based disaster management system that aided in real-time analysis and visualization of data.
▪ Architected and deployed a scalable synthetic-data pipeline retrieving data from Sentinel-2 and other APIs, used to build the AI disaster management system and to improve the existing model (Leona) for specialized AI agents.
▪ Built an insight-driven dashboard for visualization with Plotly and optimized data models for real-time analytics and predictive insights on supply chain data.
▪ Collaborated in a team of 8 via GitHub to create a fully functional system, deployed in production, that helps detect disasters such as floods, fires, and earthquakes in real time through data fusion, reinforcement learning, and multimodal LLMs.
Tools Used: Python | AWS | MySQL | PostgreSQL | LLMs | Apache Spark | Airflow | MLflow | OpenAI GPT | Dialogflow | GitHub | Jira | Slack
Business Requirements Translation: Collaborated with stakeholders to translate high-level business objectives into actionable data science problems, aligning analytics solutions with customer experience goals.
Data Preparation and Transformation: Extracted, cleaned, and transformed data to address integrated customer experience challenges, optimizing datasets for analysis and reporting.
Data Architecture Development: Partnered with engineering teams to build, test, and maintain scalable data architectures, streamlining data extraction, transformation, and loading (ETL) processes.
Pipeline Optimization: Implemented strategies to improve the reliability, efficiency, and quality of data pipelines, ensuring data consistency across organizational systems.
Machine Learning Model Deployment: Deployed advanced machine learning models across the Ajua Product Stack, delivering actionable insights that enhanced decision-making for clients in diverse sectors.
Generative AI Chatbot Development: Designed and deployed a customer self-service chatbot, leveraging generative AI to enhance interactivity and engagement with the platform, streamlining customer support operations.
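The pipeline-reliability work described above typically rests on data-quality gates that validate each batch before it is loaded downstream. A minimal sketch, with illustrative field names rather than the actual Ajua schema:

```python
# Illustrative data-quality gate: split a batch into valid rows and
# per-row error reports, so bad records are flagged rather than
# silently propagated to downstream systems.
def validate_batch(rows, required_fields=("order_id", "amount")):
    """Return (valid_rows, errors) for a batch of dict records."""
    valid, errors = [], []
    for i, row in enumerate(rows):
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            errors.append((i, missing))
        else:
            valid.append(row)
    return valid, errors

batch = [{"order_id": 1, "amount": 9.5},
         {"order_id": 2, "amount": None},   # flagged: missing amount
         {"order_id": 3, "amount": 4.0}]
valid, errors = validate_batch(batch)
print(len(valid), errors)  # 2 [(1, ['amount'])]
```

In a production pipeline the same check would run as a task (e.g. in Prefect or Airflow) and fail the run, rather than just reporting, when the error list is non-empty.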
Tools Used: Python | DBT | MySQL | PostgreSQL | AWS | Apache Spark | Prefect | MLflow | OpenAI GPT | Dialogflow | GitHub | Jira | Slack
Delivered industry-relevant training to 300+ students, focusing on data science and machine learning concepts, with a strong emphasis on employability in leading organizations.
Mentored students on 30+ real-world data science projects, offering hands-on guidance to tackle complex business problems using practical solutions.
Designed and implemented a Masters in Data Science curriculum, integrating emerging technologies and best practices, adopted internationally by College De Paris.
Conducted interactive training sessions covering courses such as:
Complete Data Science with Python & R
Machine Learning and Deep Learning with Python
Human Resource Analytics with Python & R
Marketing Analytics with R
Data Visualization with Tableau, PowerBI, Looker Studio
Boosted student engagement by 35% through innovative teaching methodologies, fostering a collaborative and immersive learning environment, and achieved the highest retention rate in the data science department.
Provided tailored feedback to students, achieving a 92% course completion rate and a 90% job placement success rate within six months post-graduation.
Partnered with industry leaders to align course materials with current trends, ensuring students gained in-demand skills and practical knowledge.
Organized industry-recognized certification programs with prominent tech companies, enhancing students' professional credibility and career opportunities.
Developed an interactive online learning platform featuring:
Video lectures
Hands-on coding exercises
Engaging interactive modules
The platform supports remote learning and provides seamless access to course content.
Directed a cross-functional team of 100+ data scientists and interns, deploying advanced deep learning models using technologies such as TensorFlow, Python, Pandas, SQL, Docker, MLFlow, AWS, pytest, and dbt, resulting in a 22% improvement in model accuracy.
Led end-to-end delivery of data science projects, from scoping and requirement analysis to design, execution, and deployment, achieving a $250,000 annual cost reduction through optimized resource utilization.
Designed and implemented advanced machine learning algorithms to streamline workflows and enhance task automation, boosting team efficiency by 30%.
Conducted hands-on training for interns on deep learning and machine learning techniques, enabling early anomaly detection during project development and improving team productivity by 35%.
Delivered actionable insights through comprehensive data analysis, including EDA and statistical modelling, driving strategic business decisions and operational improvements.
Enhanced stakeholder confidence by presenting detailed reports on project outcomes to senior management, fostering a culture of trust and collaboration.
Optimized data pipelines, reducing lead times by 40% and expediting AI solution deployment.
Applied cutting-edge machine learning techniques to achieve an 18% increase in predictive model accuracy, contributing to business success.
Mentored and upskilled 10+ junior data scientists, advancing their professional growth and strengthening team expertise.
Skills: ETL Tools · Teamwork · SQL · Git · Apache Spark · MySQL · DBT · Docker · Extract, Transform, Load (ETL) · Apache Kafka · Apache Airflow · Pipelines · Data Engineering · Python (Programming Language) · Data Analysis · ClickUp
● Equipped students with solid data ethics knowledge through comprehensive training programs on ethical issues surrounding machine learning and data science.
● Improved the performance of automated data-processing systems by analyzing, testing, and debugging code.
● Received positive accolades from the community for developing open-source data science projects that solved real-world problems.
● Delivered a data visualization course that helped students develop skills in creating compelling and informative visualizations to communicate data insights effectively.
● Trained 37 junior data engineers in building the datasets that underpin machine learning models and in designing a real-time data pipeline that accelerated semi-structured data processing.
● Instituted a project-based learning approach for machine learning and data engineering trainees, enabling them to ingest data from multiple third-party APIs on real-world projects. Supervised and graded students' projects and provided detailed individual feedback.
● Developed top-tier, tech-savvy data engineers through comprehensive, data-driven technical training that equipped them with machine learning, data engineering, and Web3 skills for job readiness.
● Liaised with data engineers to expand and optimize data and pipeline architecture; taught students how to make well-reasoned business decisions fueled by new data.
● Solved complex real-world problems by applying machine learning libraries and algorithms, using statistical modelling techniques, and writing well-structured code.
● Designed machine learning systems and self-running artificial intelligence solutions, delivering fully working, tested code and models.
● Improved data quality and insight reports using data tagger and data wrangler; developed machine learning pipelines and trained models with end-to-end Bayesian segmentation.
● Contributed to solving community challenges by developing, simulating, testing, and improving various machine-learning algorithms.
· Engineered a data pipeline that ingested data from multiple sources using Google Analytics APIs across billions of rows of data.
· Automated ETL processes, making data wrangling easier and cutting upload time and manual workload in half.
· Designed, developed and maintained scalable, insightful data tables, which acted as the primary input for analysis models, reports, and dashboards.
· Delivered superior functionality across data systems by automating pipeline tests and scheduling tasks to validate the organization's assumptions about data, writing logic to prevent issues from propagating downstream.
This is accomplished by:
Creating ETL pipelines with Python and associated technologies.
Using APIs to create complex aggregation pipelines to get data.
Curating, normalizing, and extracting value from large amounts of data.
Creating automated analysis for the data and visualizing the data.
Using the data to predict outcomes of different topics in the community.
Reporting on the data.
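The steps above follow the classic extract-transform-load pattern. A minimal end-to-end sketch, where the record fields and per-user aggregation are illustrative stand-ins for the actual Google Analytics schema and warehouse load:

```python
import json

def extract(raw_records):
    """Extract: in production this would page through an API; here it
    accepts pre-fetched JSON records (illustrative)."""
    return [json.loads(r) if isinstance(r, str) else r for r in raw_records]

def transform(records):
    """Transform: normalize field types and drop incomplete rows."""
    cleaned = []
    for rec in records:
        if rec.get("user_id") is None or rec.get("pageviews") is None:
            continue
        cleaned.append({"user_id": str(rec["user_id"]),
                        "pageviews": int(rec["pageviews"])})
    return cleaned

def load(records, table):
    """Load: aggregate pageviews per user into an in-memory 'table'
    standing in for a warehouse upsert."""
    for rec in records:
        table[rec["user_id"]] = table.get(rec["user_id"], 0) + rec["pageviews"]
    return table

table = {}
raw = [{"user_id": 1, "pageviews": 5},
       {"user_id": 2, "pageviews": None},   # dropped by transform
       {"user_id": 1, "pageviews": 3}]
load(transform(extract(raw)), table)
print(table)  # {'1': 8}
```

At scale, each stage would run as its own scheduled task (e.g. in Airflow), so failures can be retried per stage instead of re-running the whole pipeline.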
Data Science & Machine Learning:
Languages & Frameworks: Python, R, TensorFlow, PyTorch, Scikit-Learn, NLTK, spaCy
Techniques: Machine Learning, Deep Learning (CNNs, RNNs), Natural Language Processing, Computer Vision, Statistical Modeling, A/B Testing, Bayesian Analysis
Tools & Libraries: Pandas, NumPy, Seaborn, SciPy, matplotlib, Jupyter
Data Engineering:
Big Data Tools: Apache Spark, Hadoop, Kafka, Airflow
Databases: PostgreSQL, MySQL, MongoDB, BigQuery, Redshift
Pipeline & Workflow: Apache Airflow, DBT, ETL Processes, Data Collection, Web Scraping
Software Development:
Programming: JavaScript, Bash, PowerShell
Frameworks & Platforms: Django, Node.js, React Native, Flask
Development Practices: MLOps, CI/CD, Docker, Kubernetes, GitHub, Jenkins
Web Development:
Technologies: HTML, CSS, JavaScript
Frameworks: Django, React Native
Tools: Node.js
Visualization & Reporting:
Tools: Tableau, PowerBI, Streamlit, R Shiny, Apache Superset
Cloud & Operations:
Platforms: AWS, Google Cloud Platform, Azure
Operations: MLflow, HyperOpt, Travis CI
Project Management: Agile methodologies, team leadership, project scoping
Communication: Excellent teaching ability, client presentations, detailed technical documentation
Education
10 Academy (July 2021 - October 2021)
- Intensive hands-on training and experience in solving real-world, industrial problems using Data Engineering and ML Engineering approaches, which involved:
i. Setting up project codebases and version control (Git, DVC, and MLflow).
ii. Performing exploratory data analysis, feature extraction, and pre-processing.
iii. Building ETL data pipelines with Kafka and Spark, and scheduling tasks with Airflow.
iv. Developing, testing, and maintaining ML models using different algorithms.
v. Performing CI/CD with Travis CI, and comparing models using MLflow and DagsHub.
vi. Dockerization, dashboard presentation, visualization, and deployment on platforms including Heroku, Streamlit, and AWS.
Bachelor of Science in Mathematics and Computer Science | Jomo Kenyatta University of Agriculture and Technology
- Studied mathematical concepts in calculus and applied mathematics, statistics and its applications in data science, as well as software development, web development, artificial intelligence, and data analysis, including:
i. Application of mathematical concepts in programming.
ii. Software and networking concepts spanning both mathematics and computer science.
iii. Web development and design with Internet application programming.
iv. Databases and their applications.
v. Data structures and algorithms.
Data Science Micro degree | Udemy | 2022
Post Graduate Program in Machine Learning Engineering/ Data Engineering | 10 Academy | 2021
Statistics Fundamental and Its Application | Udemy | 2021
Complete Data Science Course | 365 Team | 2020
Highlights
Developed a hypothesis-testing algorithm to assess the effectiveness of SmartAd's Brand Impact Optimiser (BIO) service, quantifying the impact of ad campaigns on brand awareness. The analysis revealed a significant lift in brand engagement and memorability, demonstrating the success of SmartAd's creative advertising approach and providing measurable value to clients.
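A standard way to quantify ad-driven brand lift of this kind is a two-proportion z-test comparing awareness rates in the exposed and control groups. A minimal sketch with made-up counts (the actual SmartAd data and test design are not reproduced here):

```python
from math import sqrt, erf

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test: is the exposed group's
    awareness rate significantly different from the control's?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# exposed: 308/1000 brand-aware; control: 252/1000 (illustrative counts)
z, p = two_proportion_ztest(308, 1000, 252, 1000)
print(p < 0.05)  # significant lift at the 5% level
```

The same structure extends to sequential or Bayesian variants when the campaign data arrives incrementally.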
Designed and implemented a causal inference framework using Judea Pearl's methodologies to extract actionable insights from observational data. Successfully inferred and validated causal graphs, merging machine learning with causal inference principles to address complex business questions, enhancing decision-making capabilities.
Applied time-series analysis to forecast a company's sales two weeks ahead, based on its historical performance data. Building the forecasting model, and understanding how time-series analysis works, were central to the project.
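The usual starting point for a forecast like this is a seasonal-naive baseline, which any fitted model must beat. A sketch with illustrative daily sales figures:

```python
# Seasonal-naive baseline: forecast each day of the next two weeks
# using the value from the same weekday in the last observed week.
def seasonal_naive_forecast(history, season=7, horizon=14):
    """history: daily sales, oldest first; repeats the final season."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]

sales = [120, 135, 128, 150, 170, 210, 190,   # week 1 (illustrative)
         118, 140, 132, 155, 175, 215, 195]   # week 2
forecast = seasonal_naive_forecast(sales)
print(forecast[:7])  # repeats the most recent week's pattern
```

A proper model (ARIMA, Prophet, or similar) is then judged by how much it reduces error relative to this baseline on a held-out window.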
Built a word cloud from Twitter data to surface the most-discussed terms, focusing on Covid-19 as an emerging issue. The approach identifies the words used most frequently on Twitter within a given area.
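Under the hood, a word cloud is rendered from a term-frequency table. A minimal sketch of that counting step, with made-up tweets and a toy stopword list:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "to", "of", "and", "in", "is", "on", "for"}

def top_terms(tweets, n=3):
    """Tokenize tweets, drop stopwords, and count term frequencies --
    the table a word-cloud library would render by font size."""
    counts = Counter()
    for tweet in tweets:
        for tok in re.findall(r"[a-z']+", tweet.lower()):
            if tok not in STOPWORDS:
                counts[tok] += 1
    return counts.most_common(n)

tweets = ["Lockdown extended in the city",
          "Vaccine rollout starts: lockdown easing soon",
          "New lockdown rules and vaccine updates"]
print(top_terms(tweets))  # 'lockdown' and 'vaccine' dominate
```

A library such as `wordcloud` can then consume these frequencies directly to draw the final image.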
Created a package for data scientists to analyze soil data for maize production. The package processes satellite data gathered by Lidar, analyzing landscape water distribution and its effect on maize production in a given area.
Analyzed a telecommunication company's performance based on its most-used applications and its prospects for future survival, including recommendations on which sectors of the company should be improved and given greater attention.