Daisy Cherono
Nairobi, Kenya
Jomo Kenyatta University of Agriculture and Technology (2016-2022)
Email: jepchumbadaisy96@gmail.com
Apache Airflow
SQL
PostgreSQL
ETL/ELT
Python
React
HTML5
JavaScript
Streamlit
Unit Testing
Scikit-Learn
PyTorch
MLflow
Fine-tuning
Embedding
Tokenization
RAG Pipelines
LLMs
About me
Junior Generative AI Engineer with a passion for developing RAG systems and fine-tuning LLMs to enhance data-driven decision-making. Strong background in software engineering with skills in Python, JavaScript, SQL, React, vector databases, and ML frameworks such as Scikit-Learn and PyTorch.
Education
Software Engineering
Curriculum: Low-level programming in C, Higher-level programming in Python, Fundamentals of system design, SQL, Bash Scripting, REST APIs, React, Redux, Redux Toolkit
Relevant Curriculum: Computer Programming in C++, GIS Programming in Python, Database Systems, JavaScript
Work Experience
Built a precision RAG pipeline that automated prompt generation, test dataset creation, and ranking services. Created a vector store using Pinecone for efficient data retrieval and used OpenAI GPT-3.5 Turbo for querying. Deployed the application using Flask for seamless integration and service.
Built a Contract Q&A RAG system using LangChain and OpenAI to improve the precision of answers from querying contracts using LLMs. Applied advanced retrieval methods, enhancing the system's performance.
Fine-tuned a Llama2-7B model on Swahili news data to classify Swahili text into 5 categories, contributing to better language processing for underrepresented languages.
Collected and processed data from Swahili news websites through web scraping and from Hugging Face repositories, resulting in a high-quality dataset of 54k rows.
Streamlined data analysis and visualization by setting up Redash using Docker and adding a chatbot to translate natural language queries into SQL commands, facilitating better data-driven decision-making.
Improved data quality and analysis reliability by conducting EDA, data cleaning, feature engineering, and scaling.
Identified causes of unfulfilled orders by performing causal inference on parcel delivery data, providing insights for logistics optimization.
Enhanced user engagement by developing a user-friendly and responsive landing page for an events booking platform web application using TypeScript, Next.js 13, and Tailwind CSS, ensuring seamless functionality across devices.
Achieved seamless integration of designs into frontend development by collaborating closely with UX/UI designers.
Improved infrastructure mapping accuracy by performing on-site data acquisition using handheld GPS, generating precise geographic information for the company's pipeline and sewer line network.
Updated the company's water meter database with new spatial data from new installations, contributing to better operational oversight.
Projects
Increased the precision and quality of answers by building a Q&A RAG with LangChain, OpenAI and Chroma DB that users can use to retrieve information from contract documents.
Collaborated in a team of 5 to fine-tune a Llama2-7B model on Swahili news data to classify Swahili text into 5 categories, contributing to better language processing for underrepresented languages.
Streamlined data analysis and visualization in a team of 2 by setting up Redash using Docker and adding a chatbot to translate natural language queries into SQL commands, facilitating better data-driven decision-making.
Improved the efficiency of a RAG pipeline by automating prompt generation, test dataset creation, and ranking services. Created a vector store using Pinecone for efficient data retrieval and used OpenAI GPT-3.5 Turbo for querying. Deployed the application using Flask for seamless integration and service.