Daisy Cherono
A contract Q&A RAG system built with LangChain, OpenAI, and Chroma DB that lets users ask questions about contract documents.
Increased the precision and quality of contract query answers by building a Contract Q&A RAG system using LangChain and OpenAI.
Measured the RAG’s performance using metrics such as context precision, context recall, answer relevancy, and faithfulness from RAGAS.
Improved the RAG's performance metrics by implementing a data chunking strategy with RecursiveCharacterTextSplitter; found that a chunk size of 500 produced better results than sizes of 1000 and 128.
Tested LangChain's advanced retrieval methods, including the MultiQuery, Ensemble, and ParentDocument retrievers, and built the final RAG with the MultiQuery Retriever, which performed best.
Developed a user-friendly interface with Streamlit, allowing users to efficiently interact with the RAG system.
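The chunking strategy behind the Contract Q&A project can be illustrated with a small sketch. This is not the project's actual code: it is a minimal pure-Python approximation of what LangChain's RecursiveCharacterTextSplitter does (try coarse separators first, fall back to finer ones), with an illustrative contract-style sentence as sample input.

```python
def recursive_split(text, chunk_size=500, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters, preferring
    the coarsest separator that yields pieces under the limit.
    A simplified sketch of LangChain's RecursiveCharacterTextSplitter idea."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    if len(part) > chunk_size:
                        # A single piece is still too large: recurse with
                        # the finer separators.
                        chunks.extend(recursive_split(part, chunk_size, separators))
                        current = ""
                    else:
                        current = part
            if current:
                chunks.append(current)
            return chunks
    # No separator matched: hard-split by character count.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Illustrative input only -- not a real contract from the project.
doc = "Clause 1. The supplier shall deliver goods within 30 days. " * 20
chunks = recursive_split(doc, chunk_size=500)
print(len(chunks), max(len(c) for c in chunks))
```

Smaller chunks (e.g. 128) fragment clauses and hurt context recall, while very large ones (e.g. 1000) dilute the retrieved context, which is consistent with 500 landing in the sweet spot reported above.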
In a team of 5, fine-tuned BERT and the Llama 2 7B model on Swahili news data for Swahili news text classification.
Fine-tuned a quantized Llama 2 13B model for Swahili text classification across categories including sports, entertainment, business, local news, international news, and health. The project contributes to better language processing for Swahili, an underrepresented language.
Collected and combined 54k rows of Swahili news data through web scraping and from Hugging Face and African NLP datasets.
Evaluated the base model's performance on text generation, classification, summarization, question answering, and translation before fine-tuning.
Preprocessed the data and utilized SFTTrainer for supervised fine-tuning, pushing the final model to Hugging Face.
Leveraged individual strengths within the 5-person team; personally handled data collection, preprocessing, and inference on the quantized Llama 2 13B model.
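The preprocessing step for supervised fine-tuning can be sketched as follows. This is a hypothetical illustration, not the project's exact pipeline: the prompt template, field names, and sample rows are assumptions, showing how labeled Swahili news rows might be formatted into the instruction-style training strings that a trainer such as trl's SFTTrainer consumes.

```python
# Illustrative category list taken from the project description above.
CATEGORIES = ["sports", "entertainment", "business", "local news",
              "international news", "health"]

# Hypothetical instruction-style template; the real project may differ.
PROMPT_TEMPLATE = (
    "### Instruction:\nClassify the following Swahili news article into one "
    "of these categories: {labels}.\n\n"
    "### Article:\n{text}\n\n"
    "### Category:\n{label}"
)

def format_example(row):
    """Turn one labeled row into a single training string for SFT."""
    if row["label"] not in CATEGORIES:
        raise ValueError(f"unknown label: {row['label']}")
    return PROMPT_TEMPLATE.format(
        labels=", ".join(CATEGORIES),
        text=row["text"].strip(),
        label=row["label"],
    )

# Made-up sample rows standing in for the scraped / Hugging Face data.
rows = [
    {"text": "Simba walishinda mechi ya jana kwa mabao mawili.", "label": "sports"},
    {"text": "Bei ya mafuta imepanda tena mwezi huu.", "label": "business"},
]
train_texts = [format_example(r) for r in rows]
print(train_texts[0])
```

Formatting each row into a single prompt-plus-answer string is what lets a supervised fine-tuning trainer treat classification as next-token prediction, which is how decoder-only models like Llama 2 are adapted to the task.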