Dr. Ghassemi develops tools and systems that combine human and machine intelligence (A.I.) to solve problems that neither humans nor machines can solve as effectively alone. His research interests include machine learning, data mining, natural language understanding, recommendation systems, decision support, crowd-sourcing, game theory, and human-in-the-loop methodologies. He is especially interested in problem domains where human behavior and judgment must be modeled and accounted for.
Dr. Ghassemi has authored papers in several highly respected scientific venues, including Nature Scientific Data, Science Translational Medicine, Proceedings of the IEEE, and the Proceedings of the Association for the Advancement of Artificial Intelligence. His work has been featured by media outlets including the BBC, NPR, The Wall Street Journal, and Newsweek. He holds multiple U.S. patents and has over 10 years of experience in technical and strategic consulting with several of the world's largest companies, including Allstate, Estée Lauder, Thomson Reuters, S&P, and Samsung. He currently holds affiliations with Standard & Poor's Financial Services, the Institute for Medical Engineering and Science (IMES) at the Massachusetts Institute of Technology (MIT), and the Ghamut Corporation. He also serves on the advisory boards of several startup ventures.
Title: Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments
Abstract: Large Language Models (LLMs) have shown remarkable performance across various tasks, yet significant disparities remain for non-English languages, especially native African languages. This paper addresses these disparities by creating approximately 1 million human-translated words of new benchmark data in 8 low-resource African languages, covering a population of over 160 million speakers: Amharic, Bambara, Igbo, Sepedi (Northern Sotho), Shona, Sesotho (Southern Sotho), Setswana, and Tsonga. Our benchmarks are translations of WinoGrande and three sections of MMLU: college medicine, clinical knowledge, and virology. Using the translated benchmarks, we report previously unknown performance gaps between state-of-the-art (SOTA) LLMs in English and African languages. Finally, using results from over 400 fine-tuned models, we explore several methods to reduce the LLM performance gap, including high-quality dataset fine-tuning (using an LLM-as-an-Annotator), cross-lingual transfer, and cultural appropriateness adjustments. Key findings include average monolingual improvements of 5.6% with fine-tuning (with 5.4% average monolingual improvement when using high-quality rather than low-quality data), 2.9% average gains from cross-lingual transfer, and a 3.0% out-of-the-box performance boost on culturally appropriate questions. The publicly available benchmarks, translations, and code from this study support further research and development aimed at creating more inclusive and effective language technologies.
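To make the monolingual fine-tuning setup concrete, below is a minimal sketch of supervised fine-tuning a causal LLM on translated training text, assuming Hugging Face Transformers and Datasets. The model name, data file, "text" field, and hyperparameters are illustrative placeholders, not the paper's actual configuration.

```python
# Minimal sketch: monolingual fine-tuning of a causal LM on translated data.
# Assumes Hugging Face Transformers/Datasets; names below are hypothetical.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bigscience/bloom-560m"  # placeholder multilingual base model
DATA_FILE = "amharic_train.jsonl"     # hypothetical file of translated examples

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Each JSONL record is assumed to carry a "text" field holding one translated
# training example rendered as a single string.
dataset = load_dataset("json", data_files=DATA_FILE, split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ft-amharic",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-5,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    # mlm=False yields standard next-token (causal) language-modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Cross-lingual transfer, in this framing, amounts to repeating the same loop with training data from a related language (or mixing languages in DATA_FILE) before evaluating on the target language.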