Chatty AI
Introduction
Historically—especially over the last few hundred years, but stretching back in very narrow groups to antiquity—cultures have developed learning and assessment processes involving the writing and reading of linear texts, in which the learner has little control over the design of the interaction. To be sure, the reader can consult a table of contents or an index, make selections, and exercise some control over the sequence of study and exploration. But for all human beings, for as long as there have been cognitively modern human beings, there is a much more basic, species-wide process: interactive conversation. Two or more people interact, and each has a role in managing the discourse. They can chat. They can ask questions, get a response, and, as a consequence of that response, generate another question. This is how all children begin and how all adults proceed, although some of them augment conversation with the reading and writing of linear texts. Some teachers and students have become interested in the extent to which foundation models, such as Large Language Models, can serve as a platform for making AI that supports this interactive process of learning.
Red Hen Lab (http://redhenlab.org) is running a project in Google Summer of Code 2024 to build a Chatty AI to answer questions about conceptual frames and construction grammar. These are areas of research in cognitive science and cognitive linguistics. For details on the Red Hen Lab Chatty AI 2024 project, see
https://www.redhenlab.org/summer-of-code/red-hen-lab-gsoc-2024-ideas
https://www.redhenlab.org/summer-of-code/red-hen-lab-gsoc-2024-projects
Creating a dedicated chatbot based on a Large Language Model might involve one or more of the following approaches:
Training a model from scratch. This requires serious compute and takes a vast amount of data.
Fine-tuning an existing model. This adjusts the LLM's internal parameters to perform better on a specific task. Imagine training a customer service chatbot on a dataset of past interactions. Fine-tuning strengthens the LLM's ability to understand and respond to customer queries within that domain.
Adding a RAG module. RAG = Retrieval-Augmented Generation. This approach adds a layer where the AI retrieves information from external sources (like your PDFs) in real time. During a conversation, RAG searches the PDFs for relevant info and feeds it to the LLM, which then incorporates it into the response.
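To make the retrieval step concrete, here is a minimal sketch in plain Python. The toy "PDF" texts, the function names, and the word-overlap scoring are all illustrative assumptions, not the API of any particular RAG library; a production system would use embedding-based search over real PDF text. The flow, however, is the same: score chunks of the documents against the question, then prepend the best-matching chunks to the prompt sent to the LLM.

```python
# Minimal retrieval-augmented prompt construction, using word overlap
# as a stand-in for real embedding similarity.

def score(question: str, chunk: str) -> int:
    """Count how many question words appear in the chunk."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words)

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that best match the question."""
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved passages with the question for the LLM."""
    context = "\n".join(retrieve(question, chunks))
    return f"Use the following passages to answer.\n{context}\n\nQuestion: {question}"

# Toy "knowledge base" standing in for text extracted from PDFs.
chunks = [
    "A construction is a pairing of form and meaning.",
    "Frames are structured background knowledge evoked by words.",
    "The weather in Cleveland is variable.",
]

prompt = build_prompt("What is a construction in construction grammar?", chunks)
```

The resulting prompt, containing both the retrieved passages and the question, is what gets sent to the LLM; the model never needs to have been trained on the PDFs themselves.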
Here's an example of a chatbot trained from scratch to answer questions about a Yale professor's research: https://news.yale.edu/2024/04/16/student-developed-ai-chatbot-opens-yale-philosophers-works-all
If you want to develop a chatbot that can chat about specific works, you can use an existing model, and fine-tune it or add RAG or both. Consider the advantages of a hybrid approach: Fine-tune the LLM for a base understanding of your domain (e.g., construction grammar) and use RAG for retrieving specific details from the PDFs. This combines the strengths of both techniques.
RAG is generally stronger than fine-tuning on accuracy. Fine-tuning can be accurate, but requires rigorous training to avoid hallucinations. RAG is generally simpler to implement. You mainly configure retrieval and instruct the model. Fine-tuning involves retraining the LLM on your PDFs, which requires more technical expertise. RAG is also more flexible. You can easily add new PDFs to the knowledge base without retraining. Fine-tuning requires retraining whenever the PDFs change significantly.
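The flexibility point can be illustrated with a short sketch: a RAG knowledge base is just an index that can be extended at query time, whereas fine-tuning bakes the documents into the model's weights. The class and method names below are made up for illustration, and simple keyword overlap again stands in for embedding search.

```python
class KnowledgeBase:
    """A toy RAG index: newly added documents are searchable
    immediately, with no retraining step."""

    def __init__(self):
        self.docs: list[str] = []

    def add(self, text: str) -> None:
        # Adding a new PDF's text is just an append; contrast with
        # fine-tuning, which would require another training run.
        self.docs.append(text)

    def search(self, query: str) -> str:
        # Return the document sharing the most words with the query.
        q = set(query.lower().split())
        return max(self.docs, key=lambda d: len(q & set(d.lower().split())))

kb = KnowledgeBase()
kb.add("Fillmore introduced frame semantics in the 1970s.")
kb.add("Goldberg's work popularized construction grammar.")
# A newly uploaded PDF is available on the very next query:
kb.add("Red Hen Lab studies multimodal communication.")
answer_source = kb.search("Who studies multimodal communication?")
```

With fine-tuning, the equivalent of `kb.add` would be assembling a new training set and rerunning the training job whenever the PDFs change significantly.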
But the beginner who wants to chat with a group of PDFs might not want to begin with these approaches. There are existing services that make it possible to chat with PDFs.
ChatGPT4o allows the user to use a GPT such as AI PDF to upload PDFs and then use ChatGPT4o to "chat" with them.
Microsoft Copilot has a PDF upload feature. https://copilot.microsoft.com
Google Gemini Enterprise allows the user to upload PDFs to Google Cloud and set the permissions such that Google Gemini Enterprise can chat with them.
Copilot for Microsoft 365 also supports such uploads.
Faculty, staff, and students at Case Western Reserve University have, provisionally, access to the CWRU Azure AI Portal. https://aka.ms/cwru-portal. "This beta portal is hosted by the university. Data entered into the portal are not shared outside the university. Currently, this portal does not meet regulatory requirements such as HIPAA, so data with those requirements should not be entered into the service. UTech plans to release a production version of this service which may meet some compliance requirements. Check back here for updates. Currently, there is no charge for this service. A chargeback model may apply when it is offered as an official campus service" (https://case.edu/utech/AI).
Lamini Memory Tuning: https://www.lamini.ai/blog/lamini-memory-tuning & https://youtu.be/Bs36gxpKcqk?si=cTCqKU0YHDm39PID
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models. https://arxiv.org/abs/2401.01313. This paper summarizes 32 techniques to mitigate hallucination in LLMs and introduces a taxonomy categorizing methods such as RAG, Knowledge Retrieval, and CoVe. It also provides tips on how to apply these methods and highlights the challenges and limitations inherent in them.
MTEB English Leaderboard. https://huggingface.co/spaces/mteb/leaderboard. Good to know for any RAG experiments: a huge compilation of models used to generate embeddings, with their scores on different tasks across different datasets.
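For readers new to embeddings: a retrieval model maps each text to a vector, and relevance is then typically measured with cosine similarity between vectors. The three-dimensional vectors below are invented toy values (real models on the MTEB leaderboard produce vectors with hundreds of dimensions); only the similarity computation itself is the standard one.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (invented numbers, not from a real model).
query_vec = [0.9, 0.1, 0.0]
doc_about_frames = [0.8, 0.2, 0.1]   # topically close to the query
doc_about_weather = [0.0, 0.1, 0.9]  # topically distant

sim_frames = cosine_similarity(query_vec, doc_about_frames)
sim_weather = cosine_similarity(query_vec, doc_about_weather)
```

A RAG system ranks candidate chunks by this score and passes the top-ranked ones to the LLM; the leaderboard compares how well different models' embeddings support exactly this kind of ranking.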
Examples from Professor Vipin Chaudhury
Here is a chat for code generation, etc.
Link to Chat: https://chat.openai.com/share/d3cbdf3b-c223-4b8d-b904-053e8f5ba1cf
Link to Google Colab: https://colab.research.google.com/drive/160LSzH5zYJJNcpUQrSqYD6w-jffYhsvi?usp=sharing
A Google Doc with some more data - creating synthetic medical images, etc.:
https://drive.google.com/drive/u/0/folders/1pKRMF1BbofeNvfp0eo3sPoeh27GoHTFF