Aron Sinkie Gebrie
In today’s rapidly evolving technological landscape, demand is growing for advanced AI systems capable of handling complex legal tasks. Lizzy AI, an early-stage Israeli startup, is at the forefront of this movement: its goal is to build the next generation of contract AI, culminating in a fully autonomous artificial contract lawyer, and to empower legal professionals and individuals with cutting-edge AI solutions. As part of this effort, I took on the task of building, evaluating, and improving a Retrieval-Augmented Generation (RAG) system for Contract Q&A, a critical component on the path to that goal. This project focuses on creating a next-generation legal contract assistant powered by RAG technology.
APPROACH
The key steps of the project include:
Document Loading and Preprocessing:
The system utilized a custom document loader to handle various file formats, including .txt, .pdf, and .docx.
The extracted text was then split into smaller chunks using recursive chunking with overlap to maintain context across segments; this step is crucial for efficient LLM processing.
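The overlap idea can be sketched in plain Python. This is a simplified stand-in for recursive chunking (the real pipeline would use a splitter such as LangChain's, with separator-aware recursion); the function name and parameters are illustrative.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks so context is preserved
    across segment boundaries. Simplified sketch: a real recursive
    splitter also prefers to break on paragraph/sentence boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars shared
    return chunks
```

Because each chunk repeats the tail of the previous one, a clause that straddles a boundary still appears intact in at least one chunk.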
Embedding and Storage:
Each text chunk was converted into a vector representation using OpenAI's embedding model.
These embeddings were stored in a vector database (Qdrant) for efficient retrieval during the question-answering process.
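Conceptually, the vector store maps each embedding to its source chunk and answers nearest-neighbour queries. The toy class below illustrates that contract with cosine similarity; it is a minimal sketch, not Qdrant's API, and in the real system vectors come from OpenAI's embedding model rather than being supplied by hand.

```python
import math

class TinyVectorStore:
    """Toy in-memory stand-in for a vector database such as Qdrant."""
    def __init__(self):
        self.items = []  # list of (vector, payload) pairs

    def add(self, vector, payload):
        self.items.append((vector, payload))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def search(self, query, k=2):
        """Return the payloads of the k vectors most similar to `query`."""
        ranked = sorted(self.items,
                        key=lambda item: self._cosine(query, item[0]),
                        reverse=True)
        return [payload for _, payload in ranked[:k]]
```

A production store adds approximate-nearest-neighbour indexing so search stays fast at millions of vectors, which is exactly what Qdrant provides.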
Retrieval Strategy:
To retrieve relevant contract documents for a given user question, a Maximum Marginal Relevance (MMR) retrieval approach was implemented within Qdrant.
MMR prioritizes diversity in retrieved documents, ensuring a broader range of perspectives and information is presented in the response.
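The MMR trade-off between relevance and redundancy can be made concrete. The sketch below implements the standard greedy MMR selection over raw vectors (function and variable names are my own, assuming cosine similarity); Qdrant/LangChain handle this internally.

```python
import math

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query, docs, k=2, lam=0.5):
    """Maximum Marginal Relevance: greedily pick documents that are
    relevant to the query but dissimilar to those already chosen.
    `docs` is a list of (vector, text) pairs; `lam` balances
    relevance (1.0) against diversity (0.0)."""
    selected, remaining = [], list(docs)
    while remaining and len(selected) < k:
        def score(d):
            relevance = _cos(query, d[0])
            redundancy = max((_cos(d[0], s[0]) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [text for _, text in selected]
```

With plain top-k similarity, two near-duplicate clauses would both be returned; MMR swaps the duplicate for a different but still relevant passage.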
Response Generation:
Langchain's ConversationalRetrievalChain was employed to generate the final response.
This chain leverages the OpenAI LLM for response creation, while incorporating the retrieved documents and a memory component.
The memory component allows the system to retain context from previous interactions, fostering a more coherent and natural conversation flow.
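The role of the memory component can be shown with a minimal sketch: keep past turns and prepend them, along with the retrieved chunks, to each new prompt. This is a hypothetical helper illustrating the idea, not LangChain's ConversationalRetrievalChain API.

```python
class ConversationMemory:
    """Minimal sketch of conversational memory: past Q/A turns are
    stored and included in every new prompt so the LLM sees the
    conversation so far. Illustrative only."""
    def __init__(self):
        self.turns = []  # list of (question, answer) pairs

    def add(self, question, answer):
        self.turns.append((question, answer))

    def build_prompt(self, question, retrieved_chunks):
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
        context = "\n---\n".join(retrieved_chunks)
        return (f"Chat history:\n{history}\n\n"
                f"Context:\n{context}\n\n"
                f"Question: {question}")
```

This is why a follow-up like "Does it apply to both parties?" works: the earlier exchange about the termination clause is still in the prompt.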
This project integrates a chat add-on for Redash that uses natural language processing to unlock business intelligence. Users can ask questions about existing dashboards, generate SQL queries, and even create new dashboards, all through chat. This tool empowers non-technical users and streamlines data analysis, making it a valuable addition to the business intelligence (BI) field.
Outcome:
Users can:
Ask questions about existing Redash dashboards in natural language.
Generate SQL queries from their questions using Large Language Models (LLMs).
Automatically generate visualizations from SQL queries or existing visualizations.
Create new Redash dashboards based on their queries.
Technical Skills and Knowledge
Data Analysis & Visualization
Programming & Development: proficiency in Python and JavaScript (React), experience with SQL, understanding complex codebases, prompt engineering, OpenAI API usage, add-on/plugin development experience
SQL & Database Management: experience with Celery for concurrent tasks, Docker and Docker Compose for deployment
Natural Language Processing (NLP): vector databases
Machine Learning & AI
Approach
Create Database Schema
Design a schema to efficiently store and query YouTube data (channel performance, user base, video expenses).
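A schema along these lines can be sketched directly in SQL. The table and column names below are illustrative assumptions about how channel performance, user base, and video expense data might be laid out, not the project's actual schema; SQLite is used here only to validate the DDL.

```python
import sqlite3

# Hypothetical schema sketch for the YouTube analytics data:
# channels, per-day channel stats, and per-video expenses.
schema = """
CREATE TABLE channels (
    channel_id   TEXT PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE channel_stats (
    channel_id   TEXT REFERENCES channels(channel_id),
    stat_date    DATE NOT NULL,
    views        INTEGER,
    subscribers  INTEGER
);
CREATE TABLE video_expenses (
    video_id     TEXT PRIMARY KEY,
    channel_id   TEXT REFERENCES channels(channel_id),
    cost_usd     REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

Keeping stats in a separate per-date table makes time-series questions ("views last month") a simple GROUP BY rather than a schema change.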
Building Redash Chat Add-on
Frontend (React): Design and implement a user-friendly dashboard interface for the add-on.
Backend (Quart, a lightweight async Flask-compatible framework): Build a system for data storage and processing based on the designed schema.
LLM Integration
Integrate LLMs (OpenAI or LangChain) to understand and process natural language queries for data insights.
Develop functionality within the add-on to translate natural language queries into actionable SQL queries compatible with Redash.
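The natural-language-to-SQL step usually amounts to assembling a prompt that pairs the schema with the user's question and constrains the model's output. The template below is an illustrative assumption, not the add-on's actual prompt.

```python
def build_sql_prompt(question, schema_description):
    """Assemble an LLM prompt that asks for a single Redash-compatible
    SQL query. Hypothetical template wording, for illustration only."""
    return (
        "You are a SQL assistant for Redash.\n"
        f"Database schema:\n{schema_description}\n"
        f"Write a single SQL query answering: {question}\n"
        "Return only SQL, no explanation."
    )
```

The returned string would be sent to the OpenAI API; constraining the reply to "only SQL" keeps the response directly executable by Redash.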
Automatic Dashboard Generation
Implement functionalities for:
Automatic visualization generation based on user queries, generated SQL queries, or existing visualization context.
Creating new Redash dashboards using a collection of visualizations.
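Composing a dashboard from existing visualizations reduces to building a payload of widgets for the Redash REST API. The field names below are illustrative assumptions about that payload's shape, not the exact API contract.

```python
def build_dashboard_payload(name, visualization_ids):
    """Sketch of composing a new dashboard from a collection of
    visualizations: one widget per visualization, stacked vertically.
    Field names are illustrative; the real add-on posts to the
    Redash REST API to create the dashboard and attach widgets."""
    return {
        "name": name,
        "widgets": [
            {"visualization_id": vid, "position": {"row": i, "col": 0}}
            for i, vid in enumerate(visualization_ids)
        ],
    }
```

Separating payload construction from the API call keeps the composition logic testable without a running Redash instance.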