Sample Application
Document Loader & Chatbot
This section introduces our sample application, a document loader built with LangChain on the Node.js runtime. We ran the application in the WebStorm IDE. The application also uses a Hugging Face API key to call a summarisation model.
Description
Our sample application is a tool for document processing and summarisation. It can load, analyse, and summarise documents in several formats: plain text (.txt), Portable Document Format (.pdf), JavaScript Object Notation (.json), and comma-separated values (.csv). It relies on the LangChain framework to load and handle the data and on a Natural Language Processing (NLP) model to summarise it.
Using a Hugging Face API key and a summarisation model, the program generates a summary of each processed document, so users can grasp the essential details without reading through large volumes of data manually. Users interact with the loader through a command-line interface (CLI), asking questions and receiving responses based on the processed data.
The application highlights key details and extracts insights from diverse document types, giving users a quicker understanding of large-scale data. Whether used by researchers, analysts, or other professionals, it serves as an aid for streamlining data comprehension.
- Sofia Batrisyia
Flowchart
The flowchart outlines the flow of our sample application. It begins at the Start node and loads the environment variables from the .env file using dotenv. This step sets up the API key and configuration required by the subsequent steps.
For document loading and processing, the program imports the required modules and initialises a document loader for each file type (TXT, PDF, JSON, and CSV). Each loader is responsible for fetching the contents of its file format. For each file type, the program checks whether any documents were found. If so, their content is extracted and passed to a summarisation function, which calls the Hugging Face API. If no documents are found, the program stores a "No [type] found" message instead.
Documents are summarised by the pre-trained BART-large-CNN model via Hugging Face API calls. The API processes the content and returns the summarised text. The program then collects the summaries, aggregates them, and makes them available for user interaction.
Subsequently, the program prompts the user to ask questions about the summaries. Based on user input, it displays the relevant summaries. If the user presses Enter or types "no," the program exits; otherwise, the interaction continues, allowing users to ask further questions. After showing the requested summaries, the program returns to prompt the user again. The program terminates after the user chooses to exit the question prompt or if an unhandled error occurs.
Code Explanation
The program starts by importing the required libraries:
LangChain loaders (TextLoader, PDFLoader, JSONLoader, and CSVLoader) are imported to load TXT, PDF, JSON, and CSV files. They let the program extract content from each format for processing.
dotenv loads the API key from the .env file into process.env, which keeps the key out of the source code.
fetch is used to make HTTP requests to Hugging Face’s API for text summarisation.
readline handles user interaction through the terminal. It can prompt for input such as a file path, a file type, or summarisation parameters; this application uses it to collect the user's questions about the summaries.
Next, the dotenv library is initialised so that the variables defined in the .env file become available to the program.
The program retrieves the Hugging Face API key from the .env file using process.env.HUGGINGFACE_API_KEY. This key is essential to authenticate requests made to Hugging Face's text summarisation API.
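The key lookup can be sketched as follows. This is a minimal illustration rather than the application's actual code: it assumes dotenv has already populated process.env (the call is shown only as a comment so the snippet runs without the package installed), and the helper name getApiKey is hypothetical.

```javascript
// In the application, dotenv is assumed to be initialised first, for example:
//   require("dotenv").config();
// which copies the entries of the .env file into process.env.

// Hypothetical helper: read the Hugging Face key and fail early if it is absent.
// `env` defaults to process.env; it is a parameter only to make the check testable.
function getApiKey(env = process.env) {
  const key = env.HUGGINGFACE_API_KEY;
  if (!key || key.trim() === "") {
    throw new Error("HUGGINGFACE_API_KEY is not set; add it to the .env file");
  }
  return key.trim();
}
```

Failing at start-up with a clear message is a design choice of this sketch; it avoids sending unauthenticated requests later in the run.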
The program then specifies the summarisation model, facebook/bart-large-cnn. This model condenses input text into a shorter version that retains the most important information, which makes it suitable for summarising news articles, reports, and other lengthy documents.
The summarizeWithHuggingFace function sends input text to Hugging Face's API to generate a summary. It is used to:
Log the input text and document type for debugging.
Send a POST request to the API with the text in JSON format, including an API key for authentication.
Log the API response status and check for errors, throwing an exception if the request fails.
Parse the response JSON to extract the generated summary.
Return the summary if the request succeeds, or an error message if no summary was generated.
This function handles communication, error handling, and result extraction for summarisation.
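The steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the application's exact code: the endpoint URL, the { inputs: ... } request body, and the [{ summary_text: ... }] response shape are assumed from the hosted Hugging Face Inference API and should be checked against its current documentation. The apiKey and fetchImpl parameters are additions of this sketch so that the function can be exercised without network access.

```javascript
// Model named in the text; the endpoint URL is an assumption of this sketch.
const MODEL = "facebook/bart-large-cnn";
const API_URL = `https://api-inference.huggingface.co/models/${MODEL}`;

// `fetchImpl` defaults to the global fetch (available in Node.js 18+).
async function summarizeWithHuggingFace(text, docType, apiKey, fetchImpl = fetch) {
  // Log the document type and input size for debugging.
  console.log(`Summarising ${docType} input (${text.length} characters)`);

  // POST the text as JSON, authenticating with the API key.
  const response = await fetchImpl(API_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ inputs: text }),
  });

  // Log the status and stop if the request failed.
  console.log(`API response status: ${response.status}`);
  if (!response.ok) {
    throw new Error(`Hugging Face API request failed with status ${response.status}`);
  }

  // Assumed response shape: [{ summary_text: "..." }].
  const result = await response.json();
  const summary = Array.isArray(result) ? result[0]?.summary_text : undefined;
  return summary ?? "No summary generated.";
}
```

In the application itself the function would be called with the real key and the default fetch, for example summarizeWithHuggingFace(doc.pageContent, "PDF", process.env.HUGGINGFACE_API_KEY).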
The summarizeDocs function summarises a list of documents one at a time and combines their summaries into a single result. Its steps are to:
Initialise an empty string, summaries, to store all document summaries.
Iterate through the documents (docs), extracting the content (pageContent) and index of each.
Call the summarizeWithHuggingFace function to generate a summary for the document.
Format the summary with the document type and index and log it to the console.
Append each formatted summary to the summaries string.
Return the combined summaries of all documents.
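A minimal sketch of this loop is shown below. The summarize parameter stands in for summarizeWithHuggingFace and is passed in only so that the example runs offline; the exact label format ("TXT document 1: ...") is an assumption of this sketch, not the application's actual output format.

```javascript
// Summarise each document in turn and concatenate the formatted results.
// `summarize` is any async function taking (text, docType) and returning a string.
async function summarizeDocs(docs, docType, summarize) {
  let summaries = "";
  for (const [index, doc] of docs.entries()) {
    const summary = await summarize(doc.pageContent, docType);
    // Label each summary with its document type and 1-based position.
    const formatted = `${docType} document ${index + 1}: ${summary}\n`;
    console.log(formatted.trimEnd());
    summaries += formatted;
  }
  return summaries;
}
```

Awaiting inside the loop processes the documents sequentially, which matches the one-at-a-time description and keeps the summaries in document order.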
The promptUserForQuestions function lets users ask questions about the document summaries and handles their input:
It starts by asking the user, "Ask your question about the summaries (or press enter to exit)."
If the user presses Enter or types "no," the function says "Goodbye!" and stops.
If the question mentions a document type (TXT, PDF, JSON, or CSV), the function responds with the summaries for that type.
If the question doesn't mention one of these formats, it tells the user to ask about TXT, PDF, JSON, or CSV summaries.
After responding, the function prompts again and repeats the process until the user chooses to exit.
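The question loop can be sketched as follows. This is an illustration under assumptions: the summaries are assumed to be stored in an object keyed by lower-case type (txt, pdf, json, csv), and the helper answerQuestion is a hypothetical name introduced here to separate the matching logic from the terminal handling.

```javascript
const readline = require("node:readline");

// Hypothetical helper: map a question to a reply.
// Returns null when the user wants to exit (empty input or "no").
function answerQuestion(question, summaries) {
  const q = question.trim().toLowerCase();
  if (q === "" || q === "no") return null;
  // Pick the first known document type mentioned in the question.
  const type = ["txt", "pdf", "json", "csv"].find((t) => q.includes(t));
  if (!type) return "Please ask about TXT, PDF, JSON, or CSV summaries.";
  return summaries[type];
}

// Terminal loop: ask, reply, and ask again until the user exits.
function promptUserForQuestions(summaries) {
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const ask = () => {
    rl.question("Ask your question about the summaries (or press enter to exit): ", (input) => {
      const reply = answerQuestion(input, summaries);
      if (reply === null) {
        console.log("Goodbye!");
        rl.close();
        return;
      }
      console.log(reply);
      ask();
    });
  };
  ask();
}
```

Keeping the matching logic in a separate function means it can be checked without a terminal; promptUserForQuestions is only defined here and starts reading input when it is called.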
The main function ties the workflow together: it loads the documents, summarises them, stores the summaries, and lets the user ask questions about them.
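The load-and-summarise part of this workflow can be sketched as follows. The function name buildSummaries and both of its parameters are hypothetical: loaders is assumed to map a type label to an object with an async load() method (the shape that LangChain document loaders expose), and summarizeDocsFn is any async function that takes the loaded documents and a type label. Passing these in keeps the sketch runnable without LangChain installed.

```javascript
// Load each document type, summarise what was found, and record a
// "No [type] found" message for types with no documents.
async function buildSummaries(loaders, summarizeDocsFn) {
  const summaries = {};
  for (const [type, loader] of Object.entries(loaders)) {
    const label = type.toUpperCase();
    const docs = await loader.load();
    summaries[type] =
      docs.length > 0 ? await summarizeDocsFn(docs, label) : `No ${label} found`;
  }
  return summaries;
}
```

A main function built on this sketch would call buildSummaries with the four loaders and then hand the resulting object to the question prompt.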
Finally, calling main() starts the program.
- Kueh Pang Teng
Output
Once the environment is set up, run the command "node index.js" in the console. The program processes each document type separately, combines the results into a single string, and displays the summaries of all loaded documents.
The chatbot then prompts the user to ask questions. Users can type commands such as "txt summaries", and the program responds with the summary of the TXT file.
If a user enters the "pdf summaries" command, the program displays the summaries of the loaded PDF file.
Similarly, the "json summaries" command displays the summaries of the loaded JSON file.
Finally, the "csv summaries" command displays the summaries of the loaded CSV file.
To exit the program, users can type "no" or press Enter.
- Anis Syifaa'