Existing LLMs hallucinate on simple questions, including powerful LLMs (Google Gemini, for example, Gemini Hallucination).
Snap LLMs out of hallucination by augmenting LLMs with astronomy knowledge with the aid RAG (Retrieval Augmented Generation) and proper prompt engineering.
RAG cannot cover scope of a question always either. Prompt engineering is, well, engineering, which LLMs may not response to or adapt to correctly always.
Published 2023
LLM responses follow the English instructions (prompt engineering) to provide outputs in JSON. Though at times instruction may not be followed by LLMs.
Response in this case is mostly incorrect (so-called hallucinated response).
<s>[INST] <<SYS>>
Your answers must not include any derogatory content of any sort.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, please don't share false information.
Your output must be in valid JSON.
Do not output anything other than the JSON and do not repeat more than once.
Add the outermost key "einsteinRing" to the JSON you output.
Add the QUESTION to the JSON with the key "prompt" under "einsteinRing".
Add under the "einsteinRing" key "description" key for the description of Einstein Ring.
Add "synonyms" key under the "einsteinRing" key for the list of other synonymous terms for the Einstein Ring.
Add "listofKnownER" key for the list of known Einstein Rings.
You must surround your JSON output with <AIResponse></AIResponse> tags.
<</SYS>>
QUESTION: What is an Einstein Ring? What are the other names for the Einstein Ring? List a few known Einstein Rings that were discovered. [/INST]
Number of Tokens in Input: SYS+Context+Question: 323 (as calculated with Python code)
AI Output Start: ----------------------------------------------------
<AIResponse>
{
"einsteinRing": {
"prompt": "What is an Einstein Ring? What are the other names for the Einstein Ring? List a few known Einstein Rings that were discovered.",
"description": "An Einstein Ring, also known as an gravitational lens, is a rare astronomical phenomenon where the light from a distant source is bent by the gravitational field of a foreground mass, creating a ring-like image around the mass. The other names for the Einstein Ring include gravitational lens, gravitational arc, and cosmic ring.",
"synonyms": [
"gravitational lens",
"gravitational arc",
"cosmic ring"
],
"listofKnownER": [
"The first known Einstein Ring was discovered in 1979 around the galaxy NGC 4151",
"The most famous Einstein Ring is the one around the galaxy cluster CL0024+17",
"In 2019, a team of astronomers discovered an Einstein Ring around the galaxy PSS J0822+1307"
]
}
}
</AIResponse>
Note: The above response is written in JSON format with the required keys and values as specified in the system. The "einsteinRing" key contains the information about the Einstein Ring, including its description, synonyms, and a list of known Einstein Rings. The "listofKnownER" key contains a list of known Einstein Rings that have been discovered.
llama_print_timings: load time = 7209.92 ms
llama_print_timings: sample time = 102.60 ms / 369 runs ( 0.28 ms per token, 3596.56 tokens per second)
llama_print_timings: prompt eval time = 64479.87 ms / 322 tokens ( 200.25 ms per token, 4.99 tokens per second)
llama_print_timings: eval time = 146646.98 ms / 368 runs ( 398.50 ms per token, 2.51 tokens per second)
llama_print_timings: total time = 212504.36 ms / 690 tokens
Knowledge document (sentence) embeddings were saved in FAISS vector DB.
HTML document sources (from Wiki) were filtered on <p> tag (using Selenium Python).
Context was generated with similarity search on QUESTION against the vector DB.
Response is correct based on the knowledge context provided. But it missed the synonym Gravitational Lens.
Response quality depends on 1) input token length limit (4096 in this case), 2) Similarity search could have "cast it wider" for the search space covered (RAG limitation).
<s>[INST] <<SYS>>
Your answers must not include any derogatory content of any sort.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, please don't share false information.
Your output must be in valid JSON.
Do not output anything other than the JSON and do not repeat more than once.
Add the outermost key "einsteinRing" to the JSON you output.
Add the QUESTION to the JSON with the key "prompt" under "einsteinRing".
Add under the "einsteinRing" key "description" key for the description of Einstein Ring.
Add "synonyms" key under the "einsteinRing" key for the list of other synonymous terms for the Einstein Ring.
Add "listofKnownER" key for the list of known Einstein Rings.
You must surround your JSON output with <AIResponse></AIResponse> tags.
You must answer the QUESTION based on the CONTEXT given, not based on prior knowledge.
<</SYS>>
CONTEXT: ... (gleaned from wiki page with Python code)
QUESTION: What is an Einstein Ring? What are the other names for the Einstein Ring? List a few known Einstein Rings that were discovered. [/INST]
Number of Tokens in Input: SYS+Context+Question: 1658
AI Output Start: ----------------------------------------------------
<AIResponse>
{
"einsteinRing": {
"prompt": "What is an Einstein Ring?",
"description": "An Einstein Ring is a rare astronomical phenomenon where light from a distant star is bent by the gravitational field of a massive object, creating a ring-like structure. It is also known as an Einstein-Chwolson ring or Chwolson ring.",
"synonyms": ["Einstein-Chwolson ring", "Chwolson ring"],
"listofKnownER": [
{
"name": "B1938+666",
"description": "The first complete Einstein ring to be discovered, caused by an ancient elliptical galaxy lensing a dark dwarf satellite galaxy."
},
{
"name": "PKS 1830-211",
"description": "An Einstein ring around a quasar lensed by an almost face-on spiral galaxy, notable for being the first case of a quasar being lensed by an almost face-on spiral galaxy."
},
{
"name": "MG1654+1346",
"description": "A radio ring caused by a quasar radio lobe lensed by a galaxy."
}
]
}
}
</AIResponse>
Note: The above response is in valid JSON format, with the "einsteinRing" key containing the information about the Einstein Ring, including its description, synonyms, and a list of known Einstein Rings discovered so far. The "listofKnownER" key contains a list of known Einstein Rings, each with a name and description of the ring.
llama_print_timings: load time = 29075.96 ms
llama_print_timings: sample time = 254.82 ms / 423 runs ( 0.60 ms per token, 1660.00 tokens per second)
llama_print_timings: prompt eval time = 592741.85 ms / 1657 tokens ( 357.72 ms per token, 2.80 tokens per second)
llama_print_timings: eval time = 5254369.95 ms / 422 runs (12451.11 ms per token, 0.08 tokens per second)
llama_print_timings: total time = 5854228.06 ms / 2079 tokens
Response (March 2024) is similar to above to certain extent. But listOfKnownER does not seem right ("SDSSJ1430+4550", "J2237-0602").
Is there anything called partial hallucination? Even a little bit of hallucination can be dangerous in real life case.
<AIResponse>{ "einsteinRing": { "prompt": "What is an Einstein Ring? What are the other names for the Einstein Ring? List a few known Einstein Rings that were discovered.", "description": "An Einstein ring, also known as an Einstein–Chwolson ring or Chwolson ring, is a rare astronomical phenomenon that occurs when light from a distant source, like a galaxy, is bent by the gravity of a massive object in the foreground, like another galaxy or a black hole, creating a perfectly circular image of the source around the lens. This is a spectacular confirmation of Einstein's theory of general relativity, which predicted that gravity bends light.", "synonyms": [ "Einstein–Chwolson ring", "Chwolson ring" ], "listofKnownER": [ "B1938+666", "SDSSJ1430+4550", "J2237-0602" ] } } </AIResponse>
llama-cpp-python within LangChain
Selenium with Python was used to filter HTML sources
FAISS Vector DB was used for knowledge augmentation generation (Meta, Huggingface, Langchain)
HuggingFace sentence_transformers embedding
Sentence transformer model: sentence-transformers/all-mpnet-base-v2
Knowledge sources:
Wikipedia: Einstein ring, Gravitational Lens
Author: MZH
Image: https://esahubble.org/images/potw2050a/