Even with retrieval, the model can still invent details that are not in the retrieved text. Generation therefore needs to be constrained so that every answer is grounded in the retrieved text and traceable back to the spec.
Libraries Used
transformers (Hugging Face): Runs the open-source LLM locally (Qwen 2.5 at 3B, 1.5B, and 0.5B).
torch (PyTorch): Runs model inference and generation on CPU or GPU.
json: Reads the retrieval output and saves the generation output.
re (regular expressions): Parses the model output into structured fields.
argparse: Runs the generator from the command line with configurable settings.
bitsandbytes: Loads the model in 4-bit precision when CUDA is available.
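As an illustration of how the command-line settings might be wired up with argparse, here is a minimal sketch. The flag names (--retrieval-json, --model, --top-k, --min-top-score) and the default model id are assumptions chosen to mirror the fields in the saved output; the actual script may use different names and defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser for the generator. All flag names are hypothetical."""
    parser = argparse.ArgumentParser(description="Grounded answer generator")
    parser.add_argument("--retrieval-json", default="retrieval_results.json",
                        help="Path to the BM25 retrieval output")
    parser.add_argument("--model", default="Qwen/Qwen2.5-1.5B-Instruct",
                        help="Hugging Face model id (assumed default)")
    parser.add_argument("--top-k", type=int, default=5,
                        help="Number of retrieved chunks placed in the prompt")
    parser.add_argument("--min-top-score", type=float, default=2.0,
                        help="Abstain when the top BM25 score is below this cutoff")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--top-k", "3"])
print(vars(args))
```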
Inputs:
User question
Retrieved chunks from BM25, including chunk ids and page ranges
The generator is prompted to return responses in a strict format:
Answer: <1 to 3 sentences>
If evidence is weak or missing, the system should return:
Answer: Not enough information
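The strict format is what makes the output machine-readable. Below is a minimal sketch of how re could split a raw response into fields, assuming the Answer / Evidence / Confidence layout spelled out in the prompt of the result example. The function name parse_output and the exact patterns are illustrative assumptions, not the project's actual parser.

```python
import re

# One evidence bullet: - "<quote>" (chunk_id=..., pages=...)
EVIDENCE_RE = re.compile(
    r'^\s*-\s*"(?P<quote>.+?)"\s*\(chunk_id=(?P<chunk_id>[^,\s]+),\s*pages=(?P<pages>[^)\s]+)\)',
    re.MULTILINE,
)
# The answer runs until the Evidence/Confidence label or the end of the text.
ANSWER_RE = re.compile(
    r"Answer:\s*(.*?)(?=\n\s*Evidence:|\n\s*Confidence:|\Z)", re.DOTALL
)
CONFIDENCE_RE = re.compile(r"Confidence:\s*(High|Medium|Low)\b", re.IGNORECASE)


def parse_output(raw: str) -> dict:
    """Split raw model text into answer, evidence, and confidence fields."""
    answer = ANSWER_RE.search(raw)
    confidence = CONFIDENCE_RE.search(raw)
    return {
        "answer": answer.group(1).strip() if answer else None,
        "evidence": [m.groupdict() for m in EVIDENCE_RE.finditer(raw)],
        "confidence": confidence.group(1).capitalize() if confidence else None,
    }


def is_abstention(parsed: dict) -> bool:
    """True when the answer is the agreed abstention string."""
    answer = (parsed.get("answer") or "").strip().rstrip(".").lower()
    return answer == "not enough information"
```

When the model returns only an Answer line, this sketch yields an empty evidence list and a confidence of None, which is the shape seen in the result example.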
Current workflow:
If the top BM25 score is below a cutoff (min_top_score in the saved metadata), the system stops and returns “Not enough information”
If the score is at or above the cutoff, the retrieved chunks are placed into the prompt context
The generator is asked to answer using only this context
The output is saved as structured JSON along with the retrieval metadata
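The workflow above can be sketched in a few lines. This is an illustration under stated assumptions: the chunk dictionary keys (chunk_id, pages, score, text), the function names, and the generate_fn callable that wraps the LLM are all hypothetical, and the prompt is shortened compared with the real one.

```python
import json

ABSTAIN_ANSWER = "Not enough information"


def build_context(chunks):
    """Format chunks like the numbered context block in the prompt."""
    lines = []
    for i, c in enumerate(chunks, start=1):
        header = f"[{i}] chunk_id={c['chunk_id']} pages={c['pages']} score={c['score']:.4f}"
        lines.append(header)
        lines.append(c["text"])
    return "\n".join(lines)


def run_generation(question, chunks, generate_fn, min_top_score=2.0):
    """Apply the score precheck, then call the generator on the context."""
    # With no chunks at all, the top score is -inf, so the precheck abstains.
    top_score = max((c["score"] for c in chunks), default=float("-inf"))
    record = {
        "question": question,
        "abstained_precheck": top_score < min_top_score,
        "retrieval_meta": {"min_top_score": min_top_score},
    }
    if record["abstained_precheck"]:
        # Weak retrieval: stop before the model is ever called.
        record["answer"] = ABSTAIN_ANSWER
        record["raw_model_output"] = None
        return record
    prompt = (
        "Use ONLY the context below.\n\n"
        f"Question:\n{question}\n\nContext:\n{build_context(chunks)}"
    )
    record["raw_model_output"] = generate_fn(prompt)
    return record


# Usage with a stub in place of the real model call.
demo_chunks = [{"chunk_id": "chunk_000072", "pages": "81-82",
                "score": 19.7694, "text": "5.3.2 Paging"}]
demo = run_generation("How is paging initiated?", demo_chunks,
                      lambda prompt: "Answer: ...")
print(json.dumps(demo, indent=2))
```

Keeping the precheck ahead of the model call means a low-scoring retrieval costs no generation time and cannot produce an ungrounded answer.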
Current status:
Generator runs end to end on retrieved context
Outputs are saved in a structured JSON format for evaluation and debugging
Early observation: smaller open models sometimes abstain even when evidence exists, so prompt tuning is ongoing
Next steps:
Improve prompting so the model answers when evidence is clearly present
Output should include:
Evidence (quotes copied from retrieved text with chunk id and page range)
Confidence (High, Medium, Low)
Compare multiple generator models as required
Add a stronger support check step later (judge or NLI) to verify evidence support
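Before a judge or NLI step exists, a much weaker baseline is to check that each evidence quote really appears in the chunk it cites. The sketch below does only that verbatim check: it is not an NLI or judge model and says nothing about whether the quote actually supports the answer. The function name check_evidence and the dictionary keys are assumptions for illustration.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_evidence(evidence, chunks):
    """Mark each evidence item as supported if its quote occurs in the cited chunk."""
    by_id = {c["chunk_id"]: _normalize(c["text"]) for c in chunks}
    results = []
    for item in evidence:
        chunk_text = by_id.get(item["chunk_id"])
        quote = _normalize(item["quote"])
        supported = bool(quote) and chunk_text is not None and quote in chunk_text
        results.append({**item, "supported": supported})
    return results
```

A quote that cites an unknown chunk id, or that was paraphrased rather than copied, is flagged as unsupported, which gives a cheap signal to log alongside the answer until a stronger check is in place.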
Result Example:
{
"question": "How does the network initiate the paging procedure?",
"model_used": "Qwen/Qwen3.5-0.8B",
"abstained_precheck": false,
"prompt_preview": "You are a QA assistant for 3GPP TS 38.331.\nUse ONLY the context below.\n\nRule:\n- If you can find a sentence in the context that answers the question, you MUST answer.\n- Only say \"Not enough information\" if the context has no sentence that answers the question.\n- Do not use placeholders like <direct quote 1>. Use real quotes copied from the context.\n\nReturn EXACTLY this format:\n\nAnswer: <1 to 3 sentences>\nEvidence:\n- \"<quote 1>\" (chunk_id=..., pages=...)\n- \"<quote 2>\" (chunk_id=..., pages=...) [optional]\nConfidence: High/Medium/Low\n\nQuestion:\nHow does the network initiate the paging procedure?\n\nContext:\n[1] chunk_id=chunk_000072 pages=81-82 score=19.7694\nwith an SCG with an sk-Counter even when no DRB is setup using the secondary key (SK ) in order to allow the configuration of SRB3. The network can also provide the UE with an sk-Counter, even if no gNB SCG is configured, when using SN terminated MCG bearers. 5.3.2 Paging 5.3.2.1 General !\" #$%&'E) !\"#$%# Figure 5.3.2.1-1: Paging The purpose of this procedure is: - to transmit paging information to a UE in RRC_IDLE or RRC_INACTIVE. - to transmit paging information for a L2 U2N Remote UE in RRC_IDLE or RRC_INACTIVE to its serving L2 ...",
"answer": "The network initiates the paging procedure by transmitting the Paging message at the UE's paging occasion as specified in TS 38.304 [20], which may include PagingRecord(s) or TMGI(s) to page specific UEs. Upon receiving this message, the UE forwards the UE-Identity, accessType, and paging cause to the upper layers, and in RRC_IDLE, it forwards the UE-Identity and accessType to the upper layers. If the UE is configured with Access Identity 1, it initiates the RRC connection resumption procedure with resumeCause set to mpsPriorityAccess; otherwise, it initiates the procedure with resumeCause set to mt-Access.",
"evidence": [],
"confidence": null,
"raw_model_output": "Answer: The network initiates the paging procedure by transmitting the Paging message at the UE's paging occasion as specified in TS 38.304 [20], which may include PagingRecord(s) or TMGI(s) to page specific UEs. Upon receiving this message, the UE forwards the UE-Identity, accessType, and paging cause to the upper layers, and in RRC_IDLE, it forwards the UE-Identity and accessType to the upper layers. If the UE is configured with Access Identity 1, it initiates the RRC connection resumption procedure with resumeCause set to mpsPriorityAccess; otherwise, it initiates the procedure with resumeCause set to mt-Access.",
"retrieval_meta": {
"top_k": 5,
"min_top_score": 2.0,
"retrieval_json": "retrieval_results.json"
}
}