This document defines a structured interaction protocol designed to increase the reliability and transparency of large language model (LLM) outputs through enforced self-correction.
The protocol does not alter the model.
It alters the structure of engagement.
Its purpose is to:
- Make claims inspectable
- Expose assumptions
- Surface meaningful counterarguments
- Align confidence with evidence
- Reduce overconfidence and narrative completion bias
The protocol improves transparency.
It does not guarantee truth.
The protocol is intended for use when:
- Accuracy materially affects decisions
- The subject is politically or emotionally charged
- The model’s answer appears overly confident
- Framing bias is suspected
- The output will influence judgment
It is not required for trivial or low-stakes queries.
The protocol should scale with consequence.
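As an illustrative sketch only (the parameter names below are assumptions, not part of the protocol), the decision to apply the protocol can be expressed as a simple predicate over the trigger conditions listed above:

```python
def protocol_required(
    high_stakes: bool,
    charged_topic: bool,
    overconfident_tone: bool,
    framing_bias_suspected: bool,
    influences_judgment: bool,
) -> bool:
    """Return True if any trigger condition for the protocol is met.

    The protocol should scale with consequence: a query matching
    none of these conditions does not require it.
    """
    return any([
        high_stakes,
        charged_topic,
        overconfident_tone,
        framing_bias_suspected,
        influences_judgment,
    ])
```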
Model Output — The AI’s initial response to a query.
Claim — A statement presented as fact and capable of verification.
Inference — A conclusion derived from one or more claims.
Speculation — A hypothesis, projection, or possibility not yet supported by verified evidence.
Transparency — The degree to which reasoning can be inspected and evaluated.
Recalibration — Adjustment of stated confidence after critique.
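The statement categories defined above can be modeled as a small data structure. This is a hypothetical sketch, not part of the protocol itself:

```python
from dataclasses import dataclass
from enum import Enum


class StatementKind(Enum):
    """The three statement categories defined above."""
    CLAIM = "claim"              # verifiable statement of fact
    INFERENCE = "inference"      # conclusion derived from claims
    SPECULATION = "speculation"  # hypothesis not yet supported by evidence


@dataclass
class Statement:
    """A single numbered statement extracted from a model output."""
    number: int
    text: str
    kind: StatementKind
```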
The protocol consists of three mandatory stages and one optional stage.
Stage 1: Claim Extraction

Prompt:
List the key factual claims in your previous answer as numbered statements. Separate facts from opinions. Do not summarize.
Objective:
- Extract discrete claims
- Separate fact from inference
- Prevent narrative blending

Expected Outcome:
- Numbered list of claims
- Clear category separation
- No paraphrased restatement of the original answer
If claims cannot be clearly extracted, transparency is low.
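A minimal parser for the expected Stage 1 output (numbered statements, one per line) might look like the sketch below; the exact output format is my assumption:

```python
import re

STAGE1_PROMPT = (
    "List the key factual claims in your previous answer as numbered "
    "statements. Separate facts from opinions. Do not summarize."
)

# Matches lines like "1. text" or "2) text".
_NUMBERED = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def extract_claims(response: str) -> list[str]:
    """Parse a numbered list of claims out of a Stage 1 response.

    Returns the claim texts in order; lines that are not numbered
    statements are ignored. An empty result signals that claims
    could not be clearly extracted, i.e. transparency is low.
    """
    claims = []
    for line in response.splitlines():
        m = _NUMBERED.match(line)
        if m:
            claims.append(m.group(2))
    return claims
```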
Stage 2: Adversarial Rebuttal

Prompt:
Construct the strongest argument against your previous answer. Be rigorous and avoid strawman arguments.
Objective:
- Surface hidden assumptions
- Identify evidentiary weaknesses
- Present alternative interpretations
- Interrupt premature closure

Expected Outcome:
- Substantive critique
- Identification of structural vulnerabilities
- Plausible competing explanations
Superficial critique indicates fragility.
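One crude, explicitly heuristic check for superficiality is to require that the rebuttal be both substantial and non-deferential. The threshold and phrase list below are arbitrary illustrations, not part of the protocol:

```python
def critique_seems_superficial(rebuttal: str, min_words: int = 40) -> bool:
    """Flag a rebuttal that is too short or purely deferential.

    A rough proxy only: length does not prove rigor, but very short
    or self-excusing rebuttals are a warning sign of fragility.
    """
    if len(rebuttal.split()) < min_words:
        return True
    deferential = (
        "however, my original answer remains correct",
        "no serious counterargument exists",
    )
    lowered = rebuttal.lower()
    return any(phrase in lowered for phrase in deferential)
```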
Stage 3: Confidence Recalibration

Prompt:
After considering the counterargument, re-evaluate your confidence in your original answer. Provide a percentage and explain your reasoning.
Objective:
- Align confidence with evidence strength
- Measure responsiveness to critique
- Detect rigidity

Expected Outcome:
- Stated percentage
- Explanation tied to specific weaknesses
- Logical adjustment when warranted
Unchanged high confidence after serious critique is a warning signal.
Optional Stage: Unstated Assumptions

Prompt:
List the unstated assumptions your original answer relied upon.
Use in high-stakes or analytical contexts where hidden premises materially affect conclusions.
Transparency Assessment

After completing the loop, categorize the output.

High transparency:
- Clear claim separation
- Serious adversarial reasoning
- Proportional confidence adjustment

Medium transparency:
- Partial claim clarity
- Limited adversarial depth
- Minimal recalibration

Low transparency:
- Blended claims
- Weak or defensive rebuttal
- Rigid or unjustified confidence
The transparency rating evaluates reasoning visibility, not factual correctness.
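The three-tier rating can be applied mechanically. The criteria are the ones listed above; the mapping from criteria met to tier is my own illustrative choice:

```python
def transparency_rating(
    claims_separated: bool,
    critique_substantive: bool,
    confidence_proportional: bool,
) -> str:
    """Map the three loop criteria to a High / Medium / Low rating.

    This evaluates reasoning visibility, not factual correctness.
    """
    met = sum([claims_separated, critique_substantive, confidence_proportional])
    if met == 3:
        return "High"
    if met >= 1:
        return "Medium"
    return "Low"
```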
If transparency is low, or confidence remains high despite substantive critique:
- Re-run Stage 1 with stricter instructions
- Require citation-backed claims
- Narrow the scope of the original question
- Separate complex questions into discrete sub-questions
Escalation is appropriate when the output will influence consequential decisions.
This protocol is designed to expose:
- Confident completion of under-specified tasks
- Blending of fact and inference
- Narrative completion bias
- False precision
- Artificial symmetry in evidence presentation
- Defensive self-critique
- Rigid confidence under challenge
Repeated detection of these patterns reduces reliability.
The protocol improves inspectability, not omniscience.
It does not:
- Eliminate hallucinations
- Replace independent verification
- Convert probabilistic reasoning into certainty
- Transfer responsibility to the model
The system may produce persuasive self-critique without new evidence.
Confidence shifts may reflect rhetorical framing rather than empirical strength.
Final judgment remains human.
The Self-Correction Reliability Loop introduces deliberate friction:
Initial Answer
→ Claim Extraction
→ Adversarial Rebuttal
→ Confidence Recalibration
→ Transparency Assessment
The loop slows acceptance and exposes reasoning structure.
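The whole loop can be driven by any function that maps a prompt to a response. This sketch assumes a stateful `ask` callable (a real implementation would thread conversation history through a chat client); the stage prompts are quoted from the sections above:

```python
from typing import Callable

STAGE_PROMPTS = [
    "List the key factual claims in your previous answer as numbered "
    "statements. Separate facts from opinions. Do not summarize.",
    "Construct the strongest argument against your previous answer. "
    "Be rigorous and avoid strawman arguments.",
    "After considering the counterargument, re-evaluate your confidence "
    "in your original answer. Provide a percentage and explain your reasoning.",
]


def run_loop(ask: Callable[[str], str], question: str) -> dict[str, str]:
    """Run the three mandatory stages against a prompt->response callable.

    Returns the transcript keyed by stage name. Transparency assessment
    and final judgment remain with the human operator.
    """
    transcript = {"answer": ask(question)}
    names = ["claims", "rebuttal", "recalibration"]
    for name, prompt in zip(names, STAGE_PROMPTS):
        transcript[name] = ask(prompt)
    return transcript
```

In practice `ask` would wrap an LLM session that retains the prior turns, so each stage prompt refers back to the model's earlier output.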
Reliability improves when confidence survives scrutiny.
Responsibility remains with the user.