Large language models (LLMs) are widely used in NLP applications, but their tendency to produce hallucinations poses significant challenges to their reliability and safety, ultimately undermining user trust. This tutorial offers the first systematic introduction to uncertainty quantification (UQ) for LLMs in text generation tasks -- a conceptual and methodological framework that provides tools for communicating the reliability of a model's answers. This additional signal can be leveraged for a range of downstream tasks, including hallucination detection and selective generation.
We begin with the theoretical foundations of uncertainty, highlighting why techniques developed for classification can fall short in text generation. Building on this foundation, we survey state-of-the-art white-box and black-box UQ methods, from simple entropy-based scores to supervised probes over hidden states and attention weights, and show how they enable selective generation and hallucination detection. We also discuss the calibration of uncertainty scores for better interpretability.
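To make the information-based family concrete, below is a minimal sketch of one such score: the mean entropy of the next-token distributions produced while generating an answer, computed with Hugging Face transformers. The model name and the abstention threshold are illustrative placeholders, not choices made in the tutorial.

```python
# Minimal sketch of an information-based uncertainty score (mean token entropy)
# and its use for selective generation. Model and threshold are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def mean_token_entropy(prompt: str, max_new_tokens: int = 32) -> tuple[str, float]:
    """Generate an answer and score it with the average entropy of the
    next-token distributions: higher entropy -> higher uncertainty."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            return_dict_in_generate=True,
            output_scores=True,
        )
    answer = tokenizer.decode(
        out.sequences[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    entropies = []
    for step_logits in out.scores:  # one logits tensor per generated token
        log_probs = torch.log_softmax(step_logits[0], dim=-1)
        entropies.append(-(log_probs.exp() * log_probs).sum().item())
    return answer, sum(entropies) / max(len(entropies), 1)

answer, uncertainty = mean_token_entropy("The capital of Australia is")
# Selective generation: abstain when the score exceeds a tuned threshold.
if uncertainty > 3.0:  # threshold is hypothetical; tune it on a validation set
    answer = "I am not sure."
print(answer, uncertainty)
```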
A key feature of the tutorial is its practical examples based on LM-Polygraph, an open-source framework that unifies more than a dozen recent UQ and calibration algorithms and provides a large-scale benchmark, allowing participants to implement UQ in their applications and to reproduce and extend experimental results with only a few lines of code.
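As a taste of the hands-on part, the snippet below follows the usage pattern from the LM-Polygraph README: wrap a Hugging Face checkpoint, pick an estimator, and query it for an uncertainty score. The module paths, class names (`WhiteboxModel`, `MeanTokenEntropy`, `estimate_uncertainty`), and the checkpoint are assumptions based on a recent release and should be checked against the current documentation.

```python
# Sketch of scoring a single answer with LM-Polygraph; module paths and class
# names follow the project README but may differ between versions.
from lm_polygraph.utils.model import WhiteboxModel
from lm_polygraph.estimators import MeanTokenEntropy
from lm_polygraph.utils.manager import estimate_uncertainty

# Wrap any Hugging Face causal LM (checkpoint choice is illustrative).
model = WhiteboxModel.from_pretrained("bigscience/bloomz-560m", device="cpu")

# Pick one of the implemented UQ methods, here a simple information-based score.
estimator = MeanTokenEntropy()

# Returns the generated answer together with its uncertainty score.
result = estimate_uncertainty(model, estimator, input_text="When did Albert Einstein die?")
print(result)
```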
By the end of the session, researchers and practitioners will be equipped to (i) evaluate and compare existing UQ techniques, (ii) develop new methods, and (iii) integrate UQ into their own code to deploy safer, more trustworthy LLM-based systems.
1. Introduction
2. LM-Polygraph
3. Theoretical background
4. Unsupervised UQ part 1: Information-based, consistency-based, and verbalized uncertainty
5. Unsupervised UQ part 2: Introspective methods
6. Supervised UQ methods
7. Uncertainty normalization & calibration
8. Benchmarking UQ methods
9. Conclusion & future work
@inproceedings{shelmanov_uncertainty,
  title     = {Uncertainty Quantification for Large Language Models},
  author    = {Shelmanov, Artem and Panov, Maxim and Fadeeva, Ekaterina and Vazhentsev, Artem and Vashurin, Roman and Baldwin, Timothy},
  year      = {2025},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics}
}