Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities

¹University of Wisconsin-Madison ²Carnegie Mellon University ³Nanyang Technological University ⁴University of Pennsylvania ⁵University of Southern California ⁶University of California, Berkeley

Abstract

Uncertainty quantification (UQ) for large language models (LLMs) is a key building block for safety guardrails in everyday LLM applications. Yet, even as LLM agents are increasingly deployed in highly complex tasks, most UQ research still centers on single-turn question answering. We argue that UQ research must shift to realistic settings with interactive agents, and that a new principled framework for agent UQ is needed. This position paper presents three pillars that lay the groundwork for future agent UQ research: (1. Foundations) We present the first general formulation of agent UQ, which subsumes broad classes of existing UQ setups; (2. Challenges) We identify four technical challenges specific to agentic setups: selection of the uncertainty estimator, uncertainty of heterogeneous entities, modeling of uncertainty dynamics in interactive systems, and the lack of fine-grained benchmarks, supported by numerical analysis on a real-world agent benchmark, \(\tau^2\)-bench; (3. Future Directions) We conclude by discussing the practical implications of agent UQ and the remaining open problems as a forward-looking agenda for future exploration.

Introduction

LLM agents now take consequential real-world actions such as making bookings and modifying databases, where failures go beyond incorrect text generation to include costly or irreversible outcomes. Most existing UQ research treats LLMs as single-turn, static inference systems, a view that does not account for the multi-turn, interactive nature of agentic settings, where new information is continuously acquired through interaction. We argue that UQ research must shift toward these realistic agent scenarios.

Foundations

We define a stochastic agent through the trajectory it produces: a sequence of actions, observations, and environment states. Using a dynamic Bayesian network, the joint probability of a trajectory is factorized into per-turn components. For uncertainty measures that obey a chain rule, such as entropy, this factorization lets the total uncertainty be decomposed into a sum of action and observation uncertainties at each step. The formulation subsumes existing single-step LLM UQ and multi-step reasoning UQ setups as special cases, providing a unified theoretical lens grounded in probabilistic sequential modeling.
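
The per-turn decomposition described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Turn` class, the function names, and the probability values are all hypothetical, and it uses negative log-likelihood (surprisal) as one uncertainty measure for which the chain rule holds. For simplicity, environment state transitions are assumed to be folded into the observation term.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """One agent turn with made-up probabilities (illustrative only)."""
    p_action: float       # assumed p(a_t | history)
    p_observation: float  # assumed p(o_t | history, a_t)


def surprisal(p: float) -> float:
    """Negative log-probability, in nats."""
    return -math.log(p)


def per_turn_uncertainty(trajectory: List[Turn]) -> List[Tuple[float, float]]:
    """Return (action uncertainty, observation uncertainty) for each turn."""
    return [(surprisal(t.p_action), surprisal(t.p_observation)) for t in trajectory]


def total_uncertainty(trajectory: List[Turn]) -> float:
    """Sum the per-turn action and observation terms."""
    return sum(a + o for a, o in per_turn_uncertainty(trajectory))


def joint_probability(trajectory: List[Turn]) -> float:
    """Joint trajectory probability as a product of per-turn factors."""
    p = 1.0
    for t in trajectory:
        p *= t.p_action * t.p_observation
    return p


# Hypothetical three-turn trajectory.
trajectory = [Turn(0.9, 0.8), Turn(0.6, 0.5), Turn(0.95, 0.7)]

# Chain rule: the summed per-turn surprisals equal the surprisal of the joint.
assert math.isclose(
    total_uncertainty(trajectory),
    -math.log(joint_probability(trajectory)),
)
```

Keeping the action and observation terms separate per turn, rather than reporting only the total, makes it possible to see which step and which source (the agent's own choice or the environment's response) contributes most to the trajectory's uncertainty.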

Challenges

Practical Implications

Developing an agent UQ framework is not merely a theoretical exercise but a prerequisite for deploying LLM agents in non-deterministic real environments. We outline implications for frontier LLM research and for three specialized domains in which agent UQ may have profound downstream effects: healthcare, software engineering, and cyber-physical systems. These implications give policymakers, practitioners, and researchers a stake in its development.

Open Problems

Beyond the technical challenges we can specify today, a set of open problems must be addressed for the future of agent UQ: How can we distinguish the sources of uncertainty under intrinsic solution multiplicity? Is it enough to treat task failure as the sole target of uncertainty evaluation? How should we define and handle uncertainty dynamics in multi-agent or self-evolving agent systems?

Acknowledgments

We sincerely thank Artem Shelmanov, Shawn Im, Hyeong Kyu Choi, Jongwon Jeong, Eunsu Kim, Mingyu Kim, Ayoung Lee, JungEun Kim, Hayun Lee, Hoyoon Byun, and Kyungwoo Song for their sharp feedback on the initial draft. This work is supported in part by the AFOSR Young Investigator Program under award number FA9550-23-1-0184, the National Science Foundation under awards IIS-2237037 and IIS-2331669, the Office of Naval Research under grant number N00014-23-1-2643, the Schmidt Sciences Foundation, Open Philanthropy, an Alfred P. Sloan Fellowship, and gifts from Google and Amazon. Paul Bogdan acknowledges support from the National Science Foundation (NSF) under NSF Award 2243104 under the Center for Complex Particle Systems (COMPASS), NSF Mid-Career Advancement Award BCS-2527046, the U.S. Army Research Office (ARO) under Grant No. W911NF-23-1-0111, the Defense Advanced Research Projects Agency (DARPA) Young Faculty Award and DARPA Director Fellowship Award, an Okawa Foundation award, the National Institutes of Health (NIH) under R01 AG 079957, and Intel faculty awards.

BibTeX


@article{oh2026uncertainty,
  title={Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities},
  author={Oh, Changdae and Park, Seongheon and Kim, To Eun and Li, Jiatong and Li, Wendi and Yeh, Samuel and Du, Xuefeng and Hassani, Hamed and Bogdan, Paul and Song, Dawn and Li, Sharon},
  journal={arXiv preprint arXiv:2602.05073},
  year={2026}
}