Questions for you:
- When you use an AI tool, how aware are you of the degree to which its outputs are probabilistic rather than deterministic? Does that change how you use or verify what it produces?
- Where in your home life or work life might the variability of AI-generated outputs — the fact that the same prompt produces different answers — create problems, and where might it be useful?
- Where, for your employer, might the variability of AI-generated outputs — the fact that the same prompt produces different answers — create problems, and where might it be useful? Does your employer’s deployment match that pattern?
- If you were deploying an AI system for a specific organisational task, how would you decide where on the determinism-to-randomness spectrum to set it?
Organisational applications:
Randomness as a design choice, not a bug: The story’s most practically useful point for organisations deploying AI tools is that the variability in LLM outputs is not a malfunction to be corrected but a parameter to be configured deliberately. A model set to low temperature behaves more like a consistent reference tool; one set to higher temperature behaves more like a brainstorming partner.
Organisations that treat AI outputs as if they were database queries — expecting identical responses to identical inputs — are likely to be confused and frustrated by the normal behaviour of models. Those that understand the temperature parameter can match the randomness level to the task: low for compliance checking or technical documentation, higher for ideation or draft generation. Most AI deployment decisions currently happen without explicit consideration of this trade-off.
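The mechanics behind the temperature dial can be sketched in a few lines: the model's next-token scores (logits) are divided by the temperature before being converted into probabilities, so a low setting concentrates almost all probability on the likeliest token, while a high setting flattens the distribution. The logits and token names below are invented purely for illustration, not taken from any real model.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample a token index from logits scaled by temperature.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more varied).
    """
    scaled = [l / temperature for l in logits]
    # Softmax with a max-shift for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index in proportion to the probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical next-token logits: the first token is most likely.
logits = [4.0, 2.0, 1.0, 0.5]   # e.g. "the", "a", "an", "this"
rng = random.Random(0)

low = [sample_with_temperature(logits, 0.1, rng) for _ in range(25)]
high = [sample_with_temperature(logits, 2.0, rng) for _ in range(25)]
print(low)   # low temperature: the top token, draw after draw
print(high)  # high temperature: a mix of tokens
```

With these invented numbers, the low-temperature draws land on the top token essentially every time, while the high-temperature draws spread across the alternatives — the "consistent reference tool" and "brainstorming partner" ends of the same dial.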
The consistency problem in high-stakes applications: The story notes that pure determinism produces boring text while excessive randomness produces nonsense, and that the sweet spot depends on the application. In organisational contexts, the stakes of inconsistency vary considerably.
A customer service chatbot that gives materially different answers to the same question about a returns policy is a liability; a creative writing assistant that produces varied outputs is doing its job. The discipline required is to assess each AI application explicitly for its consistency requirements before deployment, rather than accepting the default temperature settings of whatever tool is being used. High-stakes, compliance-sensitive, or customer-facing applications generally warrant lower temperature than internal creative or exploratory uses.
LLMs as prediction engines, not knowledge stores: The story’s description of LLMs as sophisticated prediction engines that use controlled randomness to avoid the most statistically likely patterns is important for setting appropriate expectations. Organisations that deploy AI tools as if they were searchable databases of correct answers will encounter outputs that are fluent but wrong — a known failure mode that follows directly from the probabilistic architecture the story describes.
Understanding that LLMs generate the most contextually plausible continuation of a prompt, rather than retrieving a verified fact, shapes how outputs should be used and verified. The randomness is not incidental; it is the mechanism by which these systems avoid collapsing into repetitive formulaic outputs, and it means that confident-sounding text carries no inherent reliability guarantee.
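The difference between retrieving a fact and predicting a continuation can be made concrete with a toy model. The bigram table below is entirely made up (real models predict over tens of thousands of tokens with learned probabilities), but it shows the core distinction: greedy decoding always takes the most probable next word and is repeatable, while sampling in proportion to the predicted probabilities produces fluent but variable text from the same prompt.

```python
import random

# A made-up "language model": for each word, a distribution over next words.
model = {
    "the":    [("cat", 0.5), ("dog", 0.3), ("idea", 0.2)],
    "cat":    [("sat", 0.7), ("ran", 0.3)],
    "dog":    [("ran", 0.6), ("sat", 0.4)],
    "idea":   [("spread", 1.0)],
    "sat":    [("down", 1.0)],
    "ran":    [("away", 1.0)],
    "spread": [("quickly", 1.0)],
}

def generate(start, steps, rng=None):
    """Greedy decoding when rng is None; probabilistic sampling otherwise."""
    words = [start]
    for _ in range(steps):
        options = model.get(words[-1])
        if not options:
            break
        if rng is None:
            # Deterministic: always take the most probable next word.
            words.append(max(options, key=lambda o: o[1])[0])
        else:
            # Probabilistic: sample in proportion to the probabilities.
            tokens, probs = zip(*options)
            words.append(rng.choices(tokens, weights=probs, k=1)[0])
    return " ".join(words)

print(generate("the", 3))         # greedy: "the cat sat down", every run
rng = random.Random()
samples = [generate("the", 3, rng) for _ in range(20)]
print(samples[:3])                # sampled: plausible but varying sentences
```

Every sampled sentence is grammatical within the toy grammar — fluent, in miniature — but none of them is "retrieved" from anywhere, which is exactly why confident-sounding output carries no inherent reliability guarantee.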
Further reading
On how LLMs work and the role of randomness:
The Alignment Problem: Machine Learning and Human Values by Brian Christian. Christian’s account of how machine learning systems are built and trained provides the technical context for the temperature story without requiring a mathematical background, and covers the broader questions about what these systems are actually doing when they generate text.
You Look Like a Thing and I Love You by Janelle Shane. A more accessible and often funny account of how neural networks generate outputs, including illuminating examples of what happens when the randomness dial is turned up too high — directly relevant to the story’s observation about the nonsense end of the temperature spectrum.
On probability, prediction, and what AI systems can and cannot do:
The Signal and the Noise: The Art and Science of Prediction by Nate Silver. Silver’s account of what good and bad prediction looks like across multiple domains provides useful context for understanding LLM outputs as probabilistic predictions rather than verified facts.
Noise: A Flaw in Human Judgment by Daniel Kahneman, Olivier Sibony and Cass R. Sunstein. The chapters on algorithmic versus human judgment are relevant here: the book’s argument that consistent algorithmic outputs often outperform variable human judgment sits in interesting tension with the story’s observation that some variability in AI outputs is deliberate and functional.
On deploying AI tools in organisational contexts:
Power and Prediction: The Disruptive Economics of Artificial Intelligence by Ajay Agrawal, Joshua Gans, and Avi Goldfarb. The authors frame AI as a prediction machine that reduces the cost of prediction — directly aligned with the story’s technical description — and draw out the organisational implications of deploying systems that provide probabilistic rather than deterministic outputs.
About the image
A parrot I spotted in, of all places, the Isle of Wight.
Photo montage and photo by Matt Ballantine, 2026
