Questions for you:
- Can you think of a recent decision in your organisation that was delayed waiting for more data, where the additional data did not materially change the eventual decision? How can you avoid that in future?
- How does your organisation currently think about the cost of gathering more information, as opposed to the cost of acting on what is already known? Is that trade-off optimal, and if not, how can you change it?
- Are there decisions in your work where the asymmetry between the cost of a false positive and a false negative is so stark that one data point should be sufficient? Are you sure?
Organisational applications:
The cost of information has to be weighed against its value: The E. coli example illustrates a decision framework that most data-collection processes ignore: the value of additional information is not the reduction in uncertainty it provides in isolation, but that reduction relative to the cost of obtaining it and the cost of the decision error it prevents.
When the cost of delay is low and a wrong decision is cheap to reverse, collecting more data is a reasonable default. When delay itself risks a catastrophic, irreversible error, the first strong signal should often be sufficient. Most organisational data-collection norms were developed for the first case and are applied indiscriminately to the second. The discipline of explicitly asking “what decision would additional data change, and what is the cost of that change compared to the cost of waiting?” is more useful than a blanket commitment to data sufficiency.
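That question can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the function name and every number in it are hypothetical, chosen to show the shape of the trade-off rather than any real organisation's costs.

```python
# Toy expected-value-of-information calculation (all numbers hypothetical).
# Assumptions: the extra data costs `cost_of_data` to collect, would flip
# the decision with probability `p_flip`, a flipped decision avoids a loss
# of `cost_of_error`, and waiting imposes `cost_of_delay` regardless.

def value_of_more_data(p_flip, cost_of_error, cost_of_data, cost_of_delay):
    """Expected net benefit of waiting for one more data point."""
    expected_benefit = p_flip * cost_of_error     # error avoided if the data flips the decision
    expected_cost = cost_of_data + cost_of_delay  # incurred whether or not the decision changes
    return expected_benefit - expected_cost

# Low-stakes, reversible decision: waiting is cheap, so more data pays off.
print(value_of_more_data(p_flip=0.3, cost_of_error=10_000,
                         cost_of_data=500, cost_of_delay=200))    # 2300.0

# Costly-delay decision: the data is unlikely to change anything and every
# day of waiting is expensive, so act on what you already have.
print(value_of_more_data(p_flip=0.02, cost_of_error=50_000,
                         cost_of_data=500, cost_of_delay=5_000))  # -4500.0
```

The point is not the arithmetic but the framing: the same request for "more data" can have a positive or sharply negative expected value depending on what the data could change and what waiting costs.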
Analysis paralysis as a misapplication of data culture: The page identifies a specific failure mode: treating the mantra of data-driven decision-making as an argument for indefinite data collection rather than as a principle governing decision-making. In practice, the pressure to collect more data before deciding is often less about genuine uncertainty reduction and more about distributing accountability — if the decision goes wrong, the data collection process provides cover.
Organisations that mistake this behaviour for rigour end up slower and no more accurate than those that make explicit, time-bounded decisions based on available information. Can your organisation move beyond asking “do we have enough data?” to asking “what is the actual decision, what data would change it, and is that data available within the time we have?”
Asymmetric cost structures change what counts as enough: The story’s framework implies a simple yet often-ignored principle: the threshold for action should vary with the asymmetry in error costs. In food safety, a false negative — continuing to use contaminated ingredients — is catastrophically worse than a false positive — discarding a clean delivery. The appropriate data threshold in that context is therefore very low.
The same logic applies to safety decisions, reputational risks, and situations in which delay itself causes harm. Organisations that apply the same evidentiary standard to all decisions regardless of this asymmetry are systematically over-collecting data in some situations and under-weighting risk in others. Making the cost asymmetry explicit when framing a decision is a simple structural improvement that most organisations do not currently build into their processes.
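Under simple expected-cost minimisation, the asymmetry fixes the evidence threshold directly: act as soon as the probability of the problem exceeds cost_fp / (cost_fp + cost_fn). The sketch below uses hypothetical cost figures to show how a food-safety-style asymmetry drives that threshold close to zero.

```python
# Toy illustration (hypothetical numbers) of how asymmetric error costs
# set the evidence threshold for acting. Minimising expected cost means
# acting whenever
#   P(problem) * cost_false_negative > (1 - P(problem)) * cost_false_positive,
# which rearranges to P(problem) > cost_fp / (cost_fp + cost_fn).

def action_threshold(cost_false_positive, cost_false_negative):
    """Minimum probability of a problem at which acting beats not acting."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)

# Food-safety-style asymmetry: discarding a clean delivery costs 1 unit,
# serving contaminated food costs 10,000 units. One weak signal is enough.
print(action_threshold(1, 10_000))   # ~0.0001

# Symmetric costs: you need to be more than 50% sure before acting.
print(action_threshold(100, 100))    # 0.5
```

The same one-line formula makes the point in the text structural: keeping a single evidentiary standard across decisions is equivalent to pretending every decision has symmetric error costs.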
Further reading
On decision-making under uncertainty and the value of information:
Thinking in Bets: Making Smarter Decisions When You Don’t Have All the Facts by Annie Duke. Duke’s framework for making explicit probability estimates rather than waiting for certainty is directly relevant: the question is always what decision the next piece of data would change, not whether certainty has been achieved.
Superforecasting: The Art and Science of Prediction by Philip Tetlock and Dan Gardner. Tetlock’s account of how good forecasters update on evidence incrementally rather than waiting for definitive data covers the same conceptual ground from a forecasting angle.
On satisficing, thresholds, and when good enough is optimal:
The Paradox of Choice: Why More is Less by Barry Schwartz. Schwartz’s account of the costs of optimising rather than satisficing is relevant here: the search for more data is often a variant of the search for the best possible option, with similar psychological and practical costs.
Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths. The chapters on optimal stopping and the explore/exploit trade-off provide the formal framework for understanding when to stop gathering information and act.
On risk asymmetry and precautionary decision-making:
The Black Swan: The Impact of the Highly Improbable by Nassim Nicholas Taleb. Taleb’s argument for asymmetric caution — being aggressive about low-cost, recoverable risks and highly conservative about catastrophic, irreversible ones — is the formal version of the E. coli logic.
Noise: A Flaw in Human Judgement by Daniel Kahneman, Olivier Sibony and Cass R. Sunstein. The chapters on decision rules and when to replace judgment with algorithms cover the organisational case for pre-specifying data thresholds rather than making case-by-case calls about sufficiency.
About the image
I’m old enough to remember when a “Library of Congress” was a measure of a large volume of data in the way that “Wales” is a measure of a small country.
Photo montage by Matt Ballantine, 2026, Photo Public Domain
