Random the Book

Random the Book: Matt Ballantine and Nick Drage's experiment in serendipity and chance.


How reliable are your backups?

Questions for you:

  • When evaluating backup strategies, do you understand that the probability of multiple independent failures occurring simultaneously decreases exponentially with each additional backup?
  • Looking at critical data protection, do you rely on a single backup method, or recognise that multiplying small independent probabilities creates large security benefits?
  • When assessing backup reliability, do you verify backups are truly independent (geographically distributed, different storage types) rather than assuming that redundancy alone provides protection?
  • In calculating risk, do you understand that adding a fourth independent backup, each with a 1% annual failure rate, doesn’t just add protection: it multiplies it, cutting the chance of losing every copy from one in a million to one in 100 million?

Organisational applications:

Exponential protection through independent backups: Take a single hard drive with a 1% annual failure rate. Three independent drives: 0.01 × 0.01 × 0.01 = 0.000001 probability that all fail in the same year (one in a million). Add a fourth independent drive: one in 100 million. Each additional independent backup multiplies the protection (here by a factor of 100) rather than adding to it. This is why security professionals obsess about redundancy and why truly critical data is stored in multiple copies across multiple locations. Key word: independent. Failures must be unconnected for the multiplication to work.
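The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming the 1% annual failure rate used in the text and fully independent failures; the function name prob_all_fail is made up for this example.

```python
def prob_all_fail(annual_failure_rate: float, copies: int) -> float:
    """Probability that every copy fails in the same year.

    Assumes each copy fails independently with the same annual rate.
    """
    return annual_failure_rate ** copies


# 1% is the illustrative failure rate from the text, not a measured figure.
for copies in range(1, 5):
    p = prob_all_fail(0.01, copies)
    print(f"{copies} independent copies: about one in {round(1 / p):,}")
```

Each extra copy divides the chance of total loss by 100, which is the multiplication the paragraph describes. The result only means something if the independence assumption holds.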

Geographic distribution reduces correlated failures: Cloud services create multiple copies across geographically distributed data centres, so the probability of simultaneous natural disasters, power failures, or equipment malfunctions affecting all locations becomes very small. Financial institutions store transaction records in New York, London, and Tokyo simultaneously. A single fire doesn’t destroy all backups. Geographic distribution keeps failures independent rather than correlated: if every backup sits in one city, a single earthquake can destroy them all.

Verify independence of backup methods: Hardware failures have random causes (thermal stress, power fluctuations, manufacturing defects, cosmic radiation), but the multiplication only holds if those failures are truly independent of each other. Backups in the same rack: not independent (single power surge). Backups in the same building: not independent (single fire). Backups from the same manufacturer with the same firmware: potentially not independent (systematic flaw). Verify: different physical locations, different hardware vendors, different storage types (disk, tape, cloud), different power sources, different cooling systems. The independence assumption fails catastrophically if backups share failure modes.
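One way to make that checklist concrete is to write down each backup's attributes and look for overlaps. The sketch below is hypothetical throughout: the Backup fields, the example inventory, and the function name are invented for illustration and cover only some of the items in the checklist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    name: str
    location: str
    vendor: str
    storage_type: str
    power_source: str


# Attributes where a shared value suggests a shared failure mode.
CHECKED_ATTRIBUTES = ("location", "vendor", "storage_type", "power_source")


def shared_failure_modes(backups):
    """Return, per attribute, the values shared by two or more backups."""
    shared = {}
    for attr in CHECKED_ATTRIBUTES:
        groups = {}
        for backup in backups:
            groups.setdefault(getattr(backup, attr), []).append(backup.name)
        overlaps = {value: names for value, names in groups.items() if len(names) > 1}
        if overlaps:
            shared[attr] = overlaps
    return shared


# Hypothetical inventory: two copies in the same office, one in the cloud.
inventory = [
    Backup("nas", "office", "VendorA", "disk", "mains-office"),
    Backup("usb-drive", "office", "VendorB", "disk", "mains-office"),
    Backup("cloud", "provider-region", "VendorC", "cloud", "provider"),
]

for attr, overlaps in shared_failure_modes(inventory).items():
    print(attr, overlaps)
```

Any attribute that shows up in the output is a place where one event could take out more than one copy, so those copies should not be counted as independent.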

Calculate actual protection levels: Multiplying small probabilities creates large security benefits, but requires an honest assessment of failure rates and independence. With a 1% annual failure rate per copy, one backup gives 99% reliability. Two independent backups: 99.99% reliability. Three: 99.9999% reliability. Each additional backup adds two nines. But if backups aren’t independent, the calculation doesn’t hold. Audit which failure modes could affect multiple backups simultaneously: power outages, network failures, ransomware encryption, and administrative errors. Build protection against correlated failures, not just individual hardware failures.
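The effect of a shared failure mode on the "nines" can be sketched as follows. This is a simplified model, not an established formula for any particular system: p_shared is a hypothetical annual probability that one event (ransomware, an administrative error) destroys every copy at once, and the 0.1% value used below is invented for illustration.

```python
import math


def reliability(p_fail: float, copies: int, p_shared: float = 0.0) -> float:
    """Chance that at least one copy survives the year.

    All copies are lost if the shared event happens, or if it does not
    happen and every copy fails independently.
    """
    p_all_lost = p_shared + (1 - p_shared) * p_fail ** copies
    return 1 - p_all_lost


def nines(r: float) -> float:
    """Number of nines in a reliability figure (0.9999 gives about 4)."""
    return -math.log10(1 - r)


for copies in range(1, 5):
    independent = nines(reliability(0.01, copies))
    correlated = nines(reliability(0.01, copies, p_shared=0.001))
    print(f"{copies} copies: {independent:.1f} nines if independent, "
          f"{correlated:.1f} nines with a shared failure mode")
```

Under these assumptions the independent figure climbs by two nines per copy, while the correlated figure stalls at about three nines however many copies are added. The shared failure mode sets the ceiling, which is why the audit matters more than the copy count.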

Further reading

Probability, reliability, and risk assessment

Against the Gods by Peter L. Bernstein – history of probability and risk management including discussion of independent events and probability multiplication fundamental to understanding backup reliability.

The Black Swan by Nassim Nicholas Taleb – examines rare events and correlated failures, relevant to understanding when backup independence assumptions break down catastrophically.

How to Measure Anything by Douglas W. Hubbard – practical guide to quantifying risk including calculating probability of multiple independent failures for backup strategy assessment.

Data backup, redundancy, and disaster recovery

Backup & Recovery by W. Curtis Preston – comprehensive guide to backup strategies, emphasising the importance of geographic distribution and truly independent backup methods for preventing correlated failures.

Site Reliability Engineering by Google – discusses Google’s approach to redundancy and reliability including probability calculations justifying multiple geographically distributed backups (available free online).

The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford – IT novel illustrating backup failures and recovery strategies, showing practical consequences of inadequate redundancy.

Hardware failure and independence

Engineering a Safer World by Nancy G. Leveson – systems safety approach examining how independence assumptions fail, relevant to backup strategies where common mode failures undermine redundancy (available free online).

Reliability Engineering by Elsayed A. Elsayed – technical treatment of reliability including discussion of independent failures versus correlated failures and calculating system reliability from component failure rates.

The Logical Leap by David Harriman – includes discussion of independent versus dependent events relevant to understanding when probability multiplication applies to backup strategies.

About the image

This is one of my old hard drives. I have a drawer full of them.

Photo montage and photo by Matt Ballantine, 2026