In Search of Certainty
The science of our information infrastructure

Second Edition, with foreword by Adrian Cockcroft

"An instant classic in computer science! 'In Search of Certainty' is a brilliant piece of work by one of the most brilliant people I've ever met. Complex systems, like modern IT services, need to be understood from a perspective very different from traditional IT practice. The answers are rooted in science and Mark Burgess exposes this science like nobody else."
-- Glenn O'Donnell, Principal Analyst, Forrester Research

Ruling the Machines that Rule the World?

Our planet's information systems have now reached a level of scale and complexity at which we can no longer simply decide how they will behave. They are so sophisticated and so interconnected that humans can neither steer nor comprehend them with certainty. Can we trust such an infrastructure to society?

For more than twenty years, Mark Burgess has been one of the pioneers of the science and technology behind the operation of this information infrastructure. In this book, he explains how far we have come in our understanding of the systems, and whether we yet have the necessary knowledge to prevent them from spiralling out of control.

In Search of Certainty takes the reader on a fascinating journey, from the beginnings of scientific thought to our present day, illuminating information technology as an integral part of our modern historical and cultural narrative. It lays out key challenges for the future and suggests a daring new way to think about the future governance of the vast cybernetic organism we are in the process of creating.

"An incredible journey by one of the [IT] industry's most important thinkers over the past 20 years. Like everything else he's done, this is unique and astonishing in its implications."
-- Carolyn Rowland, NIST

"Mark brings together the digital microcosm and macrocosm, the mundane and the profound, the human and the technological, in a way that is important, wonderful, and truly mind-stretching."
-- Jeff Sussna, Ingineering.IT

"Mark Burgess practically invented modern IT infrastructure management software. Now he has produced a revolutionary work, part personal journey, part theoretical review, as he advances the state of infrastructure science -- and our comprehension -- again. IN SEARCH OF CERTAINTY is a must-read book from a true visionary."
-- Christopher Little, BMC Software

"There are thought leaders, and then there are thought leaders. Mark Burgess is a scientist who can talk to the real world, and has been challenging it for 20 years, with the message of science."
-- Reynold Jabbour, J.P. Morgan-Chase

"Holy cow! ... Mark Burgess' pioneering work in the late-1990s presaged how large scale systems were designed and operated, and it has taken the world nearly two decades to catch up with him. Ignore the design principles and patterns described in this book at your peril -- in two decades, I'm sure that it will be embedded in how every architect, developers and operations professional talks about our craft, for practitioners, suppliers and researchers alike."
-- Gene Kim, Author of Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win

"The Burgess book has become a favorite."
-- Kevin Behr, Author of Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win

"To err is human, to explain is Mark Burgess."
-- Patrick Debois

"I only got through the Introduction and Chapter 1. I was so encouraged by just those that I started applying it to organization at Joyent and forgot to come back to the book... Probably the most intellectually stimulating book of the past decade."
-- Ben Rockwood, Joyent

"A philosophy of informatics obviously contrasting with but also complementing Floridi's philosophy of information."
-- Jan Bergstra, Professor University of Amsterdam

"What I liked most about the book was the vast number of topics it drew on, there are examples from a very broad array of domains. This made it very fun. ... It really is a tour de force of most interesting things that have happened for the past 500 years..."
-- Sigurd Teigen, CFEngine

"The book is in parts a very personal description of the world we live in, and how it evolved... the book is about a journey, a personal one. I did like that part very much."
-- Sven van der Meer, Ericsson

"Should be required reading by every CTO, CIO, and datacenter architect ... This book will change the way the IT industry thinks."
-- Paul Borril, Replicus

"...the must read on the subject [pf Complex adaptive systems]"
-- Mike Dvorkin, Cisco

"A great book"
-- Joshua McKenty, Piston/Openstack

Errata and clarifications

  • Richard Seymour was kind enough to point out some clumsy wording about the definition of work-hardening and annealing in materials, and taught me something I didn't know about dislocations. He wrote to me: "In general work hardening happens by increasing the number of dislocations (imperfections, so this goes from simple line dislocations all the way up to grain boundaries) ... Annealing generally reduces internal stresses by allowing dislocations to essentially self correct." In the book, I implicitly used the term defect more in the role that grain boundaries play, which can halt the propagation of cracks by blunting the tip of the stress concentration (also in fibreglass). The remarks in the book are quite brief, but my use of terminology is slapdash. Thanks to Richard for the message. We agree that the essence of what I meant is correct.

Tweet summaries

  1. 2 aspects govern IT infrastructure: dynamics (performance) and semantics (intent) - both can be unstable
  2. The physics of scales is about information and it applies to software. (21.08.13)
  3. The illusion of being in control depends on what details we choose to disregard.
  4. The more strongly things are coupled together, the more unreliable it becomes to predict outcomes.
  5. Weak coupling of parts allows separation of scales: dynamic: signal/noise, semantic: sep of concerns
  6. Important scaling laws are governed by dimensionless ratios. Dimensional analysis is th.
  7. You can't obtain sufficient information unless you are looking on the right scale.
  8. You can't affect a system unless you interact with it at the right scale.
  9. Dynamical similarity is when systems that have the same dynamical proportions behave similarly. Hence scale models.
  10. Dynamical similarity applies to changing processes too, hence wind tunnels and wave machines.
  11. External constraints/boundary conditions play an important role in the stability of a dynamical changing system.
  12. Instability can be either dynamic (performance) or semantic (intent/logical) in nature
  13. Software engineering needs to come together with operations to unify models with semantics and dynamics. #devops
  14. Dev think mostly about semantics. Ops think mostly about dynamics. DevOps = complete picture.
  15. "Equilibrium" replaces determinism as the most important idea in science. It is the definition of dynamical stability.
  16. System designers often push the cost of maintaining dynamical equilibrium into posthoc/reactive "operations" repairs.
  17. The maintenance theorem says: you can't really control anything over time. Best you can do is to keep it roughly in balance.
  18. Detailed balance is how the technique of "error correction" stabilizes semantics on top of a flawed dynamical process.
  19. Scale limits how specific semantics/meanings can be interpreted, (think of Nyquist's theorem).
  20. The concept of energy in physics is like that of money in economics - a bookkeeping parameter for change.
  21. Complexity is THE source of randomness. Predicate LOGIC cuts complexity, hence randomness over a knife edge. Pick a card any card..
  22. Instabilities of detailed balance (like queues) are highly non-linear two-state phenomena, you are either in or out of control.
  23. The "detailed balance" principle is about the competition of opposing influences to reach a stable state.
  24. Competition can also be about semantics: conflict of interest. The Nash equilibrium from game theory describes this.
  25. Relative change can easily become unstable. Absolute change is more certain, when it can be anchored to a fixed point.
  26. We can maintain the state of systems with absolute monotonic repairs. This is called convergence.
  27. Convergence goes beyond idempotence. Doing something once only is not enough. It has to end in the correct state.
  28. Absolute intent can be modelled as a mathematical fixed point, like the number zero.
  29. Idempotence is often mistaken for the more interesting mathematical fixed point.
    e.g. mkdir desired
    and mkdir /specific/desired
    Both are idempotent, but only the latter is a fixed point.
  30. Fixed points allow self-repairing equilibria, and get as close to a dynamic definition of determinism as we can get in IT.
  31. Certainty is a point of view of an observer. Nothing in reality assures it.
  32. Approximation is the key to putting limits on what we are willing to believe.
  33. (none)
  34. "It was almost as if the computer's response to individual behaviour was doing the exact opposite of what one would expect"
  35. Separation of slow and fast variables allows us to distinguish trend from fluctuation (dynamics) or signal (semantics) from noise
  36. Stability comes from bodies of similar outcomes: we call this averaging when applied to data and sometimes redundancy.
  37. Meaning is associated with signals that stand out (low intrinsic information), noise with too much information.
  38. It's easier to attach meaning to focused behaviour (low information) => meaning/semantics are the inverse of information.
  39. Models that deal solely with semantics (politics, philos+) are entirely fictitious until shown to be realizable in actual dynamics.
  40. When George Boole created Boolean logic he set true=1, 0=false and all in between was allowed - unifying probability with reasoning
  41. Redundancy is a strategy for increasing certainty with numbers.
  42. Statistics are a strategy of improving number-certainty through redundancy.
  43. Plot a histogram to gauge your level of certainty about repeated observations.
  44. Shannon/Feynman: informational entropy measures how easy it is to find the right path amongst many possibilities.
  45. Prigogine: instability in systems destroys the possibility of finding a statistical description that doesn't lie
  46. Shannon => if a message contains redundancy we may compress it as symbols are idempotent when they obey the zero property
  47. The existence of "many worlds" or "possible outcomes" leads to exponential complexity in systems with decisions and strong coupling
  48. Dynamical inevitability is the best friend of certainty. Reasoning is an artificial narrative based on unstable true/false choices
  49. Gödel showed that even first order logic cannot lead to complete certainty, not even within the limits of its own axioms.
  50. Causal ordering can emerge from simple repetition if you engineer systems based on recognizing preconditions
  51. "Uncertainty seems to reside not only in the incompleteness of information, but also in the instabilities of reasoning."
  52. "Automation is encoding fixed system semantics within an entirely dynamical framework."
  53. "The future lies in embracing a tradition that has long held sway in natural science ... approximation"
  54. "The paradox of certainty is that the very controls we add to mainly in instability's favour to undermine that control"
  55. "Control by obligation is not relativity friendly. It quickly becomes inconsistent without global knowledge."
  56. "Promises emerged out of the failure of deterministic logic to describe distributed systems."
  57. "The order in which we consider self-contained intentions does not alter the final outcome of those intentions."
  58. Autonomy leads to a local view of causation, a viewpoint that does not violate relativity, or "CAP".
  59. Any autonomous agent is a single point of calibration, an arbiter of uncertainty from multiple observations
  60. The most certain state is one of complete ignorance, as all observations add uncertainty about the state of things
  61. A promise is a declaration of intent, within a certain scope (semantics). Keeping it involves dynamical equilibrium.
  62. No agent can make a promise on behalf of any other than itself. This is the meaning of "voluntary cooperation"
  63. A promise proposal is not yet promised - like a testament/will that hasn't yet been signed.
  64. A contract is a set of bilateral promise proposals that is activated when agents promise to use them collectively
  65. "Studies made by psychologists indicate that humans generally over-estimate their importance in ensuring a successful outcome.."
  66. "Ethics is not a topic w/ infrastructure but when it comes to tools of society one cannot ignore human involvement"
  67. Orchestration is about the design of collaborative behaviour. Narration is the sequencing of ordered steps.
  68. If systemic behaviour itself is stable, we can rely on infrastructure to persist, and function as a platform for society.
  69. Dunbar pointed out that our brains can only cope with a limited number of relationships
  70. Dunbar discovered that the closer a relationship we form to something/one, the fewer we can sustain.
  71. How close a relationship do we need to be able to truly understand our systems?
  72. Cooperation and collaboration are give-and-take relationships. Which relationships between people/technology are counter-productive?
  73. Alvin Toffler's 3 waves: manual labour subsistence, industrial monoculture, and informational diversity. Model based for the 3rd wave.
  74. The goals of unique semantics and redundant dynamics seem to be in conflict: uniqueness vs redundancy.
  75. We can reconcile semantics/dynamics by looking at emergent behaviour: attractors rather than rigid control.
  76. Promise Theory provides the semantic measuring stick against which we can measure systemic behaviour.
  77. There is no system design without being able to think in terms of applying constraints around non-deterministic behaviour
  78. A dynamical average leads to a stable observable value - found through redundancy of statistics
  79. A semantic average leads to a stable meaning - found through redundant cooperative annealing.
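
The distinction drawn in summaries 25 to 29, between an operation that is merely idempotent and one that converges to a fixed point, can be illustrated with a small sketch. This is only a toy model under stated assumptions: the filesystem is a plain set of path strings, the helper `mkdir` is hypothetical and behaves like "ensure this directory exists" (closer to `mkdir -p` than to a bare `mkdir`, which would report an error on a second run), and the starting directories `/home/a` and `/tmp` are made up for the example.

```python
import posixpath


def mkdir(fs, cwd, path):
    """Toy 'ensure directory exists' on a simulated filesystem.

    fs   -- frozenset of directory paths (the simulated state)
    cwd  -- current working directory of the agent running the command
    path -- relative or absolute path to create
    Assumption: creating an existing directory is a silent no-op.
    """
    resolved = posixpath.normpath(posixpath.join(cwd, path))
    return fs | {resolved}


empty = frozenset()

# Both forms are idempotent: repeating the operation in the same
# context changes nothing after the first application.
rel_once = mkdir(empty, "/home/a", "desired")
rel_twice = mkdir(rel_once, "/home/a", "desired")
abs_once = mkdir(empty, "/home/a", "/specific/desired")
abs_twice = mkdir(abs_once, "/home/a", "/specific/desired")

# The relative form depends on where the agent happens to be standing,
# so different starting contexts end in different states.
rel_from_a = mkdir(empty, "/home/a", "desired")
rel_from_b = mkdir(empty, "/tmp", "desired")

# The absolute form is anchored: every starting context ends in the
# same state, which is what makes it a fixed point.
abs_from_a = mkdir(empty, "/home/a", "/specific/desired")
abs_from_b = mkdir(empty, "/tmp", "/specific/desired")

print("relative idempotent:", rel_once == rel_twice)
print("absolute idempotent:", abs_once == abs_twice)
print("relative context-independent:", rel_from_a == rel_from_b)
print("absolute context-independent:", abs_from_a == abs_from_b)
```

In this model the relative command is safe to repeat but does not say where the system ends up, whereas the absolute command names the end state itself, so repeated application from any starting point converges on it.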