# Universality and IT management

*The death this year (14 October 2010) of the Polish-French-American
mathematician Benoît Mandelbrot, best known for introducing the
idea of fractal dimension to mathematics, reminded me of issues that I
have moved away from over the last ten years, but which are every bit
as important now as they were back then. We must learn to appreciate that scale is not merely
a design issue in IT management -- there is much to be learned if we approach
modern datacentre management more scientifically. It might hold some surprises in store.*

The Mandelbrot set,
which is surely the most iconic symbol of Mandelbrot's contribution to science,
is not merely an intriguing image of immense beauty;
it symbolises an important phenomenon, frequently ignored in computer science: that of *instability*
and the critical importance of *initial* or *boundary conditions* to eventual outcomes.

Mandelbrot's work came to be associated more with the physics of
complex systems (`Chaos Theory'), but by neatly compartmentalising
topics like this, we merely hide them from the general consciousness where they
might do some good. The significance of the Mandelbrot set is that it
represents a *boundary between stability and instability* in a
dynamical system. It shows how deceptively simple problems (in this
case solving quadratic equations) can defy our intuition and lead to
unstable behaviour. To be surprised by the Mandelbrot set is to see why
software contains so many bugs. Human-Computer systems are also dynamical systems.
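The iteration behind the set makes this concrete. A minimal sketch (the parameter values below are chosen purely for illustration): iterating z → z² + c, the real parameter c = 0.25 sits exactly on the set's boundary and the orbit stays bounded forever, while a shift of just 0.01 sends the orbit off to infinity.

```python
# Escape-time test for the Mandelbrot iteration z -> z^2 + c.
# c = 0.25 lies on the set's boundary (the cusp of the cardioid);
# c = 0.26, a change of only 0.01, lies outside and escapes.

def escape_time(c, max_iter=1000):
    """Iterations until |z| exceeds 2; max_iter means 'never escaped'."""
    z = 0
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return n
    return max_iter

print(escape_time(0.25))  # 1000: bounded, the orbit creeps toward z = 0.5
print(escape_time(0.26))  # finite: the orbit eventually blows up
```

A hundredth's difference in the boundary condition separates a permanently stable orbit from a divergent one, which is exactly the kind of sensitivity that defies intuition.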

## Stability first

Concerning the stability of computer systems, we seem to have
learned little in the 10 years since I wrote about the primacy of stability as a management
paradigm (for a full discussion, see *Analytical Network
and System Administration*). With the exception of Cfengine and
IBM's autonomics, the message about dynamics did not catch on. In the
aviation industry, no one thinks twice about placing the stability of
aircraft above other design criteria. There is a simple connection
between stability and safety. For some reason, this is less obvious in
the case of computers. There is another reason, however. Computer
programs are *discrete* or digital, not bulky and statistically
robust like the systems of the natural world. Ironically, this makes them more
sensitive to instability, not less.

The principle around which I built the `immune system' and configuration software Cfengine was (and is) this:

Forget about exactly what you want, and start with what is stable and achievable. Then find the closest stable candidate to what you want, and build on that.
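As an illustration of this principle (a sketch in Python, not Cfengine's own language; the file-permission example is hypothetical), a convergent operation describes a desired end state and repairs only deviations from it, so repeating it is harmless:

```python
# A convergent operation expresses a desired end state and acts only on
# the difference between actual and desired; repeating it changes nothing
# further (idempotence). A sketch of the idea, not Cfengine syntax.
import os
import stat
import tempfile

def converge_mode(path, desired_mode=0o644):
    """Repair a file's permissions only if they deviate from the desired state."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current != desired_mode:
        os.chmod(path, desired_mode)   # correct just the deviation
    # Already correct? Do nothing: the desired state is a fixed point.

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
os.chmod(path, 0o600)   # perturb the state
converge_mode(path)     # first run repairs it
converge_mode(path)     # second run finds nothing to do
print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # -> 0o644
os.unlink(path)
```

However the file started out, the operation pulls it toward the same stable state, rather than commanding a transition from one assumed state to another.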

Some critics still rail against this viewpoint. I often hear `we
don't want notions of convergence and probability, just change the
computer to the state we want by command -- make it so!'. The field
of *deontic logic* (essentially the study of rule and obligation)
has made a career of examining the idea that one can impose
restrictions on external entities and agents. If one looks at what
this field has achieved in its fifty or so years, it amounts to very
little. (The failure of deontic logic to say anything useful about
anything is one of the motivators behind Promise Theory.)

`Just make it so!' This is a crude and naive point of view. Of
course, one can try. We can make any kind of policy and insist that it
be obeyed. Imagine passing a law that people are only allowed to stand
on one leg, keeping the left foot at least 20cm above the ground at
all times. This might satisfy someone's view of correctness, but it is
hard to enforce, since it insists on a highly *unstable* state.

## Scale and stability -- facing the unmanageable

In physics, one of the most important realizations of the past 50 years has been the importance of scale in understanding behaviour. Not only does the world look different at different scales (from 10,000 feet, or through a microscope), but it also behaves differently. This scale-dependence explains many phenomena in the natural world, like why aluminium bends, water flows and glass shatters, to mention a few; and we have to believe that it can explain phenomena in computer systems too. Or, to put it another way, we ignore it at our peril.

The concept of renormalization was originally used in physics and statistics to grapple with large numbers. The idea goes basically like this: suppose we start counting something of interest; when the numbers are small, we can easily see the difference between the numbers: 1 is different from 2 and 3, no doubt about that. But what about 1237821499273642299992773 and 1237821499273642299992783? What about the difference in height between two mountains next to each other, towering above us?

When numbers get big, we can't easily see the difference between them, or see the wood for the trees. The answer is to renormalize the numbers, e.g. by cutting off the mountain bases, and keeping only the top few metres. Then you can compare them more easily. Similarly, you can subtract 1237821499273642299992700 from the numbers above and instead compare 73 and 83. To compare two specks of dust, you would magnify them 1000 times, and so on. You simply change your notion of scale, like recalibrating a set of weighing scales. Then you will see the relevant phenomena better. These small differences, although hard to see at one scale, can be amplified into significant differences at other scales.
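The baseline subtraction is easy to carry out directly (the cut-off at a round hundred is arbitrary, chosen just to expose the differing digits):

```python
# Renormalizing two large measurements: subtract the common "mountain base"
# so that the small difference between them becomes visible.
a = 1237821499273642299992773
b = 1237821499273642299992783
baseline = min(a, b) // 100 * 100   # keep only the top few "metres"
print(a - baseline, b - baseline)   # -> 73 83
```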

This is all well and good for simple comparisons, but what is interesting about scale is how it affects relationships, as interaction is a great amplifier of effect. We might understand the behaviour of cells on a microscope slide, but who could guess that they would clump together to form elephants and human beings at a thousand times greater size? Is this a bug, a feature, or an inevitability? Renormalization or scaling behaviour is an approach we must embrace in IT management to reveal relevant behaviour from irrelevant detail.

When a system
of *anything* (computers, say) interacts, or communicates somehow, the
relative sizes can have an effect on the outcome. When a bird lands
on the back of an elephant, it does not change the behaviour of the
elephant significantly. But when a parrot lands on the shoulder of a
person standing on one leg, it can topple the person causing them to
fall to the ground. Why? Because the initial configuration of the
person was unstable.

Lying on the ground is an attractor (actually it is about lowering the
centre of mass of the object), but of course there are many ways one
can do it. And there are many natural phenomena that share this
general feature: a stationary bicycle lying on its side, a man lying
on one side eating grapes, a penguin on its back shooting downhill,
etc. The details are unimportant; the general principle is that, at a
given scale, a perturbation will tend to cause a system to fall into a
stable state (close to the ground). This is called *universality*
-- general inevitability of outcomes at a particular system scale. It
doesn't matter what the thing is, or who made it, or even what it was
asked to do, it is going to end up in a stable state, like it or not.

We conclude from the universality of such behaviours that it does not make sense to try to walk a tightrope in a hurricane, or make humans stand on one leg in a crowded shopping mall. Nor does it make sense to base IT management on the assumption that a system will not fall over, or that simple quick fixes can avoid the instabilities.

The mistake system administrators and programmers often make in computing is underestimating the inevitability of certain behaviours. No matter what you do, universal behaviours will come to dominate fundamental issues, beyond our control.

## Stability and self-repair

The computer `immune system' Cfengine was designed to encourage system architects to take advantage of natural stability, by developing convergent behaviour -- i.e. behaviour that would naturally be attracted to a desirable state. If you base tools and policies on the notion of stability, surely the desired state of the system will persist for as long as possible (making it predictable and thus usable)? Indeed, if the desired state is `lying down at the bottom of the hill' then it will also self-repair when something perturbs it by trying to push it uphill.

Traditional approaches to management fight stability -- they are hill-climbing approaches, starting from an arbitrary point and following a fragile path to an unstable point. The ability to reach the desired state is sensitive to: i) the choice of initial state, and ii) the predictability of the path. Cfengine turns this approach upside down, and makes the desired state a stable point of attraction.
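The valley picture can be sketched as a contracting map (the numbers here are illustrative, not from any real system): whatever state we start from, repeated application of the repair step converges to the same fixed point.

```python
# A contracting map: the desired state is a stable attractor, so the
# outcome is insensitive to initial conditions.
def repair(state, desired=100.0, rate=0.5):
    """Move the state partway toward the desired value (a contraction)."""
    return state + rate * (desired - state)

for start in (0.0, 37.5, 1e6):   # wildly different initial conditions
    s = start
    for _ in range(60):          # apply the repair step repeatedly
        s = repair(s)
    print(round(s, 6))           # -> 100.0 every time
```

Contrast this with a scripted path from one assumed state to another: if the starting point is wrong, the script fails; the attractor does not care where it starts.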

*Traditional system configuration starts from a single known baseline and climbs to another point. Cfengine is like a valley, insensitive to initial conditions.*

As we develop more tools for datacentres, I believe we should make greater use of this understanding. Intelligent systems in the future would have self-knowledge and understand universality of behaviours. They would be able to warn architects of poor decisions, and even limit the probability of unstable behaviour by design (like Cfengine).

Cfengine introduced `model-based' configuration, for a class of models that can be expressed as stable attractors. Not just any model, but stable models. Of course, there is also the problem of patchworks. There is no single model in use. The world's computer systems might be interconnected, but they represent a patchwork of overlapping models, with different goals. Moreover, the scale of systems is growing to the point where new phenomena might occur. We need to understand these phenomena better, and design to work around them.

The Mandelbrot
set's fractal image is a persuasive visualization of universality in
a specific area. Surely nothing designed and programmed could deliver such
infinite regularity at every possible scale. It also underlines the
importance of *boundary* (symmetry breaking) to understanding
dynamical systems. I think system administrators need to wake up to
these lessons and think less deterministically about the systems they
manage. We can't program our way out of universality, so we'd better
understand it at all scales to harness and use it to our advantage.