The Making of a Software Wind-Tunnel (part 3)
Catastrophes and critically bounded behaviours
In the previous parts of this essay, I've proposed using dimensional analysis to make scaling predictions about software systems, just as one would for mechanical systems. This is difficult because mechanisms designed for rich semantic behaviour, like software, have many intrinsic scales associated with the particular functions they exhibit. To achieve true scaling, one has to suppress these pinned scales by encapsulating them as new 'atomic' (or molecular) units and then replicate them like a material. This is sometimes called scaling a service-oriented architecture.
There is a final point we can learn from scaling, which is at least as important as scalability for predicting behaviour -- that is, so-called critical phenomena, or catastrophic changes.
Critical behaviours
We generally assume that functional systems will exhibit broad stability and continuity of function over their usable timescales. Occasionally, systems might weather environmental noise, but if it ever starts to dominate behaviour, this means trouble. What many people don't recognize intuitively (unless they studied physics in some depth) is that critical behaviours may also be mapped out by using dimensional analysis.
Critical behaviours in physics include things like
- Phase transitions (like ice turning to water or steam).
- Catastrophes like fractures, collapses, avalanches, sudden collective disequilibria, stalling in flight, and so on.
We might joke about a system (or person) meltdown, but the melting of a solid is, in fact, an example of just such a qualitative systemic change from a regime of solid strong coupling to fluid weak coupling. These behaviours happen in artificial systems too. They happen in IT.
Put simply, a critical phenomenon implies a qualitative change in systemic behaviour at a critical threshold of some parameter. The main difference between computing and physics is that, in computing, we always have two dimensions to deal with: dynamics and semantics. Examples of critical behaviours in computing include these familiar phenomena:
- Queueing instabilities
- Thrashing
- Traffic congestion
- Fail-overs
- Memory paging
- Activation of logging and alerting
if (parameter < threshold) { /* one behaviour */ } else { /* a qualitatively different behaviour */ }
Logical reasoning itself is a form of instability, as I wrote about in detail in my book In Search of Certainty, and it is, of course, the very basis of computational theory. In the language of dynamics, such a branching is called a bifurcation.
All conditional behaviours are threshold behaviours with the maximum possible amplification of outcome: from true to false. Whatever this change of state triggers, it is sudden and discontinuous.
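To make the threshold idea concrete, here is a minimal sketch of my own (not taken from any particular system) using the textbook M/M/1 queue: the mean number of requests in the system depends only on the dimensionless utilisation ratio of arrival rate to service rate, and it diverges as that ratio crosses the critical value of 1. The rates are purely illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// meanInSystem returns the expected number of requests in an M/M/1 queue,
// which depends only on the dimensionless ratio rho = arrivalRate/serviceRate.
func meanInSystem(arrivalRate, serviceRate float64) float64 {
	rho := arrivalRate / serviceRate
	if rho >= 1 {
		// Past the critical threshold: the queue grows without bound.
		return math.Inf(1)
	}
	return rho / (1 - rho)
}

func main() {
	serviceRate := 100.0 // requests per second (illustrative)
	for _, arrivalRate := range []float64{50, 90, 99, 100} {
		fmt.Printf("rho=%.2f  mean in system=%v\n",
			arrivalRate/serviceRate, meanInSystem(arrivalRate, serviceRate))
	}
}
```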
Scale pinning - dynamics
What makes wind-tunnel simulation of critical behaviours hard is that they are triggered by dimensionless ratios that break symmetries, and we have no control over the way in which symmetries are broken at the small scale. Thus, if we want to use a small device (like a PC) to simulate a server farm, the greatest challenge will be to operate within a region where none of the machine limitations (finite number of CPUs, finite amount of memory, etc) play a role.
Boundaries can be formed by hard physical containment (like the edge of a container, the filling of a cache or store, or the exceeding of a limit), or they can be logical boundaries, such as when a rate of change exceeds a speed limit. Suppose you have two scales in your system, like memory:
Memory used (bytes) / Cache size (bytes), or
Data rate (B/s) / Cache emptying rate (B/s)
These ratios might appear scale free, since they are dimensionless. Indeed, if we are free to select any value for top and bottom of these fractions, there is no reason why the relevant system cannot be scale invariant. But that is never the case. Machine parts are finite, whether they are atoms, molecules, cells, organisms, factories, and so on. It is the finite boundaries that break the scaling symmetries.
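As a rough sketch of what that means in practice (all names, numbers, and warning thresholds here are made up), one can watch these ratios directly: they look scale-free while they are small, but the finite denominators guarantee a boundary at which the symmetry breaks.

```go
package main

import "fmt"

func main() {
	// Hypothetical measurements; the units cancel in each ratio.
	memoryUsed := 7.5e9 // bytes currently held
	cacheSize := 8.0e9  // bytes available -- a hard containment boundary
	dataRate := 45.0e6  // bytes/second arriving
	drainRate := 50.0e6 // bytes/second the cache can empty

	ratios := map[string]float64{
		"memory fill": memoryUsed / cacheSize,
		"rate":        dataRate / drainRate,
	}
	for name, r := range ratios {
		if r >= 0.9 { // arbitrary early-warning threshold
			fmt.Printf("%s ratio %.2f is approaching its finite boundary\n", name, r)
		} else {
			fmt.Printf("%s ratio %.2f is far from its boundary\n", name, r)
		}
	}
}
```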
We could turn this into a simple law:
Ultimately, all there is to understand about the world is how symmetries get broken. Here are some of my favourite things that pin scales in our current technologies:
| Non-scaling | Scale-free |
|---|---|
| Hubs (processing foci) | Lattices (grids) |
| Centralization | Amorphous decentralization |
| Loops/recursion | Wait-free parallelism |
| Locks and mutexes | Wait-free parallelism |
| Ordered memory (sequential logging) | Markov processes |
| Searching | Hashing |
| Determinism | Non-determinism |
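As one small illustration of the table (a sketch of mine, not a benchmark), contrast a mutex-guarded counter, where every writer queues on a single pinned hub, with an atomic increment that approximates the wait-free column and gives the writers no one point to serialize on:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	const workers = 100
	var wg sync.WaitGroup

	var mu sync.Mutex // non-scaling: a single lock every writer must pass through
	var locked int64
	var lockFree int64 // closer to scale-free: atomic increments need no shared lock

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			locked++
			mu.Unlock()
			atomic.AddInt64(&lockFree, 1)
		}()
	}
	wg.Wait()
	// Both end at 100, but under heavy contention they scale very differently.
	fmt.Println(locked, lockFree)
}
```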
Semantic pinning
Of course, we can't forget semantics. How do they scale? A database with a rigid schema, or any kind of logic that expects a fixed structure with perfect input, is brittle to the unexpected because it cannot flex or adapt in relation to new requirements. Thus NoSQL key-value databases might scale, while crystalline, tabular schemas won't.
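Here is a hedged sketch of that brittleness (the field names are hypothetical): a rigid struct silently ignores anything it was not designed for, while a key-value representation absorbs new fields without any change to the surrounding code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// A rigid, "crystalline" schema: every field is fixed in advance.
type UserRow struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

func main() {
	// A record that has grown a field the schema never anticipated.
	payload := []byte(`{"id": 1, "name": "Ada", "timezone": "UTC"}`)

	var row UserRow
	_ = json.Unmarshal(payload, &row) // the unexpected field is lost

	var doc map[string]interface{}
	_ = json.Unmarshal(payload, &doc) // the key-value form keeps everything

	fmt.Println(row)             // {1 Ada}
	fmt.Println(doc["timezone"]) // UTC
}
```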
The "big-O" scaling of algorithms, e.g. O(N), is an expression of a scaling non-similarity, due to digital discreteness in linearized computational steps or memory. The computational complexity categories P, NP, PSPACE, LINSPACE, etc, effectively describe scale pinning. Ironically "P" represents pinning, and "NP" non-pinning, which is why complex problems can still be solved by non-deterministic methods.
I'll leave the reader to think of more of these as an exercise!
Reasoning systems are fundamentally complex in the sense that they try to mimic determinism with strong coupling (order dependency). Reliable transmission, transactions, and sequential dependency chains (algorithms) work on precisely determined data of a specific size, introducing multiple scales. An iteration loop is a finite-sized container that adds a scale, namely the length of the iteration queue.
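A buffered channel makes that last point literal (the capacity and workload here are arbitrary): the buffer length is a finite container, and the producer's behaviour changes qualitatively the moment that pinned scale is reached.

```go
package main

import "fmt"

func main() {
	const capacity = 4 // the pinned scale: the length of the iteration queue
	queue := make(chan int, capacity)

	for i := 0; i < 10; i++ {
		select {
		case queue <- i:
			fmt.Println("enqueued", i)
		default:
			// The finite boundary has been hit: a qualitative change of behaviour
			// (here we drop the item; a blocking send would stall instead).
			fmt.Println("queue full, dropping", i)
		}
	}
}
```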
Escaping contraptions
To immunize against scale pinning, we can try to encapsulate a rich semantic system (what I call a contraption, like those wonderful machines of Heath Robinson) in some kind of cell boundary that makes more generic promises -- promises that can be replicated by equivalent or substitutable entities. The aim is to lose scale pinning by transforming the system to a scale where its components are:
- Dynamically indistinguishable (measure).
- Semantically substitutable (function).
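A minimal sketch of those two properties (the interface and replica names are invented for illustration): callers see only the generic promise, so any replica that keeps the promise is indistinguishable from, and substitutable for, any other.

```go
package main

import "fmt"

// Resizer is the generic promise made at the cell boundary.
type Resizer interface {
	Resize(image []byte) []byte
}

// Two internally different "contraptions" that keep the same promise.
type replicaA struct{}
type replicaB struct{}

func (replicaA) Resize(img []byte) []byte { return img } // internal details hidden
func (replicaB) Resize(img []byte) []byte { return img } // internal details hidden

func main() {
	// A pool of substitutable entities; callers cannot, and need not, tell them apart.
	pool := []Resizer{replicaA{}, replicaB{}, replicaA{}}
	img := []byte{1, 2, 3}
	for i, r := range pool {
		fmt.Println(i, len(r.Resize(img)))
	}
}
```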
Such substitutability is the trick by which biological tissue scales, as well as plant life. It is also how a society scales: by making individuals unimportant and replaceable, a society brings stability to the larger, faceless organism. Perhaps this is why humans struggle to let go of system design in order to scale systems: once a system scales, nothing of the detailed design really matters anymore. It's an inhuman state. No wonder we struggle with it.
Animals and individuals follow more of a "brain model" (or a heart model, since brain and heart are the two single points of failure in animals), where centralized brain-control pins the scalability through the finite speed at which responses can be made to messages to and from the controller. One can grow large brain-model organisms (dinosaurs were pretty big), but their size is fundamentally limited by pinning.
Individuals and other "contraptions" are more fragile than amorphous tissue, because their structural scales are integral to their functional semantics. This explains the current interest in the Service Oriented Architecture (aka the "micro-service model") in large architectures today. (These are championed by Amazon and Netflix, for instance.) Internal limitations in the service contraptions end up merely as limitations on the promises made by the cells, and these can be immunized against by replication and redundancy. In physical terms, we are talking about a gas of contraptions. Perhaps this is why ants and other insects are successful machinery. As long as we don't introduce scales in the coupling, there will be no more phase transitions. Alas, the bad news is that interactions with the system always do connect a finite body to the scalable back end. So we are doomed as long as we want non-trivial semantics.
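A hedged sketch of that immunization (the replicas and the failure are simulated): when one replica fails for some internal reason, the caller simply substitutes another, so an individual contraption's limits never become limits on the promise itself.

```go
package main

import (
	"errors"
	"fmt"
)

// callReplica stands in for a real remote call; replica 0 simulates an
// internal failure so the failover path is exercised.
func callReplica(id int, request string) (string, error) {
	if id == 0 {
		return "", errors.New("replica 0: internal limit exceeded")
	}
	return fmt.Sprintf("replica %d handled %q", id, request), nil
}

// callAny keeps the outward promise by trying substitutable replicas in turn.
func callAny(replicas []int, request string) (string, error) {
	var lastErr error
	for _, id := range replicas {
		if resp, err := callReplica(id, request); err == nil {
			return resp, nil
		} else {
			lastErr = err
		}
	}
	return "", lastErr
}

func main() {
	resp, err := callAny([]int{0, 1, 2}, "resize image")
	fmt.Println(resp, err)
}
```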
Wind-tunnel testing will be more successful on this large-scale structure than on the contraption itself, because no volume scaling predictions can be made about the internals, only serial rate predictions for throughput or flow. On the other hand, we can definitely use a knowledge of scaling relations to predict catastrophes and critical phenomena. For this, the small size of systems is actually an advantage.
With speeds and memory sizes doubling as Moore insisted, keeping track of when a system will hit a critical threshold turns out to be like trying to play tennis on an expanding court. It's only when something suddenly pauses that you get caught out.
Postscript
In these essays, I wanted to sketch out the principles of software scale testing for the future. I have not written any tools or offered any simple formulae, but the principles are not that hard. We already have some tools in the form of code analysis and performance instrumentation (see the work by JInspire, for instance), as well as machine learning (as used, for instance, by CFEngine since the late 1990s). These are the areas we need to develop.
The irony is that our need to control systems is ultimately what pins them at a fixed scale. When new logic kicks in, we might imagine that, as long as it is carefully and competently written and then tested semantically, it cannot throw a spanner in the works. We now know why this is not true. Dynamics always trump semantics. The illusion of deterministic "linear response" only works if you are far away from critical behaviours.