The Making of a Software Wind-Tunnel (part 1)
The Challenges of Applying Dimensional Analysis to Software Testing
Dimensional analysis is a method for scaling model relationships in physics. The idea is to understand how physical rules change under different circumstances. Dimensions refer to measurements (think of the dimensions of your screen rather than travelling to other dimensions in science fiction); analysis refers to predictive models that describe behaviour.
Dimensional analysis and "scaling" have been applied to simple physical systems for over a century. The question is: could they also apply to software? Classic applications of the technique include predicting the flow of fluids and colloids through pipes, aerodynamic flows, the efficiency of explosions, and power transfer systems, to mention a few. One uses dimensional analysis to build scale models: of buildings under static loads or wind shear, or of the lift and turbulence around an aircraft -- as in a wind-tunnel. One of the classic uses of scaling theory was the design of the atomic bomb: predicting the expected yield without actually having to explode it too many times. The benefit is clear: building a full-scale prototype is expensive and risky if you can learn the basics from a scale model.
In software, there is a similar potential to model systems at a small scale in order to predict how they will behave at a larger one (or vice versa). I wrote about the basic concepts of this in my book In Search of Certainty. To my knowledge, however, this has never been attempted in computing -- perhaps because it requires a level of cross-disciplinary thinking that is normally eschewed. This is a great pity, because we are witnessing such unprecedented growth in the required scale of software services that we are being forced to build full-scale prototypes on demand. We are in genuine need of a physics of systems that can bring predictability to a difficult issue.
In a series of essays, I would like to ask: how can this be done?
Software is generally more complicated than the physical systems above, and several things make it special. The most important is perhaps that it is conceived at a very low level (a microscopic scale), where one finds many interacting scales. The semantics (i.e. behavioural patterns) at a fixed low level play a big role in determining behaviour, while the large-scale dynamics (or bulk properties) are generally an afterthought. Indeed, we use computers in the first place because we can make them dance in particular ways in captivity. But how do they behave when we release them to grow in the wild?
From physics, we understand that---as we increase the size and number of things in a system---the semantics or behaviours underneath mean less and less to the behaviour of the whole. So the properties of silicate molecules are not the same as the strength of a pane of glass. And a single protein molecule that unlocks a receptor for sustaining life might be jagged and prickly in just the right way to do its job, but a whole test-tube full of these is just a gooey liquid that does not open locks the size of a test-tube.
Now, a test-tube of the same goo could unlock a test-tube full of receptors at a small scale, provided they can be delivered and brought into contact efficiently. So, interfaces and contact surfaces also matter to the semantics of systems. This is what one would call boundary conditions in natural science. Similarly, a web-server farm can serve a large number of web clients as long as the two can be brought into contact with one another efficiently. Do they meet at the edge? Do they mix and penetrate?
Because we tend to abstract away infrastructure in software development ("dev"), these are the details that are rarely modelled by developers, and are either left to architects or operations ("ops") to figure out.
So there are properties at multiple scales, and new phenomena too. Properties like resilience and security may be considered emergent or systemic properties of a complete system at a given scale, just as hardness and plasticity are bulk properties of materials that have only a loose relationship to the component atoms.
When low level parts are weakly coupled (weak dependencies), large scale systems tend to exhibit stable and predictable behaviours. When they are strongly coupled, we see turbulence and chaos.
As an example: distance travelled is speed multiplied by time.
So if we double the time, at constant speed, we expect to double the distance travelled. This is a very simple truth when a system is "scale invariant". But now suppose there is a hidden assumption, something that pins the system to a particular scale of distance, so that this relationship is only true in a limited sense. For instance, we put the traveller inside a box of length L. Now distance travelled is proportional to time and speed, until the traveller hits the end of the box. The scale is pinned, or the invariance is broken.
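The pinned scale is easy to sketch numerically. Here is a minimal illustration (the box length L is the hypothetical container from the example above):

```python
def distance(speed, time, box_length=None):
    """Distance travelled at constant speed, optionally pinned
    by a container of finite length (the 'box')."""
    d = speed * time
    if box_length is None:
        return d               # scale invariant: doubling time doubles distance
    return min(d, box_length)  # invariance broken at the boundary scale

# Free traveller: the simple scaling law holds.
assert distance(2.0, 10.0) == 2 * distance(2.0, 5.0)

# Traveller pinned inside a box of length 15: the law breaks down.
assert distance(2.0, 10.0, box_length=15.0) == 15.0
```

The same pattern -- a linear law truncated by a boundary -- recurs throughout the examples that follow.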
This breakdown of invariance happens in nearly all physical systems that have containers of finite size somewhere within them. The result is that behaviour becomes a study of these fixed boundary scales. In computing, content delivery systems are very like fluid flow, for example: flow rate depends on channel capacity (colloquially, "bandwidth"), and data storage or caching is like intermediate pooling of fluid in a reservoir. What happens when a transmission pipe, a cache, or a reservoir is full?
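We can answer that question with a toy simulation, assuming the simplest possible model: a reservoir (cache) of fixed capacity with steady inflow and outflow, where anything that arrives after the boundary is reached spills over (is dropped, or blocks upstream). The rates and capacity below are illustrative numbers, not measurements:

```python
def fill_reservoir(inflow_rate, outflow_rate, capacity, duration, dt=1.0):
    """Simulate a finite reservoir; return (final level, amount spilled)."""
    level, spilled = 0.0, 0.0
    t = 0.0
    while t < duration:
        level += (inflow_rate - outflow_rate) * dt
        if level > capacity:            # the pinned boundary scale
            spilled += level - capacity
            level = capacity
        level = max(level, 0.0)
        t += dt
    return level, spilled

# Inflow 5 units/s, outflow 2 units/s, capacity 20 units, 10 seconds:
# the level grows linearly (scale free) until the cache fills, then
# 10 units spill over the boundary.
level, spilled = fill_reservoir(5.0, 2.0, capacity=20.0, duration=10.0)
```

Below capacity, the behaviour is scale free; at the boundary, a qualitatively new phenomenon (loss) appears.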
Hopefully, you get the idea. There are many simple causal relationships that can be scaled. There are also many obstacles that break scale invariance. One of the important ones is the size of a user---or the flow sink (how much of a pipeline he/she can drink). The user has to be considered part of the system if you want to understand its total behaviour.
These are the kinds of relations that come into play in predicting scale: so long as we understand all the dimensions of the system, we can make predictions about its behaviour at scale.
Analytical human-computer methods
So if we are going to be able to predict scalability of computer systems, without actual life-size testing, how would we go about it?
- Instead of 100,000 servers, we might want to use 100.
- Instead of 1,000,000 users, we might want to use 10.
- Instead of 32 cores on a CPU, we might only have 4.
How can we get away with these limitations, and still predict the right answer for the larger scales?
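One answer from dimensional analysis is to match dimensionless groups between the scale model and the full system. As a minimal sketch (the numbers and the single-queue model are illustrative assumptions, not a general recipe), consider per-server utilization, which is dimensionless:

```python
def utilization(arrival_rate, servers, service_rate):
    """Dimensionless utilization rho = lambda / (N * mu).
    Two systems with equal rho should show similar congestion
    behaviour regardless of absolute size -- the essence of a
    scale model, like matching Reynolds numbers in a wind-tunnel."""
    return arrival_rate / (servers * service_rate)

# Full-scale system: 1,000,000 requests/s over 100,000 servers,
# each serving 20 requests/s.
rho_large = utilization(1_000_000, 100_000, 20.0)

# Scale model: 1,000 requests/s over 100 servers, same per-server rate.
rho_small = utilization(1_000, 100, 20.0)

assert rho_large == rho_small  # matched dimensionless group
```

If the dimensionless groups match, measurements on the small system can -- in principle -- be rescaled to predict the large one, up to the boundary effects discussed below.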
We recognize that the answer has to be made up of two parts:
- Basic laws of behaviour across the relevant scales.
- Little's law and other queueing relations
- Algorithmic complexity - O(N) for processing rate
- Amdahl's law for parallelism
- Percolation scaling in networks
- Kirchhoff's flow conservation laws
- Cost of service laws
- Boundary limitations that pin the scales.
- Maximum queue length.
- Limited memory sizes and CPU counts
- Data transfer rates.
- The speed of light
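To see how the two parts combine, here is a toy sketch pairing two of the laws above with one pinning boundary. The workload numbers and the maximum queue length are illustrative assumptions:

```python
def littles_law_occupancy(arrival_rate, mean_time_in_system):
    """Little's law: L = lambda * W, the mean number of items in flight."""
    return arrival_rate * mean_time_in_system

def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup is limited by the serial fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Basic law: 100 requests/s spending 0.5 s each => 50 requests in flight.
occupancy = littles_law_occupancy(100.0, 0.5)

# Boundary limitation that pins the scale: a maximum queue length of 40
# means roughly 10 requests overflow, whatever the nominal law predicts.
MAX_QUEUE = 40
overflow = max(0.0, occupancy - MAX_QUEUE)

# Amdahl: with a 90% parallel workload, going from 4 cores to 32 cores
# buys far less than the naive 8x.
gain = amdahl_speedup(0.9, 32) / amdahl_speedup(0.9, 4)
```

The prediction is always the basic law, clipped or bent by whichever boundary is hit first.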
Knowing this information is where the challenge begins. To make matters worse, many of the quantities we pretend are deterministic, are in fact stochastic processes. Nevertheless, in statistical physics we have a fair history of dealing with such issues (see In Search of Certainty).
In the subsequent essays in this series, I hope to explain how to go about this, and ultimately discuss how every scalable software pipeline can instrument itself to predict its own limited scalability.
Thu 10 Apr 13:13:18 CEST 2014