What is the value of System Administration to Business?

Based on the Keynote given at the NLUUG Spring Conference on 6th May 2010.

Good morning. I would like to talk about three things in this talk:

The SA profession and its development
Business development
Knowledge Management

and I would like to argue that they belong together.

About five years ago, when I was reviewing the whole design of Cfengine (and other automation systems), I became interested in the problem of how to align computer systems to the specific purpose of a business, to make the IT system support business needs better.

That could sound like an odd thing to say -- after all, isn't the whole point of system admin to support a business or organization? We might think this a little more today, but I still remember the nineties when it was enough to feel proud that you had actually made some new thing work at all, and though the technologies have improved a lot since the 90s, system administration practices do not seem to have changed very much. And for inexperienced system admins, I hear that the same is still largely true today.

I got interested in this problem because I've seen a lot of developments in the technologies for using and supporting computers over the years, but I have not seen any big changes in the way that system administration is done. Moreover, I've spent a lot of time trying to give SA an academic legitimacy, but this kind of recognition has only reached a tiny few SAs in more forward looking organizations. In terms of professional development, SAs are more likely to develop professionally and academically by serving business well.

A drain on the organization

Many, if not most, business leaders see IT services as a cost centre, a sump for funds to disappear into. They tend not to see it as a strategic tool for supporting business growth or developing new opportunities. It's like the rent you pay on your building -- you don't often think of the building utilities as opportunities for changing your business. But the analogy is poor, because System Administrators are not unskilled pipes and floor space, but often rather knowledgeable engineers.

The problem is an interesting one, I think, because there are many diverse expectations and requirements that come from businesses, and business leaders don't always know enough about technology to be able to ask for what they need.

What has been missing for SA is a simple model of how SA works in an organization, with a minimum of subjectivity. Instead of giving up on science, as far too many people do, (often saying that science doesn't work in the real world), we have to look more deeply at the problem.

Consider the following communication problem: a business executive needs help in knowing how to ask questions: instead of saying -- I really need an email system that lets me do this, they might say -- we must install Exchange or product XYZ. An uncooperative system administrator then says: it's impossible to support that function in this environment because it is fundamentally insecure. Now there is some kind of rift between technical services and perceived need so that neither party gets what they want.

This shows a lack of something very important in business, which has a simple explanation in science: a trusted partner relationship. With a lack of trust comes low status.

Aligning with business

Let's step back a little bit, and think about the relationship between IT and business.

HP research originally asked the question only in 2005 -- can we understand better ways of managing systems to make them align better with business goals? This sparked a number of conferences called BDIM, or Business Driven IT Management, which proposed various ideas about improving efficiency in a web based E-commerce kind of environment. Nearly all of them were about queueing theory, or ITIL/COBIT best practices. Then Jacques Sauve, a Brazilian professor and friend of mine, pointed out that the papers were not really talking about business in general.

I wanted to start with a better picture of the state of the art, so I hosted two workshops at USENIX/LISA, to hear ideas from system administrators themselves. The conclusions were pretty clear, and I've written about them with Carolyn Rowland in ;login:. The main high level conclusions were:

Need to improve communication between SA and management (an us and them gap).
Sys-admins are not usually included in decision making processes, but are sometimes asked for advice on purchasing.
SA needs upward visibility. etc etc

This is not an unexpected result, but it is a little annoying. This is not a technical answer, but a human answer -- not something familiar to SAs.

These are all human management issues, nothing technical is mentioned. Again it is about the lack of a trusted relationship. At this point, some people in the research community gave up and went back to work on technology -- but this reminded me of another part of my research over the years that has to do with optimization of economic strategy, and a very famous set of results that has been applied to everything from marriage counseling to nuclear stockpiling during the Cold War.

Trust in the mission

Let's just think a moment about what leads to trust. Reliability and dependability are important criteria for trust. In the early 1990s I was a proponent of policy based or desired-state management, advocating the importance of maintaining a system's continued operation. I started a business to learn more about how businesses really work.

After a number of years I generalized this model so that it could deal with a wider variety of issues, and began to talk about promises. Trust and reliability come from keeping promises. Today we talk about compliance, often with external regulatory requirements, but increasingly with carefully formulated internal criteria.

Formulating a clear policy as a guiding star for your mission goals is a simple form of Knowledge Management. It's about documenting an essential outcome of whatever processes we have.

I have advocated this goal or promise-based approach for years because I've seen that most organizations document their procedures rather than their goals.

Think of SA as delivering us to a destination -- the desired state. A passenger jet navigates towards its destination by looking for a beacon to guide it; many organizations, by contrast, only have manuals for steering the plane, changing course, adjusting the height. It is hard to answer the question: "Have we kept our promise?" if we never formulate one: "Are we there yet?" -- "Well, we're somewhere, that's for sure!"

Navigation is Knowledge Management, and it brings clarity.

Using Cfengine as a framework, the main idea that I came up with for dependability was based on the principles of maintenance in unpredictable environments -- turning a difficult concept like a goal into the idea of a state -- now a desired state, or promised state.

Stuff happens that is beyond our control. We spill things, the weather strikes, etc., and through no fault of our own, systems degrade. To maintain a good state, we therefore have to perform preventative and reparative maintenance. Ten years after the paper I wrote, I am still seeing the basic truth of this work cropping up and not just in technical procedures.

But the idea of a maintenance process -- a continuous dialogue of interaction and adjustment -- is more important than just thinking about technical specifications. It is also the source of perceived economic value in the world, and we want to talk about value. So trust in outcome comes from the assurance that we are continuously striving to maintain course towards our goals.
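
As a rough illustration of maintenance towards a desired state, here is a minimal Python sketch. It is not Cfengine syntax; the setting names and the `maintain` function are hypothetical, chosen only to show that a repair pass can be repeated safely and does nothing once the promised state holds.

```python
def maintain(actual, desired):
    """One maintenance pass: repair every setting that has drifted.

    Returns the names of the settings that needed repair.
    """
    repaired = []
    for name, wanted in desired.items():
        if actual.get(name) != wanted:
            actual[name] = wanted
            repaired.append(name)
    return repaired


# Hypothetical desired state (the promise) and a system that has drifted from it.
desired = {"sshd": "running", "ntp": "running", "motd_mode": "0644"}
actual = {"sshd": "running", "ntp": "stopped", "motd_mode": "0666"}

first_pass = maintain(actual, desired)   # repairs the two drifted settings
second_pass = maintain(actual, desired)  # finds nothing left to repair
print(first_pass, second_pass)
```

Running the pass on a regular schedule is what turns a one-off fix into the continuous dialogue described above.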

Goals

The analogy of a business as commercial air travel is useful for pointing out some flaws in SA methods. What you want in a reliable aircraft is not very much risk or interruption of service: course plotting, redesigning, taking off and landing, a lot of flying in a straight line, and a little bit of adaptation to unexpected weather conditions.

You don't want a rocket ship where you invest years of planning in a vast structure that is 90% overhead to deliver a small change payload and which blasts off once with a high risk. What you want is a 747, a reusable, regularly scheduled service that can adapt in real-time with a minimum of wreckage and down time, and which has developed repeatable and well-known reliability and safety procedures.

In other words, you don't want to spend a lot of time on risky changes and reinventions, starting from scratch, you basically want it to hum along accomplishing your goals. For this to work:

You must have goals.
You need to be able to plot a course from wherever you are to your goal.
You need to maintain the course, given external perturbations like the weather.

It is clearly a valuable thing to arrive at the desired destination, but how do we measure our success in doing so? Did the journey take a lifetime, 80 days, five minutes? How much fuel was expended? How many lives were lost along the way? There are many possible criteria and tradeoffs one might make. What was the value of that service to a passenger?

Can we measure it?
The value of knowing that the outcome will happen.
The value of knowing what the cost will be.

The trouble with the way system administration is done today is that it is project oriented. It moves from risky space-launch to risky space-launch, spending a lot of time planning and preparing, with massive overhead, to deliver a small payload of change to an environment. This is costly and dangerous. To be commercially valuable, rockets need to be replaced by reliable and reusable 747s that can turn around missions on a regular schedule with a smooth and comfortable ride.

Customer satisfaction?

"Customer satisfaction is our goal!" This sounds very business-like and proper. Shouldn't we measure value by gauging customer satisfaction? This idea has some merit, but who is the customer? A system administrator typically has two sets of customers:

Business leaders who pay bills, measure performance and fire people.
End users who benefit in a more hands-on fashion from the work done by the SA.

Who are SAs trying to please? In many cases these two groups are somewhat disconnected, so pleasing one is not necessarily going to please the other. End users might be satisfied with functional capabilities, but business leaders are more satisfied by strategic capabilities (getting things done on time, and in budget, adapting and providing the tools to support operations). Business leaders want 747s, not rocket launches.

Combining these two goals requires the SA to explain user issues to the business leaders. It requires Knowledge Management.

Business

What makes a successful business in the view of business leaders? This is a hard question to answer, but many business experts would agree that the ability to succeed in business is less about capabilities and more about:

The perception of confidence in business services.
Ability to turn around ideas quickly (agility): Time To Market.
The unique business value of what is being sold (enablement).

These issues are quite different from the technical challenges presented to system administrators. Looking at business "deliverance" books, you will find that these themes feature prominently in measuring success of the business. For business executives to value System Administration according to their own set of values, SA has to make an impact on their terms. So to be perceived as contributing to business, SAs need to relate technical work to business achievements.

What does value mean?

What is value? It is a subjective judgement about the expectation of future return. The value or utility of the "something of value" is what it might be traded for (something real) later down the line, e.g. potential, revenues, cost savings, reputation, etc etc.

Is it a metric? Can we measure it? Any subjective judgement can usually be turned into something more objective by limiting scope and standardizing assumptions, but we need to do this in a way that doesn't result in nonsense.

Let's try:

How might we measure the value of system administration? What kind of things represent the real value of the task, the difficulty of the job and the success of the IT system? What is valued: skill, promise-keeping, delivering on time, trusted advisor. Stability of operation is one principle that brings confidence and assurances. Trusted partner for advice?

We need some assumptions, so let's assume that the task of SA is to manage machines and services, and that there is some kind of predictable framework, a policy and a system of maintenance.

We can classify measurements into extensive and intensive variables. Extensive variables depend on the number of machines or "managed objects"; intensive ones do not. Here are some examples of measures that cannot be misunderstood:

Quality of the System

Type	Measure	Interpretation	Range
Scale	Number of machines	A bulk factor	[0,infty]
Scale	Mean time to maintenance	The average rate at which changes can be effected	[0,infty] secs
Scale	Maintenance interval	The frequency of evaluation (Axelrod)	[0,infty] secs
Entropy	Variability of policy	Scope and complexity of goals	[0,1]
Ratio	Fraction of promises kept	Goal alignment	[0,100%]
Count	Number of promises made	Goal alignment	[0,infty]
Number	Average maintenance queue length	How successful are repairs/policy	[0,infty]
Number	1/(1+AQL)	Stability of policy, average outstanding work	(0,1]
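
Two of the measures in this table are simple enough to sketch in a few lines of Python. The function names and the sample numbers are illustrative assumptions, not output from any real system.

```python
def fraction_kept(kept, made):
    """Fraction of promises kept, as a number between 0 and 1."""
    if made == 0:
        raise ValueError("no promises were made")
    return kept / made


def stability(queue_lengths):
    """1/(1+AQL), where AQL is the average maintenance queue length."""
    aql = sum(queue_lengths) / len(queue_lengths)
    return 1.0 / (1.0 + aql)


# Hypothetical figures: 45 of 50 promises kept, and four samples of the repair queue.
kept_ratio = fraction_kept(45, 50)
policy_stability = stability([0, 2, 1, 1])
print(kept_ratio, policy_stability)
```

A stability of 1 means there is no outstanding repair work; the value falls towards 0 as the average queue grows.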

Quality of procedures

Type	Measure	Interpretation	Range
Value	Service performance disruption caused by repair	Cumulative service degradation	[0,infty]

etc, many such measures.

Difficulty/Skill of the Job

Type	Measure	Interpretation	Range
Ratio	No. of systems/No. of admins	Maintenance efficiency	[0,infty]
Ratio	No. of users/No. of help-desk hours	Help efficiency	[0,infty]
Number	K = Size of system knowledge graph	Scope of the intellectual challenge/skill	[1,infty]
Ratio	No. of knowledge associations/K(K-1)	Cognitive complexity of system description	[0,1]
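
The last ratio in this table can be computed directly if the knowledge graph is written down as a set of concepts and directed associations between them. The concepts below are invented for illustration.

```python
def association_density(concepts, associations):
    """Number of directed associations divided by K(K-1), for K concepts."""
    k = len(concepts)
    if k < 2:
        return 0.0
    return len(associations) / (k * (k - 1))


# Hypothetical knowledge graph with K = 4 concepts and 3 associations.
concepts = {"web server", "database", "backup", "billing"}
associations = {
    ("web server", "database"),
    ("database", "backup"),
    ("billing", "database"),
}
density = association_density(concepts, associations)
print(density)
```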

Impact on Business

Type	Measure	Interpretation	Range
Centrality	Eigenvector centrality of knowledge graph	Relevance of items to local concerns	[0,1]
Percentage	Uptime/reliability	SLA availability	[0,100%]
Percentage	Utilization per host	Resource efficiency/cost	[0,100%]
Number	Total revenue-bringing transactions	Income generated directly	[0,infty]
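
Eigenvector centrality sounds more exotic than it is. A minimal sketch, assuming a small undirected knowledge graph with invented node names, uses plain power iteration and rescales so that the most central item scores 1:

```python
def eigenvector_centrality(neighbours, iterations=100):
    """Power iteration on an undirected graph given as {node: set_of_neighbours}.

    Scores are rescaled each round so that the most central node has score 1.
    """
    score = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        new = {node: sum(score[n] for n in neighbours[node]) for node in neighbours}
        top = max(new.values())
        score = {node: value / top for node, value in new.items()}
    return score


# Hypothetical graph: the database is linked to everything else.
graph = {
    "database": {"web server", "billing", "backup"},
    "web server": {"database", "billing"},
    "billing": {"database", "web server"},
    "backup": {"database"},
}
centrality = eigenvector_centrality(graph)
print(max(centrality, key=centrality.get))
```

The sketch assumes a connected graph that is not bipartite, so that the iteration settles down; a production library would handle the general case.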

Cfengine users, for instance, can switch on value tracking and sum up the value of promises kept, not kept, repairs made etc. It's a simple idea.

These measurements above are intrinsic values that do not depend on what tools one is using. If you are an engineer, or physicist, they are just classic scales and ratios familiar from dimensional analysis. Some of the business impact measurements are obvious. The principal usefulness of measurements like these is that they can be tracked over time, to see when we are improving. In other words they are useful in a model of system relativity. One could even conceivably use them to compare different systems in different organizations if they were sufficiently similar, though more work would need to be done on methodology.

So we seem to be able to write down non-subjective measurements. Is there a mapping from low level metrics to high level valuations that could inform business? Can we write:

Business value = f(a,b,c,d,.....)

where a,b,c,d are the metrics above, plus others we haven't written down? Business people like to talk in terms of dollars. That means we must have such a function that outputs dollars (or equivalent Yuan etc).

Value is a promise, not an inevitability -- Knowledge is key

If we asked the question, can we measure some notion of intrinsic value to system administration that would be meaningful to other system administrators, or for internal tracking purposes, the answer would surely be yes, but we are not so interested in self-assessment here. When we ask what is the value for business, there is a cultural chasm to cross.

People have searched for metrics about management for a long time and one can naturally define such "Key Performance Indicators". The difficulty arises when we try to translate from one set of assumptions and concepts to another. The things we can measure objectively do not play a direct role in value judgements made by businesses.

The ability of an engineer to enable "business agility", i.e. change the system quickly, for example, is a complex function of efficiency, cognitive complexity, interdependence of issues, etc. But it is also a function of organizational efficiencies between SAs, such as delegation structures, and other "management level" concerns that are beyond their control. (In one company we have worked with, such delays in logistics chains made a simple issue like an installation take months rather than minutes. So organizational overhead is a dominating factor.) A business executive will only see the end result.

Consider then how this perceived value arises from a business perspective. A valuation is typically an ad hoc mixture of rational input and gut feeling.

Value = w . metrics + W . gut-feeling
        -----------------------------
                  w + W

The weight factors (w,W) determine what mixture of rational and irrational is chosen. In particular, if the rational measures we can make do not directly reflect the perception of the result, one will plump for the gut feeling, with reasonable justification.
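
To see how the weights shift the outcome, here is a small sketch of the weighted mean above. The scores and weights are made-up numbers on a 0 to 1 scale, used only to contrast a peer who trusts the metrics with an executive who does not.

```python
def perceived_value(metrics, gut_feeling, w, W):
    """Weighted mean of a rational score and a gut-feeling score."""
    if w + W == 0:
        raise ValueError("at least one weight must be non-zero")
    return (w * metrics + W * gut_feeling) / (w + W)


# Hypothetical scores: the metrics look good (0.9), the gut feeling is poor (0.3).
peer_view = perceived_value(0.9, 0.3, w=4, W=1)       # trusts the metrics
executive_view = perceived_value(0.9, 0.3, w=1, W=4)  # trusts the gut feeling
print(peer_view, executive_view)
```

The same measurements yield quite different valuations depending only on who is weighting them.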

Whenever we can't understand the reasoning or relevance of a piece of information, we tend to discard it. This is a key reason why Knowledge Management -- the ability to comprehend -- plays a large role in value judgements, belief and decision-making.

In promise theory, we say that a valuation can be promised by some measuring instrument, but the receiver does not have to promise to use or believe it.

Promises in an evaluation:

              assert value
          --------------------> 
   THING                        EVALUATOR
          <--------------------
           believe/trust value

Trust or belief between the interacting parties is the binding glue in such a relationship, and this can be governed as much by ignorance and prejudice as informed judgement. (Different cultural groups trust each other less than members of the same group do, because members of the same group speak the same language and meet each other regularly (see below)).

So while a set of metrics might work between one peer group in an organization who all agree on the premises, another group (typically business administration or management) will not agree and will weight the metrics with a small "w".

This is the dilemma for SAs. They are not being judged by their peers, but by a separate group whose fingers are on the purse strings. How can we maximize this binding between SA and business to mutual benefit? We can create a different kind of model which is based only on the minimal set of interactions that both sides experience.

Example: Return on Investment

In my experience through Cfengine, we have learned of the divide between SA and CIO the hard way. The users of Cfengine are the SAs, but the person who would pay the bill is the CIO. If we want to convince a company of the value of Cfengine, we need to tell them about what value they will get for the price, and this value should exceed the price.

However, convincing the SA is easy -- because they use the software and we can show them that it works, the features it has etc. Convincing the CIO is a different story, because he couldn't care less how it works, he bases his judgement on a business relationship (possibly perceived only tenuously through figures and results) and the costs and figures about a summary of an entire process history. He has no insight to evaluate the parts. SA is a black box.

So the magic words ROI mean quite different things to different people.

The evolution of voluntary cooperation

The metrics we showed above seem scientific, but they are no use, because another effect (trust in the values -- a meta-value judgement) pre-multiplies their values.

So should we give up, and say it is not possible for science to tell us anything? Naturally not. The science of these arbitrary valuations comes from an unexpected source.

Robert Axelrod became widely known in the 1980s for building on the work of a number of evolutionary biologists in studying cooperation using a simple model of non-cooperative interactions, called the Prisoner's Dilemma model. The beauty of this model was that it was ridiculously simple, and yet seemed to predict many well known truths about what motivates human interactions. Apparently we are neither as intelligent nor as complicated as we would like to be when it comes down to it.

The PD model gives two players an equal chance to win, but allows the players to compete. If both players are trustworthy (keep their promises), they can both earn well in a long and stable relationship (in a repeated game, such cooperation can be sustained as a stable equilibrium). However, there is a small incentive for one to renege on its promise (win 5 instead of 4), leaving the other party with nothing. If both players do this, they both end up with little (1 instead of 4).

There is an incentive to not keep your word, but if neither party keeps their promise, they both lose.

The important thing here is not what happens in one interaction. In a single interaction, there is nothing one party can do if the other chooses not to keep its promise. The significance occurs when the two parties meet regularly and play out this scenario, for if player 1 "screws" player 2 in one round, then player 2 can retaliate and screw player 1 in the next. This often leads to so-called "tit for tat" behaviour that is observed in all manner of social interactions. Eventually these tit-for-tat exchanges can settle down into a stable cooperation.

What this game shows is how stable promise-keeping relationships require the impact of the choices to be experienced repeatedly. Once is not enough.
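
The repeated game is easy to simulate. The sketch below uses the payoffs quoted above (4 each for mutual promise-keeping, 5 against 0 when one side reneges, 1 each when both do); the strategy and function names are illustrative assumptions.

```python
# Payoffs indexed by (A keeps promise, B keeps promise) -> (payoff to A, payoff to B).
PAYOFF = {
    (True, True): (4, 4),
    (True, False): (0, 5),
    (False, True): (5, 0),
    (False, False): (1, 1),
}


def tit_for_tat(opponent_history):
    """Keep the promise first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else True


def always_renege(opponent_history):
    """Never keep the promise."""
    return False


def play(strategy_a, strategy_b, rounds):
    """Play the repeated game and return the total payoff of each player."""
    moves_a, moves_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        total_a += pay_a
        total_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return total_a, total_b


print(play(tit_for_tat, tit_for_tat, 10))
print(play(tit_for_tat, always_renege, 10))
```

Over ten rounds the reneging player gains once and is then punished in every later round, so it ends up far behind two players who keep their promises to each other.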

If an SA fixes a problem or provides a service once or rarely, it is hard to set a value on this, and it gives no leverage on the counterpart. But if the impact is felt by an exchange (contract etc) on a regular basis, then a lasting relationship can form. All a brand does is to offer a name that reminds us of this relationship.

In the open source world, there has also long been a discussion about the value of Free/Open Source software, that is based on the maxim: no cost => no value. We see why this arises: no impact implies no need to think about the software at all.

We are not really as intelligent and discerning as we would like to think: these universal even primitive reactions are built into any system that places values.

Tricking the social brain: Branding

One way to "imitate value" is to play on people's perceptions or fake familiarity and loyalties by creating "brands". This sounds like we are in the dirty world of marketing, and indeed we are, but this is the same simple science.

The in-house sys-admin has no brand, so he is always going to look poorer than the hired gun from IBM or Accenture. By in-sourcing IT services and selling them back to the company, we can change this perception somewhat. It underlines the impact of what SA does, not just once when there is an emergency, but repeatedly. It is this repetition that is important, because it forms the basis for relationships.

Branding also has a simplifying effect. You replace a complicated message with a simple name, logo, or interface, usually with some trade-off. In that way you change the perception of its complexity. This is what is happening in cloud computing and service management, SLA, ITIL etc.

We can now see why businesses are often more willing to pay money to hire a consultant rather than spend the same money on its internal resources: the internal SA has no brand, no identity. The consultant from IBM or Accenture borrows the trust that comes from identifying with this brand and is therefore automatically worth more in the eyes of a business.

The paradox is that, while SAs often put a premium on technical difficulty, Business levels go for simplicity. Knowledge or High Level jobs are considered to be more valuable than low level system functions because they speak a more universal language and can therefore interact more effectively in the kind of grooming exchange that generates the perception of value.

Dunbar's numbers and perception: Teams and responsibilities

A corollary of this importance placed in social interactions is another piece of science that I find astonishing, but also compelling. Our brains limit us in how many of these relationships we can maintain -- so we cannot simply increase our value arbitrarily.

Robin Dunbar showed that there is an extraordinary correlation between the brain size of animals and the size of social groups that they can maintain. The implication is that our brains limit our ability to have complex relationships. His data are based on several species, and repeated studies show that the group sizes we can deal with as humans are fairly constant.

A number of scales recurs in humans:

Close personal relationships:	5-10
Working relationships:	30-45
General acquaintance:	100-150

The more intimate a relationship, the more brain power it consumes. The fact that these limits exist means that there must be limitations on other kinds of relationships too, such as knowledge of systems and strategic issues. Exactly what the number is doesn't matter. Maintaining a relationship with business executives compared to end users is a much more intimate case that needs a bigger investment. There are some economic choices to be made here: become a man of the people, or a friend of the King?

Another (possibly related) result is the level of indirection we can maintain when thinking: if we try to multi-thread our thinking we are only able to manage about 4-5 levels of indirection without losing track of where we were. This says something about how SA teams should be organized if they are going to succeed in maintaining strong relationships between customers, users, partners, ideas, etc. Broadcasting messages by email or Facebook is not a way around these limitations.

How can SAs work around this limitation? A team identity or brand is a way to make a personality out of a special group, and packaging SA into areas can make it easier for users and business executives to understand and maintain relationships.

I think these ideas about the identity of SA need serious consideration if the status of SA is to be raised from cost-centre to strategic player. If we look at how much "mindshare" and budget ITIL has received in recent years, it has achieved more in ten years than SA has in twenty -- and ITIL is little more than some clever branding.

Where is system administration going in the next ten years?

I've talked about three things

The SA profession and how it is evaluated
Knowledge Management as a strategy for increasing perceived value
Business development as a partnership

Each year I see "System Administration" disappear in favour of "IT services" (being repackaged). SAs are very image conscious -- they don't want to be technicians, don't want to be service personnel. Some like the terms engineer, some don't. Is administrator better than manager? I think it's going to be important to cut through some of these issues. SA needs to work on its brand if it is going to advance.

Cloud computing has demonstrated the desire for business to take over the packaging of SA, repackaging it and measuring its value using a subscription relationship. This is a business innovation, not a technical one. Technologies like Grid that come from IT technical designers, on the other hand, have not been widely successful. Why not? Perhaps because they do not foster a good mutual relationship.

Packaging serves two purposes: a knowledge management function (it simplifies a number of related issues and presents them in a nice box), and it creates an identity for that process -- a brand that we feel we know because it encapsulates some kind of history.

Personally I believe that Cloud Computing is just a stepping stone to something more interesting: Molecular Computing, in which we treat system components as modular service atoms that can be combined into actual applications -- not through web services and SOA, but at the level of self-healing automation.

Summary

What is the value of system administration to business? Trusted, dependable services that are easily understood is the answer. How can SA grow in value? By designing and establishing a balanced reward/cost relationship that lasts -- by turning rocket-ship projects into dependable 747 services.

It is possible to build a set of metrics that measure an intrinsic value for System Administration, but this value will not be easily accepted by business level executives because they have no recognition of its measurements. They need a trusted interpreter. Instead, to see how SA can be valuable from a business viewpoint we have to look at aspects that businesses value by building repetitive encounters with "clients" that amplify the impact and hence the value of the contact.

Finally, Knowledge Management is a central pillar to developing and measuring system administration value, because knowledge and integration are generally high up on the stack of things that are valued in business. Knowledge changes perceptions. System Administrators must become teachers.

homepage mark burgess
