The Scaffolding of Knowledge
What if I don't know the answers? What if I won't know what to do? A common nightmare surely everyone has had at some time -- being immobilized before an impending crisis, helpless, unable to move or do anything about it. The nightmare is only a dream fantasy, but the reality is lived out by someone, somewhere in a workplace in every generation of change. The dark force of this immobilization? Insufficient knowledge. What if that happened in a mission-critical IT system? How could we get smart fast enough?
Beyond desired state configuration management, for the Third Wave
The greatest revelation of the 20th Century was the realization that the universe is not clockwork: that we live in a world of incomplete information, where complete certainty is never an option. Knowledge of the world might be incomplete, but we have to make the best of that reality. Obviously, this is true of IT systems, so how can we make IT as reliable and predictable as possible?
Web 2.0 resulted (effectively) in a massive dilution in the density of system administration expertise around the world: computers and IT multiplied rapidly with smart minds focused on applications rather than deployment; the number of experienced system administrators did not grow at the same rate. This resulted in a knowledge deficit about operations from the human side. What to do? Three main responses have been applied to cope with this knowledge-deficit so far:
- A call to arms by system administrator special interest groups to recruit and train an army of sysadmin operatives,
- An end-user approach to DIY management by GUI, a kind of system administrator in a box,
- An attempt to redefine the roles between man and machine.
The first of these has been basically unsuccessful: new graduates want to be programmers not system engineers, for the most part; also, the advances in automation have reduced the need for large numbers of humans in datacentres. The second approach is very fragile and only works for users with simple needs (such as hobbyists and home users). It is this third way that has interested me. If (like action) knowledge cannot be scaled by industrial-age recruitment, then we must use technology to scale what is left. This requires a radical rethink in the way organizations view their IT systems.
Humans always try to cling on to past roles when times change (we are habitual creatures, if nothing else), but that never succeeds in the long run. Instead, we redraw the boundary between man and machine. What these three responses represent is the classic reaction to things that go mainstream: first, denial -- try to hold back the ocean by sheer will power; second, oversimplification -- apply industrial mass-production techniques and expect users to settle for limitations that stave off the crisis without offering any sense of control; finally, redesign the circuitry of organizations to rebalance knowledge and force. In this last option, there is room to use technology to optimize the human experience. This is the approach that CFEngine is working towards in IT management.
Changing the landscape of expertise
Look at the figure below. Maturity, size and expertise are implicitly interrelated.
The size of your organization is throttled by what expertise can do with the available tools. When an organization's maturity is low, expertise is often low too, and one is focused more on the details of `how' to do things than on `what' one should be doing. If business maturity drives size ahead of in-house expertise, one tends to outsource or rent services from outside. This brings fragility and dependence, a lack of control in a crisis. As expertise grows, more things can be done efficiently within the trusted organization, using available tools. The company is then a knowledge-based organization and can reach directly for its goals with greater confidence. Indeed, it is the goal of every organization to climb this knowledge maturity ladder from how to why and beyond.
Tools play an important role in enabling organizations, but the basic currency of success is the knowledge to use them well. If you don't have it, you have to buy it. If you can't apply it, you can't grow.
In the industrial age, we had dumb tools, operated by marginally trained (i.e. pre-programmed) workers, in one-size-fits-all production-line processes designed by a few smart heads to imitate steam-engine machinery. The tools of today -- the ones that will last us until tomorrow -- need to be smarter and less monolithic. They need to know more about the job they have to do; they need to be adaptive and flexible for a more diverse workload and workforce, all under ever greater pressure to deliver information rich responses to a greater diversity of challenges.
With increasing pace, this evolutionary pressure is changing the landscape of jobs: re-dividing work between those skilled as implementers, and those skilled as architects. And as always, the unskilled cog-in-the-machine worker is made redundant as his or her marginal knowledge is absorbed into `smarter automation' and large cost savings are made.
How knowledge management entered configuration
Ask any teacher: knowledge involves a long term relationship between human thought, action and information. You can't teach anyone knowledge: we acquire it by thinking and rehearsing activities within some context. We effectively build a human relationship with knowledge, just as we build relationships with friends and colleagues -- or we don't, in which case knowledge is a long-lost friend. There is probably a Dunbar number for the things we can know.
Since starting CFEngine in 1993, the philosophy of IT tool making has shifted back and forth to emphasize different elements of this triumvirate. CFEngine was designed to work hands-free, for the relatively simple needs of the 1990s. Its goal was to eliminate procedural detail and make intentions about desired state much clearer, by declaring the state rather than the journey used to arrive at it -- so that there would be lasting documentation of intent.
Users (who were considered the experts of their day) would design a stable desired state and then hand the system over to the tool's ministrations to get on with the work of maintaining it. For many First Wave sysadmins, this was heretical. You could take the man out of the machine, but taking the machine out of the man was an insult. System administration had to be done by a human, because a machine could not be trusted without oversight; but acceptance grew quickly during the dot-com boom years, just as it had with robots in manufacturing 30 years before.
Since then, expectations have changed. The Web 2.0 explosion came, with a much more rapid growth of demand for IT services, and a whole new generation of Linux enthusiasts with relatively little experience of system administration began to build web farms. It was into this environment that Puppet and later Chef emerged, influenced by CFEngine but moving back towards a programmer mindset rather than a system-expert one. Because demand drove scale faster than system-expertise, the focus shifted from stability to getting machines installed. Maintenance of systems was rejected in favour of `retire and rebuild', fuelled by the Cloud: virtual machines were the new disposable razor of IT.
With load balancers as shields against downtime, the Web 2.0 generation took a step backwards from planned reliability and customized operation, to adopt a form of disposable computing plugged together from off-the-shelf packages. This could be done basically by hand, with some power tools for manual assistance. It recalled the heady days of the heroic system administrator, in a wave of nostalgia about fighting dragons with shell commands. Even today, new remote-control shells pop up every year to perpetuate this Second Wave form of assisted labour. Manual remote-control tools provide a seductive sense of control, a kind of joystick to manipulate heavy machinery; but, just as such systems amplify human effort, they easily amplify human error too. This was literally puppetry (remote control) rather than automation.
With the arrival of Puppet, a hybrid between the CFEngine desired-state approach and this more "retro" remote-control approach was tried, later embellished by Opscode Chef. It attempted to combine both worlds. Visibly inspired by CFEngine 2, but also by the culture of web programming, these tools reverted from a design-led, hands-free approach to one based on modular packages, often manipulated explicitly from a command line interface by a human.
The problem with CFEngine 2 was that, on top of its engine, there was only a declarative `assembler language' for desired state -- with about as much structure as assembler code. It was cumbersome to express abstractions, especially for inexperienced admins, and if you made a mistake in specification, you could still amplify human error as effectively as if by hand, through the encoding of policy.
But whereas CFEngine's initial approach (in versions 1 and 2) was to expose all configuration options to users as directly modelled language, Puppet introduced more abstractions to conceal and suppress many laborious details, and make the expression of policy look easier to users. This seemed especially attractive to novice users, who perhaps didn't understand all the details anyway. It was like drawing software: making it easy for anyone to start drawing something, but the result will only be art in the hands of an artist. But such ease came at the cost of either trusting that the hidden defaults were okay, or requiring users to delve into the innards of the tool to make changes.
Chef took this approach further by providing pre-written `cookbooks' to encapsulate broader default expertise, with a command line tool that appealed to command nostalgia. Both of these tools added back "hands on interaction" with a command line dispatch tool to give a more traditional sense of control by doing.
Infrastructure as code or as documentation?
CFEngine 2 wanted to be the documentation language, not the programming language, of system policy, but the original model was not really adequate for what was to happen in the 21st century. Moreover, CFEngine 2 challenged users to know what they wanted to do, at a low level, up front -- and with fewer system experts around, that was considered increasingly hard. To go beyond the `easy to get started' ideas of Puppet and Chef required some far-reaching improvements to modelling.
CFEngine 3, as it was redesigned and envisaged, is now a more sophisticated documentation language, based on a carefully researched model (called Promises, or Promise Theory). It was designed to extend the original self-healing, hands-free concept with abstractions, detail-hiding features (with easily changeable and documented settings), and introspection, or self-analysis. To quote myself: "Once you have divided labour between man and machine, all that is left for man to do is to shepherd knowledge."
`Understanding the monster we've created' had become the most frequent complaint about scaling systems. CFEngine 3 was designed to explain why a system was configured the way it was, not just what configuration it had. It also generated semantic documentation about the system, which connected the dots between high level goals and low level implementation details, using a suite of tools that would not mean sacrificing the efficiency of the engine itself.
The GitHub `Design Centre' is now CFEngine 3's approach to packaging expertise analogously to Chef's cookbooks; but CFEngine 3 was designed from the start to support a much larger strategy for knowledge management: expressing clear intentions (why), documenting changes (what) with assumptions (how) and wrapping everything in searchable meta-data for modelling powerful abstractions. With such enhancements, technology itself can begin to take over some of the institutional memory and design documentation that was previously assumed to live only offline, only in the heads of expert system administrators.
What knowledge and what is knowledge for?
Knowledge is the enabler and the throttle of any organization. We need it:
- To make decisions, by illuminating the possible ways forward, so humans are not groping in the dark.
- To diagnose problems, by connecting related dots of information that might be out of sight and out of mind.
- To understand processes well enough to be able to automate them.
- To understand the organization's weaknesses, how it works and how it fails.
Knowing is about fending off uncertainty; and uncertainty is the enemy of progress. It is lack of knowledge that leads to human errors and delays, and prevents us from acting. Lack of knowledge leads to recriminations when we gamble and lose. It underlies ignorance and amplifies risk.
Our ability to connect dots between cause and effect is crucial to making actual progress, but there are many barriers to overcome; some practical, some psychological. The way we think about problems is much affected by our personalities. For some, there are two phases: brainstorming of a vision or goal, and the implementation of it. Composers and mathematicians often work in this way. For others, the two processes are inextricably linked -- thinking is about feeling their way forward, groping for a solution in a space of possibility. Artists and engineers often work in this way, but these modes of thinking are more related to the personalities of individuals than to job descriptions. I designed CFEngine to allow both ways of thinking, in principle (though it is questionable whether this has been understood by users).
Tools have to be adapted to the kinds of personalities system engineers have. Some make intuitive leaps and some feel their way forward a step at a time. Without knowledge-oriented tools to alleviate pointless detail, we maintain this separation between goals or purpose-generation and work or change-generation.
- High level leader: sees far ahead, sets the goals. (Why)
- Middle manager: translates up and down, with distortion. (What)
- Low level sysadmin: sees in short technical steps, can't see the goals for the commands. (How)
From a technology perspective, the middle manager serves mainly to bridge an over-specialization gap that could be made unnecessary, and handled more efficiently, by learning to speak each other's languages. From a human perspective, both business leaders and sysadmins would like to have greater insight into each other's worlds, to know how to use each other's skills better. This is largely what the DevOps movement is about.
Instrumenting CFEngine 3 with a knowledge strategy
CFEngine's strategy is to turn the above picture into something like this:
- High level leader: sees far ahead, sets the goals, has insight into "dev/ops". (Why and what)
- High level sysadmin: relates goals to desired configuration ("what") through promises. (What and why)
- Hands-free automation: the CFEngine universal agile agent. (How)
The high level administrators will still have different levels of experience and expertise, but knowledge-oriented tooling can bridge this gap contextually.
The key knowledge coordinates are: what, when, where, who, how, why? These questions can be applied at any level of a process, from high-flying management to the smallest nut and bolt.
The knowledge gap for inexperienced users is captured by the following questions:
- What do I need to see and do right now?
- What are the syntax and semantics for expressing desired state?
- What is the actual system state and what should I expect to see?
- How do I find and diagnose faults?
The novice is trying to drill certain responses to stimuli -- from the system and from team players -- into `muscle memory'.
The knowledge gap for experienced users is captured by the following questions:
- What do I need to see and do to plan ahead (capacity, structure, etc)?
- What are the main system trends and patterns that inform planning decisions?
- What are the system interrelationships?
- Where do I find design schematics?
- How do I find and diagnose faults?
The experienced user is more self-driven, primarily reasoning and thinking ahead, using cognitive faculties.
Each person, experienced or novice, approaches the search for information from their own perspective and with their own preferences: language, commands, web point and click, touch screen app.
CFEngine's Knowledge-Oriented Features
Following this strategy, CFEngine 3 has been designed to address knowledge in the following ways, for novices and experts:
- Source information:
- Always document a clear description of intended state (promises)
- Always give back a clear description of actual state and estimate of uncertainty (vital signs, compliance etc).
- Commentary and meta-data around the above, to annotate relationships
- Context-addressable documentation that can be linked to user need.
- Autogenerated manual pages and documentation at all levels.
- Repetitive/habitual features:
- Completely regular syntax, emphasizing one pattern for everything.
- Encourage incremental change
- Allow dry-run to build trust by "looking/modelling ahead"
- Web UI, command UI.
- Human and machine reasoning features:
- Clear separation of intended state into "what" and "how" affected
- Model oriented approach: promises, with a simple semantic syntax.
- Semantic index, explains context and meaning of references, not just names
- Story inference -- tell me a narrative about something and its influences
- What questions can I ask about the system?
Note that making these promises about the availability of knowledge requires CFEngine to blur the lines between configuration and monitoring. This point remains controversial with some traditionalists, but one of the lessons of knowledge management is that rigid separation of concerns, typical in computer science, can actually work against users by putting up walls that make connections difficult.
The policy language describes the intended state of all computers. It is instrumented with a lot of meta-data, using the syntactic pattern illustrated in the following picture. The semantics of the syntax elements are described by Promise Theory and thus we can infer relationships between objects and their promises from this declaration of intended state.
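As a sketch of that pattern (the bundle name, handles and class below are invented for illustration, not taken from any shipped policy), a CFEngine 3 promise carries its meta-data alongside the intended state:

```cf3
bundle agent web_service
{
processes:

  # Intended state: the web server should be running
  "httpd"
    comment       => "Goal: web operations depend on a running web server",
    handle        => "web_server_running",
    restart_class => "restart_webserver";

commands:

  restart_webserver::

    # Repair action, taken only in the context signalled above
    "/etc/init.d/apache2 restart"
      comment    => "Why: restore the web service promised above",
      handle     => "web_server_restart",
      depends_on => { "web_server_running" };
}
```

The comment and handle attributes change nothing about what the agent does; they exist so that relationships between promises can be named, searched and inferred.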
The actual state of the computers is monitored and measured by CFEngine's agents, and these data are put into `vital signs' monitoring, report summaries, and an overarching semantic index with inference of causative storylines.
Languages are important to knowledge
Some approaches to system management involve collecting examples of every possible configuration in a kind of zoo or repository from which one attempts to select a candidate. But collecting instances does not help us to understand them unless we can reduce them to a model, consisting of a set of talking-points.
Our brains are wired for language. Some languages are verbal, some visual, some are sequences of actions. Language is important because it unifies our cognitive skills and our primitive reflexes. A lot of what we say and do is purely instinctive. Only a small part is based on reasoning.
Training our primitive brains, or `muscle memory' allows us to cache answers and avoid the expense of recomputation with our grey matter. We transform basic skills into instinctual responses by drilling activities. We play scales on the piano, perform fire drills to artificially highlight irregular occurrences, we practice swimming and bike-riding. Whatever the expression, there is a reduction of instances to general principles.
Some tools attempt to circumvent the understanding of language by providing completely enumerated phrase-books (menus, taxonomies, etc.). This tries to capture all expressible possibilities in a neatly crystalline form, like a zoological or phylogenetic tree. This kind of hierarchical categorization is basically a spanning-tree approximation to cover a subject hierarchy. The problem with such taxonomies is that they are all artificial and can rarely be considered `natural' according to any objective criteria; someone always comes up with a different viewpoint, and that makes them inflexible.
Recently, knowledge technologies (especially on the web) have moved away from taxonomy towards `tagging', which aligns closely with CFEngine's original class approach to categorization by `usage context'. Names, like hashes or representative labels, are a hierarchy-flattening approach. The skill then lies in the ability to map every item onto an easily locatable name. This is generally more easily solved by `search' than by tracing complex routes through a hierarchical network, as it short-circuits the journey to reach the goal more quickly.
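The difference between navigating a taxonomy and searching a flat tag index can be sketched in a few lines of Python (the item names and tags here are invented for illustration):

```python
# Hypothetical knowledge base: the same items, organized two ways.

# 1. A taxonomy: items live at the leaves of one fixed hierarchy.
taxonomy = {
    "services": {
        "web": {"apache2": "restart via /etc/init.d/apache2"},
        "mail": {"postfix": "restart via /etc/init.d/postfix"},
    }
}

def find_in_tree(tree, name, path=()):
    """Depth-first walk: cost depends on the depth and shape of the tree."""
    for key, value in tree.items():
        if key == name:
            return path + (key,), value
        if isinstance(value, dict):
            hit = find_in_tree(value, name, path + (key,))
            if hit:
                return hit
    return None

# 2. A flat tag index: each item is reachable by any of its context tags.
tag_index = {}
for tags, item in [
    (("web", "apache2", "debian"), "restart via /etc/init.d/apache2"),
    (("mail", "postfix"), "restart via /etc/init.d/postfix"),
]:
    for tag in tags:
        tag_index.setdefault(tag, []).append(item)

path, item = find_in_tree(taxonomy, "apache2")
print(path)                  # the route imposed by the hierarchy
print(tag_index["debian"])   # a direct lookup, no route at all
```

The tree search has to discover a route; the tag lookup names its goal directly, which is the sense in which search short-circuits the hierarchy.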
Becoming more goal oriented
How do we become goal-oriented? Think of the following example of how to implement a goal to eat better. Are the steps obvious?
- Goal: I want to eat a more healthy diet.
- Why? Because it will give me more energy, less dead-weight, and it will cost me less. (Economic and efficiency benefits)
- What? Breakfast, lunch, snacks, activities.
- How? Protein (fish), vitamins (greens, fruit), cut out refined carbohydrates, increase exercise.
- Short term distractions: which brand to buy, replacing one meal, opportunism, saving money on a special offer.
- Goal aligned thinking: strategy for assembling meals, schedule for rotation/variety, find suppliers
To achieve goal aligned thinking, you need to master the low level skills, and know what is possible, or you need a guide or a coach or a helper that can convert experience into advice on your behalf. This is a question of implicit knowledge. We frequently underestimate this. This is one reason why open online communities are so valuable to people: they contain knowledge wrapped up in the package of a human relationship. The reward of being noticed is converted into access to experience -- this is part of the third wave economy.
What is understanding? Rehumanizing IT...
We need to understand systems to build and maintain them, as well as to diagnose their problems. (Our bodies contain a sophisticated immune system that lets one in 10^12 pathogens through its defenses -- but we still need doctors to understand and diagnose our systems.) So we can't package every bit of knowledge as automation.
It is usually said that we understand something when we know "the reason" for it. This is a hopelessly inaccurate representation that leads to rather little understanding of understanding. There is no single reason for many phenomena. There is a network of causes and effects that interact to make things happen, and we can trace it back as far as we like.
It is a kind of story with complex storylines.
This should suggest that there is no magic bullet against ignorance, but we can still develop technologies to mitigate it and build our confidence. Such technologies might be called knowledge-oriented. GPS is a great example; so are the Internet's search engines and Wikipedia. The quality of information is not always easy to judge, but there is trust-building reassurance in knowing what others have said.
Before these technologies existed, humans used story-telling as their main tool to pass on knowledge. We still need this -- indeed, wizards and menus mimic a kind of story-telling in the way they present decisions.
In CFEngine's Mission Portal, we use semantic indexing to create stories on demand about different topics, the idea being that this will help humans to think about what they are working on, and to know what to think about next, or whom to ask. For example, suppose you get the error message "/path/file requires encrypt connection"; then CFEngine's knowledge orientation allows:
cf-know --tell-me-about "requires encrypt"

  Found something about requires encrypt in the context of cf_serverd, error_messages
  which begins with "requires encrypt" (in cf-serverd)
  which can be caused by ifencrypted (in body_constraints)
  which seems to be referred to in encrypt (in body_constraints)
  which can be a part of copy_from (in body_constraints)
  which is a body constraint of type boolean (in values)
  which takes value boolean (in values)

  Moreover, see Special Topics Guide on Security - encryption
or, another case, a novice is wondering about web services:
cf-know --tell-me-about webserver

  Found something about restart_webserver in the context of "class contexts"
  which begins with "restart_webserver" (in class contexts)
  which affects "/etc/init.d/apache2 restart" (in promisers)
  which affects "httpd" (in processes)
  which serves web services (in application services)

  Moreover this (httpd) has direct influence on "web operations" (in any)
  Also "/etc/init.d/apache2 restart" has stakeholders: email@example.com, firstname.lastname@example.org
  Also "restart_webserver" is activated by check_web_server_running in promisers
  which belongs to bundle application_www in bundles.

  End of story
Stories like this are simple by human standards, but offer threads of system connectivity to latch onto, using our more developed path-finding skills to explore further and, crucially, learn from the interaction. Moreover, the knowledge is built up and extended in a completely non-structured manner: anyone can add linkage between any topics, and many links are discovered automatically from the model itself. Promise Theory provides sufficient model structure to define a "chemistry" of bonds between topics, and new atoms of information self-organize into little molecular stories like this, in a kind of primeval soup of concepts. This avoids the management issues associated with Wikis and other structured document formats.
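The way small atoms of linkage chain into a storyline can be sketched as path-finding over typed associations (a toy model in Python; the topics and link types are invented, loosely echoing the webserver story above):

```python
from collections import deque

# Toy semantic index: (subject, association, object) triplets,
# of the kind a semantic index might infer from promise meta-data.
links = [
    ("restart_webserver", "is activated by", "check_web_server_running"),
    ("restart_webserver", "affects", "/etc/init.d/apache2 restart"),
    ("/etc/init.d/apache2 restart", "affects", "httpd"),
    ("httpd", "serves", "web services"),
]

def tell_me_about(topic):
    """Breadth-first walk from a topic, narrating each typed link once."""
    graph = {}
    for subj, verb, obj in links:
        graph.setdefault(subj, []).append((verb, obj))
    story, seen, queue = [], {topic}, deque([topic])
    while queue:
        node = queue.popleft()
        for verb, obj in graph.get(node, []):
            story.append(f"{node} {verb} {obj}")
            if obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return story

for line in tell_me_about("restart_webserver"):
    print(line)
```

Each triplet is trivial on its own; the narrative emerges from chaining them, which is roughly what the semantic index does at scale.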
Sometimes we can go too far and trust tools too much. This is a result of personal incompetence -- we trust a machine more because we don't understand how it works. This is a problem of knowledge deficit. The message is that, even with helper tools, we are not exempt from knowledge and understanding of the activities we live by. Even when robots took over manufacturing, humans needed to know right from wrong.
As we push technology out of sight, we risk losing the ability to make changes without help, and we become dependent and fragile. Either we make systems self-sufficient, or we outsource things to consultants or hosting providers. Provocatively, we demand the same level of service from simplified commodities as from bespoke, hand-built goods. This is the challenge for commerce to meet.
How can we get there? Knowledge has to be engineered by experts, whose knowledge can be condensed, simplified and turned into a story that engages humans. We have to rehumanize the experience.
Anything As A Service
Service orientation is the natural end state of specialization, when it reaches a simplified, optimal message. Services can be provided in-house or out.
As organizations mature, they tend to move from primitively bashing tools together, to optimizing games of strategy. If in-house expertise rises, less external help is needed and we try to maximize the return on that knowledge. At each generation of progress, experts seem irreplaceable, until the next round of innovation makes them seem redundant. I hope the arguments above show that they are not at all redundant, but rather `re-purposed' at these cross-overs.
As technologies mature, business opportunities arise around simplifying and improving the user experience of these problems. Cloud services are a topical example of this at the moment. They can be seen as a repackaging of a certain body of system knowledge for the masses.
Superficially, the benefits of outsourcing infrastructure in the Cloud seem attractive if one has a deficit of knowledge internally: the ease of paying for a service seems more convenient than investing time and effort in bringing the competence in-house. But there are many hidden costs, especially just over the thought horizon. The economics of knowledge are a complex game of trust.
Over the thought-horizon
When specialization becomes commoditized, the knowledge that went into it is usually the last thing on anyone's mind. We are dazzled by the novelty and the ease of the offering, and forget about assuring its continuity. In this regard, the desire to bring benefits to the masses can be a double-edged sword.
In the excitement to get started, knowledge seems like a word to be feared: a barrier rather than an enabler. But what happens if the service fails or grows to be a limitation rather than an asset? The knowledge deficit is not eliminated, just pushed over the thought-horizon.
Today, the race is on to bottle, package and brand every imaginable area of Information Technology as a service. Associated with this is automation that can make things superficially easy, even eliminate the need for a lot of human activity. Automation, alas, does not eliminate the need for expertise.
Knowledge is scaffolding that we place around our lives. Knowledge wraps our day to day activities in a protective shield that allows us to trust the world around us and achieve independence. Let's make knowledge-creating technologies, not knowledge replacing ones.
- Clouds: I've looked at life from both sides now...
- Forgetting How to Think
- Notes from the USENIX/LISA Knowledge Management Workshop
5th April 2012