Laugh IT up --- the Internet is just a gas!

Why IP addresses will continue to hurt us for years to come

This piece is a continuation of Virtualizing virtualization to scale infrastructure to an Internet of Things, and of SDN: software defined networking . . . or small distributed namespaces?

It is based on detailed examinations in the accompanying Semantic Spacetime research papers (referred to below as paper I and paper II).

In an earlier post (the announcement of my Semantic Spacetime project, for understanding smart environments, including Cloud Computing and the Internet of Things), I made the point that modern networking doesn't have a proper model-based understanding of space, i.e. distance, direction, location, and so on. We address data, storage, computation, etc. by different and incompatible names and models, and then we struggle to find resources by brute force, applying algorithmic solutions where data structures would be more appropriate. IP address design, for instance, underpins a fundamental lack of scalability in Internet addressing, even as we race into a future in which every cubic metre of space wants to contain addressable elements.

There are several issues to mention, from addressability to multi-tenancy. In part 2 of the work, I turn to mapping out these issues, so that we might find lasting solutions to them. This short post is a summary of some of the motivation.

Why cloud and the Internet of Things are not two different things

As we redefine space to be increasingly functional, actionable, and imbued with agency, we have to be able to locate and reach functions and services, as well as understand what they will do for us. This is the world of pervasive computing and smart environments.

Computation and storage are obvious generic functions, from the operational infrastructure viewpoint; they are addressed by cloud computing, at a low level. However, as we ascend the application stack, the flavours of functionality become as rich as any shopping mall (see In Search of Certainty about the Smart Shopping Mall project). The low-level resources are so generic, and are shrinking so fast, that centralizing them in the halls of cloud computing is likely to be only a transitory phenomenon. Like Asimov's Trantor, our planet's biosphere will eventually be a cybor(g)sphere, with addressable content everywhere. In just a few years, we'll be able to carry all of our personal and private data with us everywhere we go, inscribed on a virtual birthmark. Why would we want to expose it out there in the Cloud, again?

Moving data and electricity around the planet is a quaint but passing fad, just as visiting the petrol/gas station to fill up a tank is a temporary way of powering a vehicle. For some things it will continue to make sense; for others, we can get used to having resources nearby again. If we are going to get beyond unsustainable utility models, in which everything is centralized, to something scalable, we need a better understanding of what it means to promise resources for sharing. The only way to model efficient semantics is to understand the intentionality of the processes, including the role of the workloads. Just look at the rise in importance of content caching and content delivery networks, the shadowy cousin of centralized computing that we are ashamed to discuss in public.

Cisco recently coined the term Fog Computing for this idea. I have chosen to call computer systems by a simpler name: space.

Laugh it up (fuzzball)

For all the kooky names we invent, one of the serious looming problems in IT is IP addressing -- but the real problem is probably not the one you are thinking of.

The problem with IP addresses is not just that there aren't enough of them, but that their design and usage lead to very high spatial entropy: different, unrelated addresses are spread at random locations all over the planet. This makes location and transport informationally expensive, because location information has no pattern and is therefore largely incompressible. IP addresses do not track or measure space. They are simply random names.
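
To make the incompressibility point concrete, here is a toy sketch in Python (my illustration, not from the papers; the numbers are made up for the demo): a structured sequence of 32-bit "addresses" compresses dramatically, while randomly scattered ones barely compress at all.

    # Toy illustration: structured addresses compress well, random ones do not.
    import random
    import zlib

    random.seed(0)

    # Structured "addresses": consecutive 32-bit values, like metric coordinates.
    structured = b"".join(i.to_bytes(4, "big") for i in range(10_000))

    # Scattered "addresses": random 32-bit values, like IP allocations with
    # no spatial pattern.
    scattered = b"".join(random.getrandbits(32).to_bytes(4, "big")
                         for _ in range(10_000))

    print(len(zlib.compress(structured)))  # small: the pattern is cheap to describe
    print(len(zlib.compress(scattered)))   # near the raw 40,000 bytes: no pattern to exploit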

From one IP address, you can't predict where the next one is, or how far away it is. You have to wander around in space, consulting routing tables, to navigate the random graph structure. The only way to aggregate addresses is by prefixing. CIDR prefixing helped to alleviate the problems with the original Class A, B, C networking, but it is a pitifully weak antidote to the growing size of routing tables.
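
As a minimal sketch of that limitation, using Python's standard ipaddress module (my example, not the author's): prefixing can only aggregate addresses that happen to be numerically adjacent.

    from ipaddress import ip_network, collapse_addresses

    # Two numerically adjacent blocks collapse into one routing-table entry...
    adjacent = [ip_network("192.0.2.0/25"), ip_network("192.0.2.128/25")]
    print(list(collapse_addresses(adjacent)))   # [IPv4Network('192.0.2.0/24')]

    # ...but two blocks owned by the same organization, allocated at different
    # times from unrelated ranges, cannot be summarized at all.
    unrelated = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]
    print(list(collapse_addresses(unrelated)))  # still two separate entries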

The analysis of semantic spacetime (in paper II above) reveals how naming is important in many different contexts. A space becomes a semantic space through name and function: it is not merely passive, it becomes an extent filled with active agency. In good old-fashioned passive space, like the Euclidean vector space we learn about in school, we naturally measure out the lack of content through symmetries, using metric addresses for points. The metric nature allows us to take the coordinates (x,y,z) of any two points and calculate the distance between them from Pythagoras' theorem, as well as predict how to reach any location without the need for routing tables or sign-posts at every junction.

(2,1,1) is to the right of (1,1,1)
(1,2,1) is ahead of (1,1,1)
(1,1,2) is above (1,1,1)

We know these things without having to search or label every junction.
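
As a minimal sketch (my illustration, assuming nothing beyond school geometry): in a metric space, both distance and direction fall out of the addresses themselves, with no lookup at all.

    import math

    def distance(a, b):
        # Euclidean distance, straight from Pythagoras' theorem.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def direction(a, b):
        # Per-axis displacement from a to b: no signpost needed at any junction.
        return tuple(y - x for x, y in zip(a, b))

    print(distance((1, 1, 1), (2, 1, 1)))   # 1.0 -- one step to the right
    print(direction((1, 1, 1), (1, 2, 1)))  # (0, 1, 0) -- straight ahead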

The Internet is not like this. Names and addresses are semantic labels, not metric ones: they are mere identifiers, with no metric interpretation. Hence we need sign-posts all over the Internet.

On top of that, the naming is sparse and inconsistent. IP addresses belong to network interfaces. Hostnames belong to hosts, which may have many interfaces. Router IDs look like IPv4 addresses, but they have very different semantics.

Shoulda stuck a label on IT

In his book Patterns in Network Architecture, John Day summarizes many of the maladies of IP networking from a historical perspective. It's a nice book, for some historical insight into how we got where we are, but it doesn't go far enough in pointing out IP networking's flaws (it pre-dates the cloud era, before the world of multi-homed servers and datacentre virtualization).

Day does suggest that some of the problems could be cured by naming: he claims that we should have named hosts rather than network interfaces. That, however, is far from the real problem. The real issue is that every separate contributing agency that collaborates in the functional behaviour of the Internet needs to be individually addressable, aggregatable, and separable, in a measured way.

From an Internet of Things perspective, even the network interfaces on a server are (independent and programmable) things. It is inconsistent to think of them as being 'just part of a host'. You need to be able to compose and de-compose, aggregate and 'dis-aggregate' Things of all kinds if you are going to model them correctly.

Indeed, this understanding of scaling is one of the key insights missing from the physics of IT infrastructure. In paper II, I describe some of it as scaling/renormalization-group behaviour. There is also an analogy to the carefully defined language of materials used in my book In Search of Certainty: The Science of our Information Infrastructure.

The conclusion is straightforward: the lack of long-range order in naming means that devices with IP addresses form a formless gas, not a predictable solid. Their locations are not ordered or fixed, but float and rearrange on a significant timescale. This is why we have to expend huge computational effort to locate every device. And that is not sustainable.

Distributed agency, coordinate systems, and addressability

In my SDN critique, I argued that IT is fixated on the idea of stacking tunnels in a kind of Tower of Babel hierarchy to work around this problem of distributed application addressing. If you start by assuming that everything is already addressable, this seems like an obvious workaround to the idea of distributed agency. But it becomes a very expensive approach at scale, not least in terms of the cognitive burden it presents to engineers.

Computer Science needs to relearn the mathematics of spaces and coordinates --- not least because this is how we are able to share space between several tenants. When you enter a hotel, there are room numbers painted on the doors, even though every person in the hotel has a name. When you enter a parking lot, there is a coordinate system painted on the spaces, even though every car has a registration number.

Why? Because the painted numbers assign metric coordinates to locations, not merely semantic names. Metric coordinates have the nice property that they are countable and hence predictable: you know that 5 is farther away than 4, and that -2 lies in the opposite direction. Coordinates allow us to find things without routing tables attaching arrows to random names and numbers.

Imagine if, when you parked your car, the numbers and locations of parking spaces were random, and even changed from time to time as tenants fought over their own private numbering schemes (this is what happens on the Internet). Or imagine trying to find your car by nothing but its registration plate and appearance.

You would need a ball of string, a trail of crumbs, or elaborate signposts pointing to different regions at every junction (routing tables). Then your best option would be a linear search within each local region. This is how the Internet works (see SDN: software defined networking . . . or small distributed namespaces?).
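
A toy model of the two regimes (my illustration, with hypothetical numbers): with random names, finding a car means scanning the whole lot; with painted coordinates, its location is arithmetic on the address itself.

    import random

    random.seed(1)
    N = 100_000

    # Regime 1: random names (registration plates). The only strategy is a scan.
    plates = [f"plate-{random.getrandbits(32):08x}" for _ in range(N)]
    target = plates[-1]
    comparisons = next(i for i, p in enumerate(plates, 1) if p == target)
    print(comparisons)      # ~N in the worst case: linear search

    # Regime 2: metric coordinates. If the lot is a grid 250 spaces wide, the
    # location is recovered from the address by arithmetic alone -- zero lookups.
    space_number = N - 1
    row, column = divmod(space_number, 250)
    print(row, column)      # (399, 249): walk to row 399, column 249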

Multi-tenancy, fighting for space

The problem of multi-tenancy (the sharing of common resources) adds to the addressing issue, because independent semantic agencies compound the entropy by mixing separate, disordered naming schemes, in random chunks, on top of the underlying shared metric space. Instead of unified end-to-end addressing, we have fallen into the habit of reusing isolated, private, non-unique, disconnected regions of common space (patched up with Network Address Translation), rather than using translation to overlay private coordinate systems on top of a common metric one.
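
A minimal sketch of the collision, again with Python's standard ipaddress module (my example): two tenants who independently number themselves from the same private range produce ambiguous, non-unique addresses that only NAT can paper over.

    from ipaddress import ip_network

    tenant_a = ip_network("10.0.0.0/16")   # tenant A's private numbering
    tenant_b = ip_network("10.0.0.0/16")   # tenant B independently chose the same block

    print(tenant_a.overlaps(tenant_b))  # True: an address like 10.0.1.5 is ambiguous
    print(tenant_a.is_private)          # True: meaningless outside its own region,
                                        # so NAT must rewrite it at every boundary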

Solving this problem is a key issue in scaling. In paper II, I show how these ideas permeate biology too, as organisms have evolved to scale up. The IT solution so far has been to work around these issues with busy-bee management approaches like BGP AS regions. But these are only part of the solution.

Our industry gets itself (again and again) into confusion by slapping marketing labels onto pieces of a larger puzzle, instead of trying to solve the puzzle itself. If we make something a 'business challenge', then obviously some company will pop up and solve it. Unfortunately, innovation has been stunted by a lack of people asking the hard questions. Calling the addressing problem 'IPAM' is a massive understatement of the challenges of addressability. It is not simply about how we assign addresses, but whether the entire system of addressing is suitable at all.

The purpose of the Semantic Spacetime project, described in the papers above, is to formulate a theory of scalability, and to ask those hard questions.

Paper II shows how one can assign clear semantics to regions, boundaries, and tenancies, from the perspective of scalability.

MB Oslo Thu Apr 16 13:04:19 CEST 2015