Avoiding The Needless Multiplication Of Forms

System Deployment

Dr Andrew Moss

2015-10-03

The time has come for a new series of posts. As a bit of background: each series of posts is a set of sketches of new material that relates to one of my courses. In this case the course is the basic Linux introduction, which will be getting some new material. The subject to be developed in this series is how we should build and deploy Linux systems. I will be covering both desktop and server builds. The information on desktop systems is partly a record of how I have finally tamed the jungle of systems that I work on, and partly a description for students who may be installing a Linux system to use within the course. The information on server systems is partly a record of how the course server was built and how it works, and partly a guide for students. Although they will not be installing a server system as part of the course, knowing how to do so is a useful body of knowledge to take away, as it may be something they need to do in future courses.
On a personal note, the course server was taken offline a couple of weeks ago for a major upgrade. Two things went wrong. First, the VPS provider had claimed they allow installation from a custom .ISO, which is what prompted this work originally. They did not mean that the customer could upload a custom .ISO; they meant that support staff could do it through a ticket. This means I may need to find a new provider, as there are problems with running an hdd-install inside a para-virtualised system that I may not be able to overcome. Second, I've been off work with some health problems for the past three weeks, and it is now too close to the start of the course to believe this will be working in time. It seems likely that this year the course will be run from an old copy inside It's Learning, and that deployment of new material on the course server will be delayed until a different academic year. Slippage is a bitch.

Overview

Regardless of the OS, and regardless of the machine, there is an observation that is both timeless and ubiquitous: when a machine is first installed it is fast and stable. Over time it becomes less so. The computer industry has some similarities with the sale of used cars: we can make that "new car smell", which lasts until you get your purchase home and start to use it. Then things go slowly downhill.
Some people would have you believe that the reasons for this are difficult and not clearly understood. Because of this there is a human tendency to attribute agency, and to assume that somewhere in the process we are deliberately being screwed. I would tend towards a different explanation, one that I believe most programmers would agree with. At installation time a computer system exists in a state of low entropy. The variations between systems are due to different sets of drivers on different hardware, or different selections of features and services. The installer is simply a program that builds a known target state on the system. As a programmer I always like it when my program is trying to build a value that I know. Good times.
During most uses of a system it experiences unpredictable, unknown changes. The system tends towards a higher state of entropy. Every piece of software that is installed, every change of driver, every update or upgrade of code creates more uncertainty about the state of the system. Eventually it reaches a state of maximum entropy - the heat-death of a computer - at which point an expert is summoned to "wipe all of that crap off and install it fresh". The cycle continues; after all, thermodynamics does not take prisoners and the outcome is inevitable.

Problem statements

  1. System upgrades create more entropy than system installs. The diagram for this item should be read as a statement of "non-commutativity", i.e. normal use + upgrade ≠ upgrade + normal use. The circles in it should be seen as estimates of "valid possible states / configurations". The amount of uncertainty / entropy is indicated by the number of circles.
  2. Configuration drift destroys robustness. Robustness should be interpreted as doing the least surprising thing. Editing the configuration of a live system until it seems to work does not create the least surprise.
  3. Even the fastest human expert imposes latency on fixing a system. If I type really fast then I can mimic a very slow script. Maybe.
  4. Reliability is easiest to achieve through redundancy. Robustness requires returning to a known state. When something breaks it needs a backup. If something has become questionable it helps if the backup is not also screwed.
  5. The higher the degree of system entropy, the less secure the system. If we know things then we can rule out attack surfaces. Pop quiz: which is more secure, a dynamic IP or a static one? Sensible answer: why would it make a difference? The information exchange between the DHCP server and the DHCP client only needs to convey MAC addresses and IP addresses. Real world answer: programmers are lazy idiots, and both DHCP servers and clients use bash to process strings in the protocol. Unless the system is patched and verified, using DHCP opens it to the Shellshock exploit.
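
The non-commutativity in point 1 can be made concrete with a toy model. The sketch below is only an illustration: the versions, the file name, its contents and the upgrade policy (keep any file that was edited by hand, replace the rest with the new default) are all assumptions made up for the example, not a description of any particular package manager.

```python
# Toy model: a "system" is a version, a set of config files and a record of
# which files were edited by hand. All names and contents are hypothetical.
DEFAULTS = {
    "v1": {"sshd_config": "Port 22"},
    "v2": {"sshd_config": "Port 22\nPermitRootLogin no"},
}


def install(version):
    """A fresh install is a known state: the defaults, nothing modified."""
    return {"version": version, "files": dict(DEFAULTS[version]), "modified": set()}


def normal_use(system):
    """Normal use: somebody edits a config file on the live system."""
    files = dict(system["files"])
    files["sshd_config"] = files["sshd_config"].replace("Port 22", "Port 2222")
    return {**system, "files": files, "modified": system["modified"] | {"sshd_config"}}


def upgrade(system, version):
    """Assumed policy: keep hand-edited files, replace untouched ones."""
    files = {}
    for name, new_default in DEFAULTS[version].items():
        if name in system["modified"]:
            files[name] = system["files"][name]   # keep the local edit
        else:
            files[name] = new_default             # take the new default
    return {"version": version, "files": files, "modified": set(system["modified"])}


use_then_upgrade = upgrade(normal_use(install("v1")), "v2")
upgrade_then_use = normal_use(upgrade(install("v1"), "v2"))

# Same two operations, different order, different final state.
print(use_then_upgrade["files"] == upgrade_then_use["files"])
```

In this model the system that was used first and upgraded second never receives the new default line, while the system that was upgraded first does. Both report the same version, and both are "valid", which is exactly the extra uncertainty that the problem statement is describing.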

Solution (to be developed over series)

Automated installation and deployment. OK, so let us unpack this a little to get an idea of why it would be a solution (all details in upcoming posts):
  1. Avoid upgrading the system: capture the target as a delta from the default install, then reapply the delta to an install of the new version. This is "the other pathway" on the diagram above.
  2. Never reconfigure anything on the target. Preseed and script all configuration and installation. This allows deployment of the system into a development environment with some reassurance that the config matches the live system.
  3. A preseeded installer can rebuild a system in about 10 minutes. Not just to a clean OS install, but to a fully working system with working data downloaded from source control.
  4. Configuration changes are captured in the version control for the system that builds the installer.
  5. All configuration is documented. In the longer term it seems interesting to make the target system read-only to enforce this - running in a similar manner to a live distribution. This closes all the holes - if there is no root on the target box, then no attacker can own it.
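
The idea in point 1, capturing the target as a delta and reapplying it, can be sketched in a few lines. This is a minimal sketch under stated assumptions: a system is reduced to a set of package names and a flat dictionary of settings, and every package name, setting and function name below is hypothetical. A real delta would also need to cover files, services and data, and this version does not handle settings that were removed on the target.

```python
def capture_delta(default, target):
    """Describe the target as changes relative to a default install."""
    return {
        "add_packages": sorted(target["packages"] - default["packages"]),
        "remove_packages": sorted(default["packages"] - target["packages"]),
        # Only settings that differ from the default are recorded.
        "config": {key: value for key, value in target["config"].items()
                   if default["config"].get(key) != value},
    }


def apply_delta(default, delta):
    """Build a target by applying a recorded delta to a default install."""
    packages = (default["packages"] | set(delta["add_packages"])) \
        - set(delta["remove_packages"])
    config = {**default["config"], **delta["config"]}
    return {"packages": packages, "config": config}


# Hypothetical default install of the old version, and the live system on it.
old_default = {"packages": {"bash", "openssh-server", "nano"},
               "config": {"hostname": "debian", "ssh_port": "22"}}
live = {"packages": {"bash", "openssh-server", "git", "nginx"},
        "config": {"hostname": "course", "ssh_port": "2222"}}

# Hypothetical default install of the new version.
new_default = {"packages": {"bash", "openssh-server", "nano", "systemd"},
               "config": {"hostname": "debian", "ssh_port": "22", "ntp": "on"}}

delta = capture_delta(old_default, live)
rebuilt = apply_delta(new_default, delta)
print(delta)
print(rebuilt)
```

The delta is small, plain data, so it can be stored in version control alongside the scripts that build the installer, as point 4 suggests. The rebuilt system picks up whatever is new in the default install of the new version while keeping only the deliberate changes, rather than carrying the whole history of the old machine forward through an upgrade.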