‘Towards a National Computer Grid’ – Electronic Computers, 1965

Recently I picked up this book on my travels.

This is the second edition (1970) of a book originally published in 1965.

It’s a fascinating insight into the state of computing 50 years ago, and remarkably prescient.

Here are a few highlights that piqued my interest.

Getting programs right

It’s revealing to see how testing and debugging worked when hardware was a bigger part of the equation. Ever used a cathode ray tube to figure out what’s wrong with your program?

In the early years the usual procedure was to run the program in slow motion, one instruction at a time, and observe what happened. Computers were provided with a number of visual indicators – rows of lights or cathode ray tubes – which enabled the contents of some of the arithmetic, control or storage registers to be inspected. This practice – sometimes known as ‘peeping’ – was soon found to be intolerably slow. It can be speeded up by inserting stop instructions at suitable points, thus enabling the operator to restrict the slow motion to selected parts of the program. Those parts that are above suspicion can be run at full speed. Even with those improvements this procedure is far too prodigal of valuable machine time …

…or waited for an electric typewriter to tell you what the error was?

… most modern installations are provided with a variety of ingenious diagnostic aids … special diagnostic programs, some of which are usually held permanently in the computer store. Their function is to provide the programmer with information that is likely to help him detect, locate, and diagnose any errors in his program. Such information is usually printed as an error message on a line printer or electric typewriter …

Programming Languages Will Proliferate

The idea of a programming language was still a relatively new one at that time. ‘High level’ programming languages had only recently come into being – COBOL was about as old then as GoLang is now!

In spite of recent attempts to design a ‘universal’ language – for example the PL/1 language – the existing bifurcation into mathematical and commercial languages, typified by ALGOL and COBOL respectively, is likely to persist for some time yet; an economic combination of sports car and delivery van is rather unlikely.

This implied that languages would proliferate for different purposes, as indeed they have.

Towards a National Computer Grid

It’s interesting to hear someone edge towards the idea of the Internet in an age before DNS, TCP/IP, or SNAT – indeed, before networking in general was an established idea.

The scheme envisages an hierarchical arrangement of subscriber terminals (‘remote stations’ in our terminology), multiplexor devices to concentrate and sort out incoming messages and so save data transmission costs, local area computers and regional computers, all linked together by a network of communication lines of varying data carrying capacity. It is proposed to make use of some of the long distance lines already provided for the telephone network …

Moore’s Law

A logarithmic graph in the book shows Moore’s Law, though it hadn’t been given that name yet – Gordon Moore’s famous paper was published in 1965, the same year as this book. Not only transistor counts, but magnetic core storage capacity was seen as growing at a similar rate.

Note that it was the bit – not the byte, kilobyte or megabyte – that was the unit of choice at the time.

Multi-Tenancy, 1960s Style

The idea of computer systems that could run multiple programs simultaneously was a novel one. As was BASIC.

In the ‘MIT’ system twenty programs can be ‘active’ simultaneously. … The user of course is not aware of this swapping although he may realise what it is that makes the computer work more slowly than if he had it entirely to himself.

Computer Programming and ‘Libraries’

It was dawning on practitioners at the time that ‘constructing a computer program is like building a house’:

It is clearly a great boon for the programmer to have at his disposal a collection of standard subroutines which have been thoroughly tested in advance and known to work correctly.

I wonder what the author would have made of NodeJS libraries?

Computers at Work

Software had begun to ‘eat the world’ even 50 years ago. I met my wife in the early 2000s through a more modern equivalent of the ‘marriage bureau’ (yes, that was a thing, and I remember them), but it’s interesting to consider that law and medicine arguably haven’t really been revolutionised by computer technology yet (leaving aside hardware innovations).

The table below estimates that there were 70,000 computers worldwide in 1968. Microsoft was founded seven years later with the vision of ‘a computer on every desk and in every home’, which seems tame today when I have more computers on me now than pockets in my clothes.

Here is a rough estimate of the global distribution of computers in the middle of 1968:

North America                        46000
United Kingdom                        3000
Western and Central Europe           11000
USSR, Eastern Europe and China        5000
Other areas                           5000
                                     _____
                           Total     70000

Author is currently working on the second edition of Docker in Practice 

Get 39% off with the code: 39miell2

A Complete Chef Infrastructure on Your Laptop

 

tl;dr

An automated setup of a Chef infrastructure ready to develop on.

Can be used to:

  • Develop cookbooks offline
  • Train users in Chef
  • Simulate Chef ‘search’ code (the original impetus)
  • Test cookbooks

Known to work on Mac and Linux.

Requires (at minimum): Git, Vagrant and VirtualBox.

Video

Here’s a video of it running on my Mac.

Install

To run:

git clone --recursive https://github.com/ianmiell/shutit-chef-env
cd shutit-chef-env
./run.sh

and eventually you’ll be handed a terminal with this message:

********************************************************************************

You are on the host.

The chef node is chefnode1.vagrant.test

The chef workstation is chefworkstation.vagrant.test

The chef server is chefserver.vagrant.test

********************************************************************************

and you can vagrant ssh into any one of the boxes and do your worst.

If you re-run ./run.sh you will destroy the existing machines and they will be rebuilt.

By default a vagrant snapshot is performed on completion.

Code

The code for this is here.

Questions/requests?

Ask me on twitter: @ianmiell

Ten Things I Wish I’d Known Before Using Vagrant

Intro

One of the ironies of working a lot with Docker, Kubernetes, and OpenShift is that I’ve had to learn a lot about Vagrant and VirtualBox. Mostly I use them to spin up OpenShift clusters on my local machine, but I’ve had to poke into lots of corners to get various other things working too.

Here’s a list of things I wish I’d known about before I started.

1) Control CPU and Memory

If you do a vagrant init then this is buried in the comments of the resulting Vagrantfile.

[...]
 config.vm.provider "virtualbox" do |vb| 
   vb.memory = "1024" 
 end
[...]

You can also set the CPUs used by the VM:

[...]
 config.vm.provider "virtualbox" do |vb| 
  vb.memory = "1024"
  vb.cpus = "2"
 end
[...]

2) Pattern for multiple machines

It’s not immediately obvious how you create multiple machines on a host in Vagrant.

You can use this as an example:

Vagrant.configure("2") do |config| 
 config.vm.define "chefserver" do |chefserver| 
  chefserver.vm.box = "centos/7" 
  chefserver.vm.hostname = "chefserver.vagrant.test" 
  chefserver.vm.provider :virtualbox do |v| 
   v.customize ["modifyvm", :id, "--memory", "1024"] 
  end 
 end
 config.vm.define "chefworkstation1" do |chefworkstation1| 
  chefworkstation1.vm.box = "ubuntu/xenial" 
  chefworkstation1.vm.hostname = "chefworkstation1.vagrant.test" 
  chefworkstation1.vm.provider :virtualbox do |v| 
   v.customize ["modifyvm", :id, "--memory", "512"] 
  end 
 end
end

Each VM is defined with a name (eg chefworkstation1 above) whose characteristics can then be accessed or modified with the dot notation (chefworkstation1.vm.hostname). Characteristics of the ‘provider’ (which just means the thing under the hood that runs the VMs – typically VirtualBox or libvirt, but it could even be something that isn’t a VM, like Docker) are set within the provider block.
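
Once defined, each machine can be managed individually by name using the standard Vagrant commands:

$ vagrant up                      # bring up all defined machines
$ vagrant up chefserver           # or just one of them
$ vagrant ssh chefworkstation1    # log in to a specific machine
$ vagrant destroy -f chefserver   # destroy one without touching the others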

3) Increase Swap Space

This one was a leap forward for my Vagrant usage. By adding swap space to my VMs, I could increase the memory available to them without overcommitting memory on the host.

A short-cut script is available here.
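
For the curious, the essence of any such script is the standard Linux swapfile dance, run inside the VM as root. A minimal sketch (the size and path are illustrative):

fallocate -l 2G /swapfile                          # allocate a 2GB file (dd also works)
chmod 600 /swapfile                                # swap files must not be world-readable
mkswap /swapfile                                   # format it as swap
swapon /swapfile                                   # enable it immediately
echo '/swapfile none swap sw 0 0' >> /etc/fstab    # make it survive reboots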

4) Landrush

This is the single most important plugin for simulating clusters. It was a game-changer for me.

I can’t say it’s been without its challenges, especially as my networking skills (like many developers’) aren’t my ‘A’ game.

One big challenge I faced (for example) was how to make Docker play nice with it (contact me if you want details).

But the work this plugin does for me is invaluable.
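
If you want to try it, installation is a one-liner, and (per the plugin’s README) enabling it is a couple of lines in your Vagrantfile:

$ vagrant plugin install landrush
# then, in the Vagrantfile:
#   config.landrush.enabled = true
#   config.landrush.tld = 'vagrant.test'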

5) vagrant global-status

 

When I run a lot of vagrant machines I often find myself reaching for this command.

$ vagrant global-status
id      name    provider   state   directory
--------------------------------------------------------------------------------------------------------------------
df8bccb master1 virtualbox running /space/git/shutit-openshift-cluster/vagrant_run/shutit_openshift_cluster_v6lepQ
4074674 etcd2   virtualbox running /space/git/shutit-openshift-cluster/vagrant_run/shutit_openshift_cluster_v6lepQ

I also use it as the basis of a bunch of cleanup scripts (which occasionally cause me to swear as I remove important VMs by accident).
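
As an illustration (and bearing the above warning in mind), a cleanup script can be as simple as this sketch, which destroys every VirtualBox-backed Vagrant VM the host knows about:

# DANGER: destroys ALL VirtualBox Vagrant VMs on this host
for id in $(vagrant global-status --prune | awk '/virtualbox/ {print $1}')
do
  vagrant destroy -f "$id"
done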

6) Get a GUI

OK, maybe I’m a dunce, but it took me a good while before I realised I could get a GUI out of a vagrant up command.

[...]
 config.vm.provider "virtualbox" do |vb| 
  # Display the VirtualBox GUI when booting the machine 
  vb.gui = true  
  # Customize the amount of memory on the VM: 
  vb.memory = "1024" 
 end
[...]

 

7) Persistent Storage

Adding virtual disks to Vagrant machines is a complex affair for the uninitiated.

Fortunately there’s a plugin for that.
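
One such plugin (not necessarily the one linked above) is vagrant-persistent-storage, which creates an extra virtual disk and attaches it to your VM across vagrant destroy cycles; the config options shown are from its README:

$ vagrant plugin install vagrant-persistent-storage
# then, in the Vagrantfile:
#   config.persistent_storage.enabled = true
#   config.persistent_storage.location = "sourcehdd.vmdk"
#   config.persistent_storage.size = 5000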

8) Vagrant snapshot

Re-building a lot of vagrant machines with the same commands?

Cache your builds with vagrant snapshot.
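
The workflow is simple: save a snapshot once the expensive build steps are done, then restore it instead of rebuilding:

$ vagrant snapshot save base-built      # snapshot the machine(s) under a name
$ vagrant snapshot list
$ vagrant snapshot restore base-built   # roll back in seconds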

9) Syncing folders with host

If you find you run out of space on your Vagrant machines, you may want to just mount folders from the host:

[...]
config.vm.synced_folder "/space", "/space", 
  owner: "imiell", 
  group: "imiell"
[...]

Be aware that file permission mapping can be important in this context, so if things don’t work, read up on UIDs etc.

10) Clipboard control

You can control the clipboard using the modifyvm snippet.

[...]
 config.vm.provider "virtualbox" do |vb| 
  vb.gui = true 
  vb.memory = "4096" 
  vb.customize ['modifyvm', :id, '--clipboard', 'bidirectional'] 
 end
[...]

A Checklist for Docker in the Enterprise (Updated)

Overview

 

Docker is extremely popular with developers, having gone as a product from zero to pretty much everywhere in a few years.

I started tinkering with Docker four years ago, got it going in a relatively small corp (700 employees) in a relatively unregulated environment. This was great fun: we set up our own registry, installed Docker on our development servers, installed Jenkins plugins to use Docker containers in our CI pipeline, even wrote our own build tool to get over the limitations of Dockerfiles.

I now work for an organisation in arguably the most heavily regulated industry, with over 100K employees. The IT security department itself is bigger than the entire company I used to work for.

There’s no shortage of companies offering solutions that claim to meet all the demands of an enterprise Docker platform, and I seem to spend most of my days being asked for opinions on them.

I want to outline the areas that may be important to an enterprise when considering developing a Docker infrastructure.

 

Images

Registry

You will need a registry. There’s an open source one (Distribution), but there are numerous offerings out there to choose from if you want to pay for an enterprise one.

  • Does this registry play nice with your authentication system?
  • Does it have role-based access control (RBAC)?

Authentication and authorization are a big deal for enterprises. While a quick and cheap ‘free for all’ registry solution will do the job in development, if you have security or RBAC standards to maintain, these requirements will come to the top of your list.

  • Does it have a means of promoting images?

All images are not created equal. Some are ‘quick and dirty’ dev experiments where ‘correctness’ is not a requirement, while others are intended for bullet-proof production usage. Your organisation’s workflows may require that you distinguish between the two, and a registry can help you with this, by managing a process via separate instances, or through gates enforced by labels.

  • Does it cohere well with your other artefact stores?

You likely already have an artefact store for tar files, internal packages and the like. In an ideal world, your registry would simply be a feature within that. If that’s not an option, integration or management overhead will be a cost you should be aware of.
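
As a point of comparison when evaluating paid options, standing up the open source Distribution registry takes minutes (though it gives you none of the RBAC or promotion features discussed above; ‘myimage’ is a stand-in for any local image):

$ docker run -d -p 5000:5000 --name registry registry:2
$ docker tag myimage localhost:5000/myimage
$ docker push localhost:5000/myimage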

Image Scanning

An important one.

When images are uploaded to your registry, you have a golden opportunity to check that they conform to standards. For example, could these questions be answered:

  • Is there a shellshock-vulnerable version of bash on there?
  • Is there an out-of-date SSL library?
  • Is it based on a fundamentally insecure or unacceptable base image?
  • Are the ‘wrong’ (by your org’s standards) or out-of-date development libraries or tools being used?

Static image analysers exist and you probably want to use one.
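
To make the first question above concrete, here is a crude manual spot-check for shellshock that also illustrates the kind of test a scanner automates (it assumes the image – ‘myimage’ here – contains bash):

# a vulnerable bash prints VULNERABLE before running the command
$ docker run --rm myimage env x='() { :;}; echo VULNERABLE' bash -c 'echo shellshock test'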

What’s particularly important to understand here is that these scanners are not perfect, and can miss very obviously bad things that end up within images. So you have to decide whether paying for a scanner is worth the effort, and what you want to get out of the scanning process.

In particular, are you wanting to:

  • Prevent malicious actors inserting objects into your builds?
  • Enforce company-wide standards on software usage?
  • Quickly patch known and standard CVEs?

These questions should form the basis of your image scanning evaluations. As usual, you will need to consider integration costs also.

Image Building

How are images going to be built? Which build methods will be supported and/or are strategic for your organisation? How do these fit together?

Dockerfiles are the standard, but some users might want to use S2I, Docker + Chef/Puppet/Ansible or even hand-craft them.

  • Which CM tool do you want to mandate (if any)?
  • Can you re-use your standard governance process for your configuration management of choice?
  • Can anyone build an image?

Real-world experience suggests that the Dockerfile approach is one that is deeply ingrained and popular with developers. The overhead of learning a more sophisticated CM tool to conform to company standards for VMs is often not one they care for. Methods like S2I or Chef/Puppet/Ansible are more generally used for convenience or code reuse. Supporting Dockerfiles will ensure that you will get fewer questions and pushback from the development community.

It is also a useful way round limitations with whatever build method you support: ‘you can always do it yourself with a Dockerfile if you want’.

Image Integrity

You need to know that the images running on your system haven’t been tampered with between building and running.

  • Have you got a means of signing images with a secure key?
  • Have you got a key store you can re-use?
  • Can that key store integrate with the products you choose?

Image integrity is still an emergent area, even after a few years. There is generally a ‘wait and see’ approach from most vendors. Docker Inc. have done some excellent work in this area, but Notary (their open source signing solution) is considered difficult to install outside of their Docker Datacenter product, where it is a significant value-add.
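
On plain Docker, the simplest entry point to these mechanics is Docker Content Trust, which signs images on push and verifies signatures on pull (image names here are illustrative):

$ export DOCKER_CONTENT_TRUST=1      # sign on push, verify on pull
$ docker push myorg/myimage:1.0      # prompts for signing keys on first push
$ docker pull myorg/other:latest     # now fails if the image has no trust data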

Third Party Images

Vendors will arrive with Docker images expecting there to be a process of adoption.

  • Do you have a governance process already for ingesting vendor technology?

You will need to know not only whether the image is ‘safe’, but also who will be responsible for updates to the image when they are required.

  • Can it be re-used for other Docker images?

There are potential licensing issues here! Do you have a way to prevent images from being re-used by other projects/teams?

  • Do you need to mandate specific environments (eg DMZs) for these to run on?
  • Will Docker be available in those environments?

For example, many network-level applications operate on a similar level to network appliances, and require access that means they must be run isolated from other containers or project work. Do you have a means of running images in these contexts?

SDLC

If you already have software development lifecycle (SDLC) processes, how does Docker fit in?

  • How will patches be handled?
  • How do you identify which images need updating?
  • How do you update them?
  • How do you tell teams to update?
  • How do you force them to update if they don’t do so in a timely way?

This is intimately related to the scanning solutions mentioned above. Integration of these items with your existing SDLC processes will likely need to be considered at some point.

Secrets

Somehow, information like database passwords needs to be passed into your containers. This can be done at build time (probably a bad idea) or at run time.

  • How will secrets be managed within your containers?
  • Is the use of this information audited/tracked and secure?

Again, this is an emergent area that’s still fast-changing. Integrations of existing solutions like OpenShift/Origin with Hashicorp’s Vault exist (eg see here), core components like Docker Swarm have secrets support, and Kubernetes 1.7 recently beefed up its security features.
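
As an illustration of the shape these solutions take, Docker Swarm stores secrets encrypted in the swarm and surfaces them to containers as in-memory files rather than environment variables (names here are illustrative; requires swarm mode, ie docker swarm init):

$ echo 'myS3cretPassword' | docker secret create db_password -
$ docker service create --name app --secret db_password myimage
# inside the container the secret appears as a file at /run/secrets/db_password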

Base Image?

If you run Docker in an enterprise, you might want to mandate the use of a company-wide base image:

  • What should go into this base image?
  • What standard tooling should be everywhere?
  • Who is responsible for it?

Be prepared for lots of questions about this base image! Developers get very focussed on thin images (a topic dealt with at length in my book Docker in Practice).

Security and Audit

The ‘root’ problem

By default, access to the docker command (and specifically access to the Docker UNIX socket) implies privileges over the whole machine. This is explicitly called out in the Docker documentation, and it is unlikely to be acceptable to most security teams in production.
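
If that sounds abstract, here is the canonical demonstration: anyone who can talk to the Docker socket can mount the host’s root filesystem into a container and chroot into it as root:

$ docker run -it -v /:/host alpine chroot /host sh
# you now have a root shell on the host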

You will need to answer questions like these:

  • Who (or what) is able to run the docker command?
  • What control do you have over who runs it?
  • What control do you have over what is run?

Solutions exist for this, but they are relatively new and generally part of other, larger products.

OpenShift, for example, has robust RBAC control, but this comes with buying into a whole platform. Container security tools like Twistlock and Aquasec offer a means of managing these, so this might be factored into consideration of those options.

Monitoring what’s running, aka ‘runtime control’

A regulated enterprise is likely to want to be able to determine what is running across its estate. What cannot be accounted for?

  • How do you tell what’s running?
  • Can you match that content up to your registry/registries?
  • Have any containers changed critical files since startup?

Again, this comes with some other products that might form part of your Docker strategy, so watch out for them.
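
For the last of those questions, even stock Docker gives you a starting point: docker diff lists the files a container has added (A), changed (C) or deleted (D) since it started (the output below is invented for illustration):

$ docker diff mycontainer
C /etc
A /etc/dropped_file.sh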

Another frequently-seen selling point in this space is anomaly detection. Security products offer fancy machine-learning features that claim to ‘learn’ what a container is supposed to do, and alert you if it appears to do something out of the ordinary, like connecting out to a foreign application port unrelated to the application.

While this sounds great, you need to think about how it will work operationally. You can get a lot of false positives, and these may require a lot of curation – are you equipped to handle that?

Forensics

When things go wrong, people will want to know what happened. In the ‘old’ world of physical machines and VMs, there were a lot of safeguards in place to assist post-incident investigation. A Docker world can become one without ‘black box recorders’.

  • Can you tell who ran a container?
  • Can you tell who built a container?
  • Can you determine what a container did once it’s gone?
  • Can you determine what a container might have done once it’s gone?

In this context you might want to mandate the use of specific logging solutions, to ensure that information about system activity persists across container instantiations.
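
For example, Docker’s logging drivers can ship container output off the host at run time, so the record outlives the container (the address below is a placeholder for your own log infrastructure):

$ docker run --log-driver=syslog \
    --log-opt syslog-address=udp://logs.example.com:514 \
    myimage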

Sysdig’s Falco is another interesting and promising product in this area.

Operations

Logging

Application logging is likely to be a managed or controlled area of concern:

  • Do the containers log what’s needed for operations?
  • Do they follow standards for logging?
  • Where do they log to?

Container usage can follow very different patterns from traditional machine/VM deployments. Logging volume may increase, causing extra storage demands.

Orchestration

Containers can quickly proliferate across your estate, and this is where orchestration comes in. Do you want to mandate one?

  • Does your orchestrator of choice play nicely with other pieces of your Docker infrastructure?
  • Do you want to bet on one orchestrator, hedge with a mainstream one, or just sit it out until you have to make a decision?

Kubernetes seems to be winning the orchestration war. These days, not choosing Kubernetes (assuming you need to choose an orchestrator at all) needs a good reason to back it up.

Operating System

Enterprise operating systems can lag behind the latest and greatest.

  • Is your standard OS capable of supporting all the latest features? For example, some orchestrators and Docker itself require kernel versions or packages that may be more recent than is supported. This can come as a nasty surprise…
  • Which version of Docker is available in your local package manager?

Docker versions sometimes have significant differences between them (1.10 was a big one), and these can take careful management to navigate. There can also be significant differences between vendors’ Docker (or perhaps we should say ‘Moby’) versions. RedHat’s docker binary calls out to RedHat’s registry before Docker’s, for example!

Development

Dev environments

  • Developers love having admin. Are you ready to effectively give them admin with Docker?

There are options here to give developers a VM in which to run Docker builds locally, or just the docker client, with the server running elsewhere.

  • Are their clients going to be consistent with deployment?

If they’re using docker-compose on their desktop, they might resent switching to Kubernetes pods in UAT and production!

CI/CD

Jenkins is the most popular CI tool, but there are other alternatives popular in the enterprise, such as TeamCity.

Docker brings with it many potential plugins that developers are eager to use. Many of these are not well-written, safe, or even consistent with other plugins.

  • What’s your policy around CI/CD plugins?
  • Are you ready to switch on a load of new plugins PDQ?
  • Does your process for CI cater for ephemeral Jenkins instances as well as persistent, supported ones?

 

Infrastructure

Shared Storage

Docker has at its core the use of volumes that are independent of the running containers, in which persistent data is stored.

  • Is shared storage easy to provision?

NFS servers have their limitations but are mature and generally well-supported in larger organisations.
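
For what it’s worth, recent Docker versions can mount NFS directly through the local volume driver, which makes for a quick test of whether your NFS infrastructure is up to the job (server and export path are placeholders):

$ docker volume create --driver local \
    --opt type=nfs \
    --opt o=addr=nfsserver.example.com,rw \
    --opt device=:/exports/data \
    nfsdata
$ docker run -v nfsdata:/data myimage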

  • Is shared storage support ready for increased demand?
  • Is there a need for shared storage to be available across deployment locations?

You might have multiple data centres and/or cloud providers. Do all these locations talk to each other? Do they need to?

Networking

Enterprises often have their own preferred Software Defined Networking (SDN) solutions, such as Nuage, or new players like Calico.

  • Do you have a prescribed SDN solution?
  • How does that interact with your chosen solutions?
  • Does SDN interaction create an overhead that will cause issues?

aPaaS

Having an aPaaS such as OpenShift or Tutum Cloud can resolve many of the above questions by centralising and making supportable the context in which Docker is run.

  • Have you considered using an aPaaS?
  • Which one answers the questions that need answering?

 

Cloud Providers

If you’re using a cloud provider such as Amazon or Google:

  •  How do you plan to deliver images and run containers on your cloud provider?
  • Do you want to tie yourself into their Docker solutions, or make your usage cloud-agnostic?

A Note on Choosing Solutions

Finally, it’s worth discussing two approaches that can be taken to solve your Docker needs. You can go all-in with a single supplier, or piece together a solution from smaller products that fulfil each (or subsets) of your requirements separately.

The benefits of going all-in with a single supplier include:

  • Single point of support
  • Less integration effort and overhead
  • Faster delivery
  • Greater commitment and focus from the supplier
  • Influence over product direction
  • Easier to manage

Single-supplier solutions often demand payment per node, which can result in escalating costs that you may come to regret, or in constraints on your architecture that might not be appreciated at first.

The benefits of ‘piecing together’ a solution include:

  • More flexible solutions can be delivered at different rates, depending on organisational need
  • A less monolithic approach can allow you to back out of ‘mistakes’ made in the product acquisition process
  • It can be cheaper in the long run, not least because…
  • You are not ‘locked-in’ to one supplier that can hold you to ransom in future

Conclusion

The enterprise Docker field is confusing and fast-changing. Developing a strategy that is cost-effective, safe, complete, adaptive, fast to deliver, and free of lock-in is one of the biggest challenges facing large-scale organisations today.

Best of luck!

OpenShift 3.6 DNS In Pictures

Container Networking is Hard Enough…

To those not versed in the dark networking arts, one of the mysteries of OpenShift (RedHat’s wrapper around Kubernetes) is how a pod communicates with the outside world.

This article is more about DNS on clusters, but the point is the same: things can get pretty complicated pretty quickly.

Let’s Add DNSMasq…

Recently I was grappling with this while debugging a Vagrant OpenShift cluster test suite, when someone smarter than me took the time to explain what was happening.

I wasn’t sure I’d got all the details, so I put together these diagrams to help me follow.

External DNS Lookup

Here’s the ‘simple’ case of a single-container pod pinging google.com:

[Diagram: DNS pod routing – OpenShift 3.6, external lookup]

The steps can be described linearly as:

  • Process starts in container, and needs to know what google.com resolves to.
  • Process looks up /etc/resolv.conf to see where dns queries should be resolved.
  • Process asks the DNS server at 10.0.2.15 on port 53 for google.com’s IP
  • DNSMasq determines that this is a query that needs to go to the outside world, so it passes the query out
    • In this particular setup, it passes the lookup to DNSMasq’s configured DNS exit point (which is eth1 in this Vagrant setup)

 

Even here I’m skirting over a lot. The ping process can also refer to /etc/hosts.conf, /etc/nsswitch.conf, /etc/gai.conf, and /etc/hosts, for example. And I use landrush to manage host lookups for my VMs (between the VMs and to/from the host).

In these diagrams I don’t show the cluster; rather, everything is happening on the one node. Also, the IP addresses for ‘eth0’ are the standard Vagrant-allocated IPs.

resolv.conf

In OpenShift, the resolv.conf file in the container is constructed by taking the resolv.conf from the host operating system and placing a nameserver above the existing entries (this nameserver can be set in your node.yaml file).
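
The result inside a container looks something like this (an illustrative example for this kind of Vagrant setup, not verbatim):

nameserver 10.0.2.15
search default.svc.cluster.local svc.cluster.local cluster.local vagrant.test
options ndots:5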

DNSmasq

By default, this nameserver points to your host’s IP (10.0.2.15 in this Vagrant setup), where a DNS resolver (typically a dnsmasq server) is expected to be listening on port 53. If no nameserver is specified, it defaults to the Kubernetes service IP, bypassing dnsmasq.

DNSMasq uses the servers specified in the files in /etc/dnsmasq.d/*.

According to this thread, there is no specific ordering to the queries; dnsmasq simply asks each server in turn until it gets an answer.
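
You can see this on the node itself. The files use dnsmasq’s server=/domain/address syntax, which routes queries for a given domain suffix to a given upstream – consistent with the node process listening on 127.0.0.1:53 below (the filename and contents here are a sketch, not verbatim):

$ cat /etc/dnsmasq.d/origin-dns.conf
server=/cluster.local/127.0.0.1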

Local Cluster DNS Lookup

So that’s the ‘simple’ case of an external lookup.

Now we come onto a local dns lookup on the Kubernetes cluster.

[Diagram: DNS pod routing – OpenShift 3.6, internal cluster lookup]

The steps can be described linearly as:

  • Process starts in container, and needs to know what kubernetes.default.svc.cluster.local resolves to.
  • Process looks up /etc/resolv.conf to see where dns queries should be resolved.
  • Process asks the DNS server at 10.0.2.15 on port 53 for kubernetes.default.svc.cluster.local’s address
  • DNSMasq determines that this is a query that needs to go to the cluster, so it passes it to the OpenShift node process to look up. This process is listening on port 53 of the localhost IP (127.0.0.1).
  • The OpenShift node process either returns the IP address from its cache (which is why bouncing the node process can make some resolution issues go away), or passes the request on to the master process’s DNS.

To help see this setup, you can run this command on the host. In this setup, I have a node and a master OpenShift process running on one Vagrant VM:

[root@master1 ~]# netstat -nltp  | grep 53
tcp  0  0 127.0.0.1:53   0.0.0.0:*   LISTEN      30998/openshift
tcp  0  0 10.0.2.15:53   0.0.0.0:*   LISTEN      31034/dnsmasq
tcp  0  0 0.0.0.0:8053   0.0.0.0:*   LISTEN      29316/openshift

Puppeteer – Headless Chrome in a Container

What is Puppeteer?

Puppeteer is another headless Chrome library, this time maintained by the Chrome DevTools team.

You can play with it online here.

The API is here.

Examples are here.

Docker Image

I’ve created a Docker image of it so you can get playing with it.

The image is available on the Docker Hub:

docker pull dockerinpractice/docker-puppeteer

 

Dockerfile

This is the annotated Dockerfile.

Running a Script

I’ll demonstrate using examples/pdf.js, which creates a PDF of the Hacker News front page:

$ docker run -ti dockerinpractice/docker-puppeteer
puser@e4679fb3c9e1:~/node_modules/puppeteer/examples$ node pdf.js
puser@e4679fb3c9e1:~/node_modules/puppeteer/examples$ ls -l hn.pdf 
-rw-r--r-- 1 puser puser 105097 Oct 14 14:18 hn.pdf
puser@e4679fb3c9e1:~/node_modules/puppeteer/examples$ exit
$ docker cp e4679fb3c9e1:/home/puser/node_modules/puppeteer/examples/hn.pdf .
$ open hn.pdf

 

Help Wanted

This implementation is still a little rough – if you can help make all the examples work, and remove the no-sandbox hack, then let me know.

My 20-Year Experience of Software Development Methodologies

Sapiens and Collective Fictions

Recently I read Sapiens: A Brief History of Humankind by Yuval Harari. The basic thesis of the book is that humans require ‘collective fictions’ so that we can collaborate in larger numbers than the 150 or so our brains are big enough to cope with by default. Collective fictions are things that don’t describe solid objects in the real world we can see and touch. Things like religions, nationalism, liberal democracy, or Popperian falsifiability in science. Things that don’t exist, but when we act like they do, we easily forget that they don’t.

Collective Fictions in IT – Waterfall

This got me thinking about some of the things that bother me today about the world of software engineering. When I started in software 20 years ago, God was waterfall. I joined a consultancy (ca. 400 people) that wrote very long specs which were honed to within an inch of their life, down to the individual Java classes and attributes. These specs were submitted to the customer (God knows what they made of it), who signed it off. This was then built, delivered, and monies were received soon after. Life was simpler then and everyone was happy.

Except there were gaps in the story – customers complained that the spec didn’t match the delivery, and often the product delivered would not match the spec, as ‘things’ changed while the project went on. In other words, the waterfall process was a ‘collective fiction’ that gave us enough stability and coherence to collaborate, get something out of the door, and get paid.

This consultancy went out of business soon after I joined. No conclusions can be drawn from this.

Collective Fictions in IT – Startups ca. 2000

I got a job at another software development company that had a niche with lots of work in the pipe. I was employee #39. There was no waterfall. In fact, there was nothing in the way of methodology I could see at all. Specs were agreed with a phone call. Design, prototype and build were indistinguishable. In fact it felt like total chaos; it was against all of the precepts of my training. There was more work than we could handle, and we got on with it.

The fact was, we were small enough not to need a collective fiction we had to name. Relationships and facts could be kept in our heads, and if you needed help, you literally called out to the room.

Of course there were collective fictions, we just didn’t name them:

  • We will never have a mission statement
  • We don’t need HR or corporate communications, we have the pub (tough luck if you have a family)
  • We only hire the best

We got slightly bigger, and customers started asking us what our software methodology was. We guessed it wasn’t acceptable to say ‘we just write the code’ (legend had it our C-based application server – still in use and blazingly fast – was written before my time in a fit of pique, with a stash of amphetamines, over a weekend).

Turns out there was this thing called ‘Rapid Application Development’ that emphasized prototyping. We told customers we did RAD, and they seemed happy, as it was A Thing. It sounded to me like ‘hacking’, but to be honest I’m not sure anyone among us really properly understood it or read up on it.

As a collective fiction it worked, because it kept customers off our backs while we wrote the software.

Soon we doubled in size, moved out of our cramped little office into a much bigger one with bigger desks, and multiple floors. You couldn’t shout out your question to the room anymore. Teams got bigger, and these things called ‘project managers’ started appearing everywhere talking about ‘specs’ and ‘requirements gathering’. We tried and failed to rewrite our entire platform from scratch.

Yes, we were back to waterfall again, but this time the working cycles were faster and smaller, and we had the same problems of changing requirements and disputes with customers as before. So was it waterfall? We didn’t really know.

Collective Fictions in IT – Agile

I started hearing the word ‘Agile’ about 2003. Again, I don’t think I properly read up on it… ever, actually. I got snippets here and there from various websites I visited and occasionally from customers or evangelists that talked about it. When I quizzed people who claimed to know about it their explanations almost invariably lost coherence quickly. The few that really had read up on it seemed incapable of actually dealing with the very real pressures we faced when delivering software to non-sprint-friendly customers, timescales, and blockers. So we carried on delivering software with our specs, and some sprinkling of agile terminology. Meetings were called ‘scrums’ now, but otherwise it felt very similar to what went on before.

As a collective fiction it worked, because it kept customers and project managers off our backs while we wrote the software.

Since then I’ve worked in a company that grew to 700 people, and now work in a corporation of 100K+ employees, but the pattern is essentially the same: which incantation of the liturgy will satisfy this congregation before me?

Don’t You Believe?

I’m not going to beat up on any of these paradigms, because what’s the point? If software methodologies didn’t exist we’d have to invent them, because how else would we work together effectively? You need these fictions in order to function at scale. It’s no coincidence that the Agile paradigm has such a quasi-religious hold over a workforce that is immensely fluid and mobile. (If you want to know what I really think about software development methodologies, read this because it lays it out much better than I ever could.)

One of many interesting arguments in Sapiens is that because these collective fictions can’t adequately explain the world, and often conflict with each other, the interesting parts of a culture are those where these tensions are felt. Often, humour derives from these tensions.

‘The test of a first-rate intelligence is the ability to hold two opposed ideas in mind at the same time and still retain the ability to function.’ F. Scott Fitzgerald

I don’t know about you, but I often feel this tension when discussion of Agile goes beyond a small team. When I’m told in a motivational poster written by someone I’ve never met and who knows nothing about my job that I should ‘obliterate my blockers’, and those blockers are both external and non-negotiable, what else can I do but laugh at it?

How can you be agile when there are blockers outside your control at every turn? Infrastructure, audit, security, financial planning, financial structures all militate against the ability to quickly deliver meaningful iterations of products. And who is the customer here, anyway? We’re talking about the square of despair:

[Diagram: the square of despair]

When I see diagrams like this representing Agile I can only respond with black humour shared with my colleagues, like kids giggling at the back of a church.

[Image: Agile motivational poster]

When within a smaller and well-functioning team, the totems of Agile often fly out of the window, and what you’re left with (when it’s good) is a team that trusts each other, is open about its trials, and has a clear structure (formal or informal) in which agreement and solutions can be found and co-operation is productive. Google recently articulated this (reported briefly here, and more in-depth here).

So Why Not Tell It Like It Is?

You might think the answer is to come up with a new methodology that’s better. It’s not like we haven’t tried:

[Infographic: Toggl’s ‘software development methods explained with cars’]

 

It’s just not that easy, like the book says:

‘Telling effective stories is not easy. The difficulty lies not in telling the story, but in convincing everyone else to believe it. Much of history revolves around this question: how does one convince millions of people to believe particular stories about gods, or nations, or limited liability companies? Yet when it succeeds, it gives Sapiens immense power, because it enables millions of strangers to cooperate and work towards common goals. Just try to imagine how difficult it would have been to create states, or churches, or legal systems if we could speak only about things that really exist, such as rivers, trees and lions.’

Let’s rephrase that:

‘Coming up with useful software methodologies is not easy. The difficulty lies not in defining them, but in convincing others to follow them. Much of the history of software development revolves around this question: how does one convince engineers to believe particular stories about the effectiveness of requirements gathering, story points, burndown charts or backlog grooming? Yet when adopted, it gives organisations immense power, because it enables distributed teams to cooperate and work towards delivery. Just try to imagine how difficult it would have been to create Microsoft, Google, or IBM if we could only speak about specific technical challenges.’

Anyway, does the world need more methodologies? It’s not like some very smart people haven’t already thought about this.

Acceptance

So I’m cool with it. Lean, Agile, Waterfall, whatever, the fact is we need some kind of common ideology to co-operate in large numbers. None of them are evil, so it’s not like you’re picking racism over socialism or something. Whichever one you pick is not going to reflect the reality, but if you expect perfection you will be disappointed. And watch yourself for unspoken or unarticulated collective fictions. Your life is full of them. Like that your opinion is important. I can’t resist quoting this passage from Sapiens about our relationship with wheat:

‘The body of Homo sapiens had not evolved for [farming wheat]. It was adapted to climbing apple trees and running after gazelles, not to clearing rocks and carrying water buckets. Human spines, knees, necks and arches paid the price. Studies of ancient skeletons indicate that the transition to agriculture brought about a plethora of ailments, such as slipped discs, arthritis and hernias. Moreover, the new agricultural tasks demanded so much time that people were forced to settle permanently next to their wheat fields. This completely changed their way of life. We did not domesticate wheat. It domesticated us. The word ‘domesticate’ comes from the Latin domus, which means ‘house’. Who’s the one living in a house? Not the wheat. It’s the Sapiens.’

Maybe we’re not here to direct the code, but the code is directing us. Who’s the one compromising reason and logic to grow code? Not the code. It’s the Sapiens.
