Puppeteer – Headless Chrome in a Container

What is Puppeteer?

Puppeteer is another headless Chrome library, this time maintained by the Chrome DevTools team.

You can play with it online here.

The API is here.

Examples are here.

Docker Image

I’ve created a Docker image of it so you can get playing with it.

The image is available on the Docker Hub:

docker pull dockerinpractice/docker-puppeteer



This is the annotated Dockerfile.

Running a Script

I’m demonstrating with examples/pdf.js, which creates a PDF of the Hacker News front page:

$ docker run -ti dockerinpractice/docker-puppeteer
puser@e4679fb3c9e1:~/node_modules/puppeteer/examples$ node pdf.js
puser@e4679fb3c9e1:~/node_modules/puppeteer/examples$ ls -l hn.pdf 
-rw-r--r-- 1 puser puser 105097 Oct 14 14:18 hn.pdf
puser@e4679fb3c9e1:~/node_modules/puppeteer/examples$ exit
$ docker cp e4679fb3c9e1:/home/puser/node_modules/puppeteer/examples/hn.pdf .
$ open hn.pdf


Help Wanted

This implementation is still a little rough. If you can help make all the examples work and remove the no-sandbox hack, let me know.


This is based on work in progress from the second edition of Docker in Practice 

Get 39% off with the code: 39miell2


My 20-Year Experience of Software Development Methodologies

Sapiens and Collective Fictions

Recently I read Sapiens: A Brief History of Humankind by Yuval Harari. The basic thesis of the book is that humans require ‘collective fictions’ so that we can collaborate in larger numbers than the 150 or so our brains are big enough to cope with by default. Collective fictions are things that don’t describe solid objects in the real world we can see and touch. Things like religions, nationalism, liberal democracy, or Popperian falsifiability in science. Things that don’t exist, but when we act like they do, we easily forget that they don’t.

Collective Fictions in IT – Waterfall

This got me thinking about some of the things that bother me today about the world of software engineering. When I started in software 20 years ago, God was waterfall. I joined a consultancy (ca. 400 people) that wrote very long specs which were honed to within an inch of their life, down to the individual Java classes and attributes. These specs were submitted to the customer (God knows what they made of them), who signed them off. The software was then built and delivered, and monies were received soon after. Life was simpler then and everyone was happy.

Except there were gaps in the story: customers complained that the delivery didn’t match the spec, and often it genuinely didn’t, as ‘things’ changed while the project went on. In other words, the waterfall process was a ‘collective fiction’ that gave us enough stability and coherence to collaborate, get something out of the door, and get paid.

This consultancy went out of business soon after I joined. No conclusions can be drawn from this.

Collective Fictions in IT – Startups ca. 2000

I got a job at another software development company that had a niche with lots of work in the pipe. I was employee #39. There was no waterfall. In fact, there was nothing in the way of methodology I could see at all. Specs were agreed with a phone call. Design, prototype and build were indistinguishable. In fact it felt like total chaos; it was against all of the precepts of my training. There was more work than we could handle, and we got on with it.

The fact was, we were small enough not to need a collective fiction we had to name. Relationships and facts could be kept in our heads, and if you needed help, you literally called out to the room. The tone was like this, basically:


Of course there were collective fictions, we just didn’t name them:

  • We will never have a mission statement
  • We don’t need HR or corporate communications, we have the pub (tough luck if you have a family)
  • We only hire the best

We got slightly bigger, and customers started asking us what our software methodology was. We guessed it wasn’t acceptable to say ‘we just write the code’ (legend had it our C-based application server, still in use and blazingly fast, was written before my time in a fit of pique with a stash of amphetamines over a weekend).

Turns out there was this thing called ‘Rapid Application Development’ that emphasized prototyping. We told customers we did RAD, and they seemed happy, as it was A Thing. It sounded to me like ‘hacking’, but to be honest I’m not sure anyone among us really properly understood it or read up on it.

As a collective fiction it worked, because it kept customers off our backs while we wrote the software.

Soon we doubled in size, moved out of our cramped little office into a much bigger one with bigger desks, and multiple floors. You couldn’t shout out your question to the room anymore. Teams got bigger, and these things called ‘project managers’ started appearing everywhere talking about ‘specs’ and ‘requirements gathering’. We tried and failed to rewrite our entire platform from scratch.

Yes, we were back to waterfall again, but this time the working cycles were faster and smaller, and we had the same problems of changing requirements and disputes with customers as before. So was it waterfall? We didn’t really know.

Collective Fictions in IT – Agile

I started hearing the word ‘Agile’ about 2003. Again, I don’t think I properly read up on it… ever, actually. I got snippets here and there from various websites I visited and occasionally from customers or evangelists that talked about it. When I quizzed people who claimed to know about it their explanations almost invariably lost coherence quickly. The few that really had read up on it seemed incapable of actually dealing with the very real pressures we faced when delivering software to non-sprint-friendly customers, timescales, and blockers. So we carried on delivering software with our specs, and some sprinkling of agile terminology. Meetings were called ‘scrums’ now, but otherwise it felt very similar to what went on before.

As a collective fiction it worked, because it kept customers and project managers off our backs while we wrote the software.

Since then I’ve worked in a company that grew to 700 people, and now work in a corporation of 100K+ employees, but the pattern is essentially the same: which incantation of the liturgy will satisfy this congregation before me?

Don’t You Believe?

I’m not going to beat up on any of these paradigms, because what’s the point? If software methodologies didn’t exist we’d have to invent them, because how else would we work together effectively? You need these fictions in order to function at scale. It’s no coincidence that the Agile paradigm has such a quasi-religious hold over a workforce that is immensely fluid and mobile. (If you want to know what I really think about software development methodologies, read this because it lays it out much better than I ever could.)

One of many interesting arguments in Sapiens is that because these collective fictions can’t adequately explain the world, and often conflict with each other, the interesting parts of a culture are those where these tensions are felt. Often, humour derives from these tensions.

‘The test of a first-rate intelligence is the ability to hold two opposed ideas in mind at the same time and still retain the ability to function.’ F. Scott Fitzgerald

I don’t know about you, but I often feel this tension when discussion of Agile goes beyond a small team. When I’m told in a motivational poster written by someone I’ve never met and who knows nothing about my job that I should ‘obliterate my blockers’, and those blockers are both external and non-negotiable, what else can I do but laugh at it?

How can you be agile when there are blockers outside your control at every turn? Infrastructure, audit, security, financial planning, financial structures all militate against the ability to quickly deliver meaningful iterations of products. And who is the customer here, anyway? We’re talking about the square of despair:


When I see diagrams like this representing Agile I can only respond with black humour shared with my colleagues, like kids giggling at the back of a church.


Within a smaller and well-functioning team, the totems of Agile often fly out of the window and what you’re left with (when it’s good) is a team that trusts each other, is open about its trials, and has a clear structure (formal or informal) in which agreement and solutions can be found and co-operation is productive. Google recently articulated this (reported briefly here, and more in-depth here).

So Why Not Tell It Like It Is?

You might think the answer is to come up with a new methodology that’s better. It’s not like we haven’t tried:



It’s just not that easy, like the book says:

‘Telling effective stories is not easy. The difficulty lies not in telling the story, but in convincing everyone else to believe it. Much of history revolves around this question: how does one convince millions of people to believe particular stories about gods, or nations, or limited liability companies? Yet when it succeeds, it gives Sapiens immense power, because it enables millions of strangers to cooperate and work towards common goals. Just try to imagine how difficult it would have been to create states, or churches, or legal systems if we could speak only about things that really exist, such as rivers, trees and lions.’

Let’s rephrase that:

‘Coming up with useful software methodologies is not easy. The difficulty lies not in defining them, but in convincing others to follow them. Much of the history of software development revolves around this question: how does one convince engineers to believe particular stories about the effectiveness of requirements gathering, story points, burndown charts or backlog grooming? Yet when adopted, it gives organisations immense power, because it enables distributed teams to cooperate and work towards delivery. Just try to imagine how difficult it would have been to create Microsoft, Google, or IBM if we could only speak about specific technical challenges.’

Anyway, does the world need more methodologies? It’s not like some very smart people haven’t already thought about this.


So I’m cool with it. Lean, Agile, Waterfall, whatever, the fact is we need some kind of common ideology to co-operate in large numbers. None of them are evil, so it’s not like you’re picking racism over socialism or something. Whichever one you pick is not going to reflect the reality, but if you expect perfection you will be disappointed. And watch yourself for unspoken or unarticulated collective fictions. Your life is full of them. Like that your opinion is important. I can’t resist quoting this passage from Sapiens about our relationship with wheat:

‘The body of Homo sapiens had not evolved for [farming wheat]. It was adapted to climbing apple trees and running after gazelles, not to clearing rocks and carrying water buckets. Human spines, knees, necks and arches paid the price. Studies of ancient skeletons indicate that the transition to agriculture brought about a plethora of ailments, such as slipped discs, arthritis and hernias. Moreover, the new agricultural tasks demanded so much time that people were forced to settle permanently next to their wheat fields. This completely changed their way of life. We did not domesticate wheat. It domesticated us. The word ‘domesticate’ comes from the Latin domus, which means ‘house’. Who’s the one living in a house? Not the wheat. It’s the Sapiens.’

Maybe we’re not here to direct the code, but the code is directing us. Who’s the one compromising reason and logic to grow code? Not the code. It’s the Sapiens.

If you liked this, you may want to look at my book Learn Bash the Hard Way, available at $5:




A Non-Cloud Serverless Application Pattern Using Git and Docker


Over time I’ve built up a few different small applications which do simple things like track share prices, or track whether a particular file has changed on GitHub. Little apps that only I use.

While building these I’ve come to use an unorthodox application pattern that allows me to run them ‘serverless’ and without the context of a specific cloud provider.

Since they contain the code and accompanying data, they can be run anywhere that Docker and an internet connection are available.

The key ideas are:

  • use Git to store the entire context of the application (including data)
  • use Git’s distributed nature to remove a central server requirement
  • use Docker to ensure the run context is reproducible

This serverless pattern might be useful as a thought-provoker for others who write little one-off apps.


Using Git as a Database

What, essentially, is a database? It is a persistent store of data that you can look up data in.

It doesn’t need to support SQL, be multi-threaded, or even have a process running continuously.

For many applications, I use Git as a database. Git has a few useful features that many databases share:

  • a logical log (using ‘git log’)
  • a backup/restore mechanism (git push/pull)
  • ability to do transactions (using ‘git commit’)

And a few features that many don’t:

  • ability to restore older versions
  • ability to arbitrate between forks of data


However, git can’t easily be made to do the following things as a database:

  • query data using SQL
  • store huge amounts of data
  • concurrency (without a lot of pain!)
  • performance

For my purposes, this is sufficient for a large proportion of my mini-app use cases.


A trivial example of this is available here, using a trivial app that stores the date last run.

A limited form of querying the history is available using ‘git diff’:

$ git diff 'gitdb@{1 minute ago}'
diff --git a/date.txt b/date.txt
index 65627c8..d02c5d8 100644
--- a/date.txt
+++ b/date.txt
@@ -1 +1 @@
-Sat 5 Aug 2017 10:37:33 CEST
+Sat 5 Aug 2017 10:39:47 CEST

Or you can use ‘git log --patch’ to see the history of changes.

$ git log --patch
commit ab32781c28e02799e2a8130e251ac1c990389b65
Author: Ian Miell <ian.miell@gmail.com>
Date: Sat Aug 5 10:39:47 2017 +0200

Update from app_script.sh

diff --git a/date.txt b/date.txt
index 65627c8..d02c5d8 100644
--- a/date.txt
+++ b/date.txt
@@ -1 +1 @@
-Sat 5 Aug 2017 10:37:33 CEST
+Sat 5 Aug 2017 10:39:47 CEST

commit a491c5ea1c9c1660cc64f88921f43fe0040e3832
Author: Ian Miell <ian.miell@gmail.com>
Date: Sat Aug 5 10:37:33 2017 +0200

Update from app_script.sh

diff --git a/date.txt b/date.txt
index a87844b..65627c8 100644
--- a/date.txt
+++ b/date.txt
@@ -1 +1 @@
-Sat 5 Aug 2017 10:37:23 CEST
+Sat 5 Aug 2017 10:37:33 CEST


Note that the application is stateless – the entire state of the system (including data and code) is stored within the git repo. The application can be stood up anywhere that git and bash are available without ‘interruption’.

This statelessness is a useful property for small applications, and one we will try to maintain as we go on.
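A minimal sketch of how such a script might record its runs, assuming a throwaway local repository rather than the real one. The record_run helper and the committer identity are hypothetical, and a real run would also push to a remote, which is left out here so the sketch works without network access.

```shell
#!/bin/bash
set -e

# Hypothetical helper: write the current date to date.txt and commit it,
# so that the git history becomes the record of every run.
record_run() {
    local repo="$1"
    date > "${repo}/date.txt"
    git -C "${repo}" add date.txt
    # Committer identity is a placeholder for this sketch.
    git -C "${repo}" -c user.name='Example User' -c user.email='user@example.com' \
        commit --quiet -m 'Update from app_script.sh'
}

# Demonstrate with a throwaway repository (an assumption for this sketch).
REPO="$(mktemp -d)"
git init --quiet "${REPO}"
record_run "${REPO}"
sleep 1   # date has one-second resolution, so wait to guarantee a change
record_run "${REPO}"

# Each run is now one commit; 'git log --patch' would show the data changing.
COMMITS="$(git -C "${REPO}" rev-list --count HEAD)"
echo "runs recorded: ${COMMITS}"
```

Because the data file and the script live in the same repository, cloning it elsewhere and calling the same helper continues the history where it left off.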


When Git Alone Won’t Do

The first limit I hit with this approach is generally the limited querying that I can do on the data that’s stored.

That’s when I need SQL to come to my aid. For this I generally use sqlite, for a few reasons:

  • It’s trivial to set up (database is stored in a single file)
  • It’s trivial to back up/restore to a text-based .sql file
  • It’s a good implementation of SQL for the purposes of basic querying

Again, it has limitations compared to ‘non-lite’ sql databases:

  • Does not scale in size (because the db is a single file)
  • Data types are simple/limited
  • No user manager
  • Limited tuning capability


Using sqlite’s backup and restore functionality along with git means we can retain the stateless nature of the application while getting the benefits of SQL querying to mine the data.



As an example I use this pattern to keep track of my share holdings. I can use this to calculate my position/total profit on a daily basis (including dividends) and look at whether I’m genuinely up or down.

I can query the data using sql and persist these queries to ‘report’ files, which are stored in git also, meaning I can run time-based ‘queries’ like this:

 $ git diff 'master@{1 weeks ago}' reports/profits.txt | grep -A10 Overall.profit
 Overall profit
 current_value profit 
 ------------- ----------

Which tells me how my overall profit looks compared to 1 week ago.

A trivial example of such an app which (continuing from the above example) stores the dates the application was run in a sqlite db, is available here.

Note that we use .gitignore to ignore the actual db file – the sqlite db is a binary object not well stored in git, and the db can be reconstructed from the db_export.sql backup anywhere.
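The export/restore round trip can be sketched as follows. The file names follow the ones used in this article, but the table definition and the sample row are assumptions made for illustration, and the whole thing runs in a temporary directory.

```shell
#!/bin/bash
set -e

# File names follow the article; the table definition is an assumption.
DBNAME=dates.db
DBEXPORTFILE=db_export.sql

WORKDIR="$(mktemp -d)"
cd "${WORKDIR}"

# Create a small database and export it to text, as the app does after each run.
sqlite3 "${DBNAME}" "create table dates(date text);"
sqlite3 "${DBNAME}" "insert into dates(date) values('2017-08-05 10:37:33');"
echo ".dump" | sqlite3 "${DBNAME}" > "${DBEXPORTFILE}"

# Keep the binary db out of git; only the text export would be committed.
echo "${DBNAME}" > .gitignore

# Simulate a fresh checkout: the db file is gone, only the export remains.
rm -f "${DBNAME}"
sqlite3 "${DBNAME}" < "${DBEXPORTFILE}"

ROWS="$(sqlite3 "${DBNAME}" 'select count(*) from dates;')"
echo "rows after restore: ${ROWS}"
```

The text export diffs cleanly in git, which is what makes the time-based ‘git diff’ queries above possible.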


Using Docker for Portability and Statelessness

Sqlite, git, bash… even with our trivial example the dependencies are mounting up.

One way to manage this is to have your application run in a Docker container.

This is where things get a bit more complicated!

The simplified diagram below helps explain what’s going on. The Dockerfile (see below) creates an image which includes the entire git repo as a folder. The application runs within this image as a container, and imports and exports the sqlite database before and after running the application.


An example application continuing the above ‘date’ one is here.

The files it contains are:

  • db_export.sql
  • Dockerfile
  • access_db.sh
  • app_script.sh
  • run.sh


The db_export.sql is the same exported sqlite3 db as in the previous section.


Here is an annotated Dockerfile:

FROM ubuntu:xenial
# Install the needed applications.
RUN apt-get update -y && apt-get install -y sqlite3 git
# Create the folder we will copy our own contents to.
RUN mkdir /gitdb-sqlite-docker
# This folder should be the default working directory, and the
# working directory from here
WORKDIR /gitdb-sqlite-docker
# Add the contents of the git repo to the image.
ADD . /gitdb-sqlite-docker
# Configure git for me.
RUN git config --global user.email "ian.miell@gmail.com"
RUN git config --global user.name "Ian Miell"
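# Expect a github password to be passed in as a build argument.
ARG GITHUB_PASSWORD
# Set the origin so that it can pull/push without logging in.
# <remote-url> stands for the repository URL with the password embedded.
RUN git remote set-url origin "<remote-url>"
# The image runs the application by default.
CMD /gitdb-sqlite-docker/app_script.sh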


The run.sh script runs the container. Before doing this, it:

  • Checks we are not in the container
  • Checks the git history is consistent with the remote
  • Gets the github password
  • Rebuilds the image
  • Runs the image

#!/bin/bash
# Exit on error.
set -e

# Only run outside the container
if [ -e /.dockerenv ]
then
 echo 'Must be run outside container only'
 exit 1
fi

# Pull to check we are in sync and checked in.
git pull --rebase -s recursive -X ours origin gitdb-sqlite-docker

# Make sure a github password is supplied
if [[ $GITHUB_PASSWORD == '' ]]
then
 echo 'Input github password: '
 read -s GITHUB_PASSWORD
fi

# Name for the built image (illustrative value).
IMAGE_NAME='gitdb-sqlite-docker'
# Build the image.
docker build --build-arg GITHUB_PASSWORD="${GITHUB_PASSWORD}" -t "${IMAGE_NAME}" .
# Run the container.
docker run "${IMAGE_NAME}"


The app_script.sh script is run from within the running container. It is essentially the same as the previous example’s script.

#!/bin/bash
# Exit on error.
set -e

# Only run inside the container
if [ ! -e /.dockerenv ]
then
 echo 'Must be run in container only'
 exit 1
fi

# The sqlite database file and its text export.
DBNAME=dates.db
DBEXPORTFILE=db_export.sql

# Make sure the code is up to date with the origin, preferring any local
# changes we have made. Rebase to preserve a simpler history.
git pull --rebase -s recursive -X ours origin gitdb-sqlite-docker

rm -f "${DBNAME}"
# Import db from git
sqlite3 "${DBNAME}" < "${DBEXPORTFILE}"

# The trivial 'application' here simply writes the date to the database.
DATE="$(date '+%Y-%m-%d %H:%M:%S')"
echo "${DATE}"
echo "insert into dates(date) values('${DATE}');" | sqlite3 "${DBNAME}"

# Export db from sqlite
echo ".dump" | sqlite3 "${DBNAME}" > "${DBEXPORTFILE}"
# Commit the change made.
git commit -am 'Update from app_script.sh'
# Push the changes to the origin.
git push -u origin gitdb-sqlite-docker


The access_db.sh script is similar to the others, except that it gives you access to the database should you want to query it directly.
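A rough sketch of what such an access script could look like, not the actual access_db.sh. The access_db function, its optional query argument and the sample export contents are all hypothetical. It rebuilds the database from the text export, runs sqlite3 against it, and removes the database again so no state is left behind.

```shell
#!/bin/bash
set -e

# Hypothetical sketch: rebuild the db from the text export, query it,
# then remove the db file so the application stays stateless.
access_db() {
    local export_file="$1" query="$2"
    local db
    db="$(mktemp -u).db"
    sqlite3 "${db}" < "${export_file}"
    if [ -n "${query}" ]; then
        sqlite3 "${db}" "${query}"     # one-off query
    else
        sqlite3 "${db}"                # interactive prompt
    fi
    rm -f "${db}"
}

# Demo against a made-up export file.
EXPORT="$(mktemp)"
cat > "${EXPORT}" <<'EOF'
CREATE TABLE dates(date text);
INSERT INTO dates VALUES('2017-08-05 10:37:33');
INSERT INTO dates VALUES('2017-08-05 10:39:47');
EOF
LATEST="$(access_db "${EXPORT}" 'select max(date) from dates;')"
echo "${LATEST}"
```

Calling access_db with only the export file drops you into the sqlite3 prompt instead of running a single query.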


Makefiles for an Application Interface

At this point (especially if your app is getting a little complicated) it can get a little hairy to keep track of all these scripts, especially what should be run in the running container vs the image.

At this point I usually introduce a Makefile, which allows me to consolidate some code and effectively provides me with an application interface.

If I run ‘make’ I get some help by default:

$ make 
make run - run the dates script
make access - access the dates db

Running ‘make run’ will build the docker image and run the container, adding a date to the database. ‘make access’ gives me access to the database directly as before.

Adding this makefile means that I have a standard interface to running the application – if I have an application of this type, just running ‘make’ will tell me what I can and should do.

Here’s what the Makefile looks like:

help:
	@echo 'make run - run the dates script'
	@echo 'make access - access the dates db'

.PHONY: help run access restore check_host check_container check_nodiff

access: check_host check_nodiff restore
	# Access the db.
	sqlite3 dates.db
	# Remove the db.
	rm -f dates.db

run: check_host check_nodiff
	# run the script
	./run.sh
	@$(MAKE) -f Makefile check_nodiff

restore: check_nodiff
	rm -f dates.db
	cat db_export.sql | sqlite3 dates.db

check_host:
	# only run in a host
	if [ -e /.dockerenv ]; then exit 1; fi

check_container:
	# only run in a container
	if [ ! -e /.dockerenv ]; then exit 1; fi

check_nodiff:
	# Pull to check we do not have local changes
	git pull --rebase -s recursive -X ours origin gitdb-sqlite-docker-makefile

and the rest of the code is available here.


As with everything in IT, there is nothing new under the sun.

You could replace git and Docker with a tar file and an extra shell script to manage it all, and you have something that might look similar.

However, I find this a very useful pattern for quickly throwing up a data-based application that I can run and maintain myself without too much hassle or state management. It uses standard tools, has a clean interface and is genuinely portable across providers.




Run Your Own AWS APIs on OpenShift



This article shows you how you can use OpenShift to set up and test against AWS APIs using localstack.

Example code to run through this using ShutIt is available here.

Here’s an asciicast of the process:


In this walkthrough you’re going to set up an OpenShift system using minishift, and then run localstack in a pod on it.

OpenShift is a Red Hat-sponsored wrapper around Kubernetes that provides extra functionality more suited to enterprise production deployments of Kubernetes. Many features from OpenShift have swum upstream to be integrated into Kubernetes (eg role-based access control).

The open source version of OpenShift is called Origin.


Localstack is a project that aims to give you as complete as possible a set of AWS APIs to develop against without incurring any cost. This is great for testing or trying code out before running it ‘for real’ against AWS and potentially wasting time and money.

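Localstack spins up the following core Cloud APIs on your local machine (with the ports used for the routes later in this article):

  • API Gateway (4567)
  • Kinesis (4568)
  • DynamoDB (4569)
  • DynamoDB Streams (4570)
  • S3 (4572)
  • Firehose (4573)
  • Lambda (4574)
  • SNS (4575)
  • SQS (4576)
  • Redshift (4577)
  • ES (4578)
  • SES (4579)
  • Route53 (4580)
  • CloudFormation (4581)
  • CloudWatch (4582)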

At present it supports running in a Docker container, or natively on a machine.

It is built on moto, which is a mocking framework in turn built on boto, which is a python AWS SDK.

Running within an OpenShift cluster gives you the capability to run very many of these AWS API environments. You can then create distinct endpoints for each set of services, and isolate them from one another. Also, you can worry less about resource usage as the cluster scheduler will take care of that.

However, it doesn’t run out of the box, so this will guide you through what needs to be done to get it to work.

Started Minishift?

If you don’t have an OpenShift cluster to hand, then you can run up minishift, which gives you a standalone VM with a working OpenShift on it.

Installing minishift is documented here. You’ll need to install it first and run ‘minishift start’ successfully.

Once you have started minishift, you will need to set up your shell so that you are able to communicate with the OpenShift server.

$ eval $(minishift oc-env)

Change the default security context constraints

Security Context Constraints (scc) are an OpenShift concept that allows more granular control over Docker containers’ powers.

They control SELinux contexts, can drop capabilities from the running containers, can determine which user the pod can run as, and so on.

To get this running you’re going to change the default ‘restricted’ scc, but you could create a separate scc and apply that to a particular project. To change the ‘restricted’ scc you will need to become a cluster administrator:

$ oc login -u system:admin

Then you need to edit the restricted scc with:

$ oc edit scc restricted

You will see the definition of the restricted scc.

At this point you’re going to have to do two things:

  • Allow containers to run as any user (in this case ‘root’)
  • Prevent the scc from restricting your capabilities to setuid and setgid

1) Allow RunAsAny

The localstack container runs as root by default.

For security reasons, OpenShift does not allow containers to run as root by default. Instead it picks a random UID within a very high range, and runs as that.

To simplify matters, and allow the localstack container to run as root, change the lines:

 runAsUser:
   type: MustRunAsRange

to read:

 runAsUser:
   type: RunAsAny

This allows containers to run as any user.

2) Allow SETUID and SETGID Capabilities

When localstack starts up it needs to become another user to start up Elasticsearch. The Elasticsearch service does not start up as the root user.

To get round this, localstack su’s the startup command to the localstack user in the container.

Because the ‘restricted’ scc explicitly disallows actions that change your user or group id, you need to remove these restrictions. Do this by deleting the lines:

 - SETUID
 - SETGID

Once you have done these two steps, save the file.

Make a note of the host

If you run:

$ minishift console --machine-readable | grep HOST | sed 's/^HOST=\(.*\)/\1/'

you will get the host that the minishift instance is accessible as from your machine. Make a note of this, as you’ll need to substitute it in later.
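To see what the grep/sed pipeline does without a running minishift, here is a small sketch fed from a made-up sample of the machine-readable output. The sample values and the MINISHIFT_HOST variable name are assumptions for illustration, and the real output will differ.

```shell
#!/bin/bash
set -e

# Made-up sample of the machine-readable output; real values will differ.
SAMPLE_OUTPUT='HOST=192.168.99.100
PORT=8443'

# Same grep/sed pipeline as above, fed from the sample instead of minishift.
MINISHIFT_HOST="$(echo "${SAMPLE_OUTPUT}" | grep HOST | sed 's/^HOST=\(.*\)/\1/')"
echo "${MINISHIFT_HOST}"

# The host is later substituted into route host names such as:
echo "sqs-test.${MINISHIFT_HOST}.nip.io"
```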

Deploy the pod

Deploying the localstack is as easy as running:

$ oc new-app localstack/localstack --name="localstack"

This takes the localstack/localstack image and creates an OpenShift application around it for you, setting up internal services (based on the exposed ports in the Dockerfile), running the container in a pod, and various other management tasks.

Create the routes

If you want to access the services from outside, you need to create OpenShift routes, which create an external address to access services within the OpenShift network.

For example, to create a route for the sqs service, create a file like this:

apiVersion: v1
items:
- apiVersion: v1
  kind: Route
  metadata:
    annotations:
      openshift.io/host.generated: "true"
    name: sqs
    selfLink: /oapi/v1/namespaces/test/routes/sqs
  spec:
    host: sqs-test.HOST.nip.io
    port:
      targetPort: 4576-tcp
    to:
      kind: Service
      name: localstack
      weight: 100
    wildcardPolicy: None
  status:
    ingress:
    - conditions:
      - lastTransitionTime: 2017-07-28T17:49:18Z
        status: "True"
        type: Admitted
      host: sqs-test.HOST.nip.io
      routerName: router
      wildcardPolicy: None
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""

then create the route with:

$ oc create -f 

See above for the list of services and their ports.

If you have multiple localstacks running on your OpenShift cluster, you might want to prepend the host name with a unique name for the instance, eg

host: localstackenv1-sqs-test.HOST.nip.io
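Writing one route file per service by hand gets tedious, so a loop can generate them. This is only a sketch: the MINISHIFT_HOST variable, the output directory and the file naming are hypothetical, the route definition is cut down to what I assume are the minimum fields, and only a subset of the services is listed.

```shell
#!/bin/bash
set -e

# Placeholder host, as in the route definition above; substitute your own.
MINISHIFT_HOST="${MINISHIFT_HOST:-HOST}"
OUTDIR="$(mktemp -d)"

# name:port pairs for a subset of the localstack services.
SERVICES="kinesis:4568 dynamodb:4569 s3:4572 sns:4575 sqs:4576"

for entry in ${SERVICES}; do
    name="${entry%%:*}"
    port="${entry##*:}"
    # Write a cut-down route definition for this service.
    cat > "${OUTDIR}/${name}-route.yaml" <<EOF
apiVersion: v1
kind: Route
metadata:
  name: ${name}
spec:
  host: ${name}-test.${MINISHIFT_HOST}.nip.io
  port:
    targetPort: ${port}-tcp
  to:
    kind: Service
    name: localstack
EOF
    echo "wrote ${OUTDIR}/${name}-route.yaml"
done

# With a logged-in oc client, each file could then be created with:
#   oc create -f "${OUTDIR}/sqs-route.yaml"
```

To run several localstack instances side by side, the instance name can be prepended to the host line inside the loop.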

Look upon your work

Run an ‘oc get all’ to see what you have created within your OpenShift project:

$ oc get all
is/localstack latest 15 hours ago

dc/localstack 1 1 1 config,image(localstack:latest)

rc/localstack-1 1 1 1 15h

routes/apigateway apigateway-test. localstack 4567-tcp None
routes/cloudformation cloudformation-test. localstack 4581-tcp None
routes/cloudwatch cloudwatch-test. localstack 4582-tcp None
routes/dynamodb dynamodb-test. localstack 4569-tcp None
routes/dynamodbstreams dynamodbstreams-test. localstack 4570-tcp None
routes/es es-test. localstack 4578-tcp None
routes/firehose firehose-test. localstack 4573-tcp None
routes/kinesis kinesis-test. localstack 4568-tcp None
routes/lambda lambda-test. localstack 4574-tcp None
routes/redshift redshift-test. localstack 4577-tcp None
routes/route53 route53-test. localstack 4580-tcp None
routes/s3 s3-test. localstack 4572-tcp None
routes/ses ses-test. localstack 4579-tcp None
routes/sns sns-test. localstack 4575-tcp None
routes/sqs sqs-test. localstack 4576-tcp None
routes/web web-test. localstack 8080-tcp None

svc/localstack  4567/TCP,4568/TCP,4569/TCP,4570/TCP,4571/TCP,4572/TCP,4573/TCP,4574/TCP,4575/TCP,4576/TCP,4577/TCP,4578/TCP,4579/TCP,4580/TCP,4581/TCP,4582/TCP,8080/TCP 15h

po/localstack-1-hnvpw 1/1 Running 0 15h

Each route created is now accessible as an AWS service ready to test your code.

Access the services

You can now hit the services from your host, like this:

$ aws --endpoint-url=http://kinesis-test. kinesis list-streams
{
 "StreamNames": []
}

For example, to create a kinesis stream:

$ aws --endpoint-url=http://kinesis-test. kinesis create-stream --stream-name teststream --shard-count 2
$ aws --endpoint-url=http://kinesis-test. kinesis list-streams
{
 "StreamNames": [
 "teststream"
 ]
}
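Typing the endpoint URL every time is error-prone, so a small wrapper can help. This is a sketch under assumptions: the endpoint_for and aws_local helpers and the MINISHIFT_HOST variable are hypothetical, and it assumes the route name matches the aws CLI service name, which holds for kinesis and sqs but may not hold for every route.

```shell
#!/bin/bash
set -e

# Placeholder host, as in the route definition; substitute your own.
MINISHIFT_HOST="${MINISHIFT_HOST:-HOST}"

# Hypothetical helper: build the endpoint URL for a service's route.
endpoint_for() {
    echo "http://$1-test.${MINISHIFT_HOST}.nip.io"
}

# Hypothetical wrapper: run an aws command against the matching route.
# Usage: aws_local kinesis list-streams
aws_local() {
    local service="$1"
    aws --endpoint-url="$(endpoint_for "${service}")" "$@"
}

# Print the endpoint that would be used for kinesis.
endpoint_for kinesis
```

With the aws CLI installed and the routes in place, ‘aws_local kinesis list-streams’ would be equivalent to the longer commands above.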


Dockerized Headless Chrome Example

For those of us obsessed with automation, the PhantomJS library was manna from heaven, allowing you to programmatically automate web interactions against a ‘real’ web browser without needing a screen to interact with.

Earlier this year, the principal maintainer announced that he was stepping down from the project in favour of ‘Headless Chrome’.

Headless Chrome is still new, and there isn’t much material to chew on yet, but I came across this blog post, which shows how to set up an Ubuntu Trusty server with a simple screen grabber script.

I thought I’d transfer this to a Docker container, as lightweight spinning up of these processes will be a boon for testing.

Here’s a video of the script getting screenshots of two randomly-chosen websites:


The repository is here.


Convert a Server to a Docker Container (Update II)

How and Why?

Let’s say you have a server that has been lovingly hand-crafted that you want to containerize.

Figuring out exactly what software is required on there and what config files need adjustment would be quite a task, but fortunately blueprint exists as a solution to that.

What I’ve done here is automate that process down to a few simple steps. Here’s how it works:


You kick off a ShutIt script (as root) that automates the bash interactions required to get a blueprint copy of your server. This in turn kicks off another ShutIt script, which creates a Docker container, provisions it with the right stuff, then commits it. Got it? Don’t worry, it’s automated and only a few lines of bash.

There are therefore 3 main steps to getting into your container:

– Install ShutIt on the server

– Run the ‘copyserver’ ShutIt script

– Run your copyserver Docker image as a container

Step 1

Install ShutIt as root:

sudo su -
pip install shutit

The pre-requisites are python-pip, git and docker. The exact names of these in your package manager may vary slightly (eg docker-io or docker.io) depending on your distro.

You may need to make sure the docker server is running too, eg with ‘systemctl start docker’ or ‘service docker start’.
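If you want to check the pre-requisites before going any further, something like the following sketch might help. The function name ‘missing_commands’ is my own invention for illustration, and it assumes the tools are on your PATH under the names pip, git and docker, which may not hold on every distro.

```shell
#!/bin/sh
# Print the name of each command in the argument list that is not on the PATH.
# 'missing_commands' is a hypothetical helper, not part of ShutIt.
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
        fi
    done
}

# The three pre-requisites named above; an empty result means all were found.
missing=$(missing_commands pip git docker)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on a single line.
    echo "Missing pre-requisites:" $missing
else
    echo "All pre-requisites found"
fi
```

This only tells you that the commands exist. It does not check that the docker server is actually running.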

Step 2

Check out the copyserver script:

git clone https://github.com/ianmiell/shutit_copyserver.git

Step 3

Run the copy_server script:

cd shutit_copyserver/bin
./copy_server.sh

There is a prompt to ask what docker base image you want to use.

Make sure you use one as close to the original server as possible, eg ubuntu:trusty or ubuntu:14.04 rather than just ‘ubuntu’.
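If you are not sure which release the server is running, a rough way to get a candidate image name is to read the os-release file. This is a sketch only: ‘suggest_base_image’ is a hypothetical helper, and it assumes the server has an /etc/os-release file and that the distro’s ID and VERSION_ID happen to match an image name and tag on the Docker Hub (true for the likes of ubuntu:14.04, but not guaranteed in general).

```shell
#!/bin/sh
# Suggest a Docker base image name from an os-release style file.
# Usage: suggest_base_image [path]   (path defaults to /etc/os-release)
suggest_base_image() {
    (
        # Source the file in a subshell so its variables do not leak out.
        . "${1:-/etc/os-release}" || exit 1
        echo "${ID}:${VERSION_ID}"
    )
}
```

Running ‘suggest_base_image’ with no arguments on the server prints a candidate to type in at the prompt. Check that the image really exists before relying on it.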

Step 4

Run the built image:

docker run -ti copyserver /bin/bash

You are now in a practical facsimile of your server within a docker container!

Gotchas Checklist

If it doesn’t work, here’s a checklist of things that might have gone wrong:

  • Python 2.7+ is required
  • Using the wrong image – make sure it’s as close to the original as possible
  • Not having Docker installed on the host
  • The server is not apt or yum based
  • The server may run out of memory (at least 1G recommended)

If none of the above work, send the output of:

./copy_server.sh -l debug --echo
cd /tmp/shutit_copyserver && shutit build --echo -d docker -s repository tag yes -s repository_name copyserver -l DEBUG

to an issue on github.



Automating Dockerized Jenkins Upgrades


If you’ve used Jenkins for a while in production, then you will be aware that Jenkins frequently publishes updates to its server for security and functionality changes.

On a dedicated, non-dockerized host, this is generally managed for you through package management. With Docker it can get slightly more complicated to reason about upgrades, as you’ve likely separated out the context of the server from its data.


You want to reliably upgrade your Jenkins server.


This technique is delivered as a Docker image composed of a number of parts. First we will outline the Dockerfile that builds the image. This Dockerfile draws from the library docker image (which contains a docker client) and adds a script that manages the upgrade.

The image is run in a docker command that mounts the Docker socket and folders from the host, giving it the ability to manage any required Jenkins upgrade.


We start with the Dockerfile:

FROM docker                                                    <1>
ADD jenkins_updater.sh /jenkins_updater.sh                     <2>
RUN chmod +x /jenkins_updater.sh                               <3>
ENTRYPOINT /jenkins_updater.sh                                 <4>

<1> – Use the ‘docker’ standard library image

<2> – Add in the ‘jenkins_updater.sh’ script (see below)

<3> – Ensure that the ‘jenkins_updater.sh’ script is runnable

<4> – Set the default entrypoint for the image to be the ‘jenkins_updater.sh’ script

The above Dockerfile encapsulates the requirements to back up Jenkins in a runnable Docker image. It uses the ‘docker’ standard library image. We use this to get a Docker client to run within a container. This container will run the script in the next listing to manage any required upgrade of Jenkins on the host.

NOTE: If your docker daemon version differs from the version in the ‘docker’ Docker image, then you may run into problems. Try to use the same version.


This is the shell script that manages the upgrade within the container:

#!/bin/sh                                                        <1>
set -e                                                           <2>
set -x                                                           <3>
if ! docker pull jenkins | grep up.to.date                       <4>
then
 docker stop jenkins                                             <5>
 docker rename jenkins jenkins.bak.$(date +%Y%m%d%H%M)           <6>
 cp -r /var/docker/mounts/jenkins_home \                         <7>
       /var/docker/mounts/jenkins_home.bak.$(date +%Y%m%d%H%M)   <7>
 docker run -d \                                                 <8>
     --restart always \                                          <9>
     -v /var/docker/mounts/jenkins_home:/var/jenkins_home \      <10>
     --name jenkins \                                            <11>
     -p 8080:8080 \                                              <12>
     jenkins                                                     <13>
fi

<1> – This script uses the ‘sh’ shell (not the ‘/bin/bash’ shell) because only ‘sh’ is available on the ‘docker’ Docker image

<2> – This ‘set’ command ensures the script will fail if any of the commands within it fail

<3> – This ‘set’ command logs all the commands run in the script to standard output

<4> – The ‘if’ block only fires if ‘docker pull jenkins’ does not output ‘up to date’

<5> – When upgrading, begin by stopping the jenkins container

<6> – Once stopped, rename the jenkins container to ‘jenkins.bak.’ followed by the time to the minute

<7> – Copy the Jenkins container image state folder to a backup

<8> – Run the docker command to start up Jenkins, and run it as a daemon

<9> – Set the jenkins container to always restart

<10> – Mount the jenkins state volume to a host folder

<11> – Give the container the name ‘jenkins’ to prevent multiple of these containers running simultaneously by accident

<12> – Publish the 8080 port in the container to the 8080 port on the host

<13> – Finally, the jenkins image name to run is given to the docker command

The above script tries to pull jenkins from the docker hub with the ‘docker pull’ command. If the output contains the phrase ‘up to date’, then the ‘docker pull | grep …’ command returns true. However, we only want to upgrade when we did _not_ see ‘up to date’ in the output. This is why the ‘if’ statement is negated with a ‘!’ sign after the ‘if’.

The result is that the code in the ‘if’ block is only fired if we downloaded a new version of the ‘latest’ Jenkins image. Within this block, the running Jenkins container is stopped and renamed. We rename it rather than delete it in case the upgrade did not work and we need to reinstate the previous version.
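To see the negated test in isolation, here is a small sketch you can run without Docker. The helper name ‘needs_upgrade’ is made up for this illustration, and the two sample lines stand in for the status line that ‘docker pull’ prints. Unlike the script above it matches the literal phrase ‘up to date’ and uses grep’s quiet flag.

```shell
#!/bin/sh
# Succeeds (exit 0) when the pull output read from stdin does NOT report
# that the image is already up to date, ie when an upgrade is needed.
# 'needs_upgrade' is a hypothetical helper for illustration only.
needs_upgrade() {
    ! grep -q 'up to date'
}

# Sample status lines standing in for real 'docker pull jenkins' output.
if echo 'Status: Image is up to date for jenkins:latest' | needs_upgrade; then
    echo "upgrade"
else
    echo "no upgrade needed"
fi

if echo 'Status: Downloaded newer image for jenkins:latest' | needs_upgrade; then
    echo "upgrade"
else
    echo "no upgrade needed"
fi
```

The first test prints ‘no upgrade needed’ and the second prints ‘upgrade’, which is the same decision the ‘if’ block makes.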

Further to this rollback strategy, the mount folder on the host containing Jenkins’ state is backed up also.

Finally, the latest-downloaded Jenkins image is started up using the docker run command.

NOTE: You may want to change the host mount folder and/or the name of the running Jenkins container based on personal preference.

The attentive reader might be wondering how the jenkins-updater image is connected to the host’s Docker daemon. To achieve this, the image is run using a method commonly used in the book:

The jenkins-updater image invocation

docker run \                                               <1>
    --rm \                                                 <2>
    -d \                                                   <3>
    -v /var/lib/docker:/var/lib/docker \                   <4>
    -v /var/run/docker.sock:/var/run/docker.sock \         <5>
    -v /var/docker/mounts:/var/docker/mounts \             <6>
    dockerinpractice/jenkins-updater                       <7>

<1> – The docker run command

<2> – You want the container to be removed when it has completed its job

<3> – Run the container in the background

<4> – Mount the host’s docker daemon folder to the container

<5> – Mount the host’s docker socket to the container so the docker command will work within the container

<6> – Mount the host’s docker mount folder where the Jenkins data is stored, so that the jenkins_updater.sh script can copy the files

<7> – The dockerinpractice/jenkins-updater image is the image to be run

Automating the upgrade

This one-liner makes it easy to run within a crontab. We run this on our home servers. The crontab line looks like this:

0 * * * * docker run --rm -d -v /var/lib/docker:/var/lib/docker -v /var/run/docker.sock:/var/run/docker.sock -v /var/docker/mounts:/var/docker/mounts dockerinpractice/jenkins-updater 

NOTE: The above is all on one line because crontab does not treat a backslash before a newline as a line continuation in the way that shell scripts do.

The end result is that a single crontab entry can safely manage the upgrade of your Jenkins instance without you having to worry about it. The task of automating the cleanup of old backed up containers and volume mounts is left as an exercise for the reader.
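As a starting point for that exercise, here is one possible sketch. It only lists the backup folders that fall outside the newest few and deletes nothing. The function name ‘old_backups’ and the idea of keeping a fixed number of backups are my own assumptions. It relies on the timestamp suffix used by the updater script sorting in date order.

```shell
#!/bin/sh
# List backup folders that fall outside the newest KEEP, oldest first.
# Usage: old_backups DIR KEEP
# 'old_backups' is a hypothetical helper. It relies on the %Y%m%d%H%M
# suffix sorting chronologically, and it prints paths without deleting.
old_backups() {
    dir=$1
    keep=$2
    for path in "$dir"/jenkins_home.bak.*; do
        # Skip the unexpanded pattern when no backups exist.
        if [ -e "$path" ]; then
            echo "$path"
        fi
    done | sort | awk -v keep="$keep" '
        { lines[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print lines[i] }'
}
```

For example, ‘old_backups /var/docker/mounts 3’ would print every backup folder except the three most recent. Once you are happy with the list you could feed it to a delete command of your choice. The renamed ‘jenkins.bak.’ containers would need a similar treatment with ‘docker rm’.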


This technique exemplifies a few things that we come across throughout the book and that can be applied in contexts other than Jenkins.

First, it uses the core docker image to communicate with the Docker daemon on the host. Other portable scripts might be written to manage Docker daemons in other ways. For example, you might want to write scripts to remove old volumes, or report on the activity on your daemon.

More specifically, the ‘if’ block pattern could be used to update and restart other images when a new one is available. It is not uncommon for images to be updated for security reasons, or to make minor upgrades.

If you are concerned with difficulties in upgrading versions, it’s also worth pointing out that you need not take the ‘latest’ image tag (which this technique does). Many images have different tags that track different version numbers.

For example, your image ‘exampleimage’ might have an exampleimage:latest tag, as well as an exampleimage:v1.1 tag, and an exampleimage:v1 tag. Any of these might be updated at any time, but the :v1.1 tag is less likely to move to a new version than the :latest one. The :latest one could move to the same version as a new :v1.2 one (which might require steps to upgrade) or even a :v2.1 one, where the new major version ‘2’ indicates a change more likely to be disruptive to any upgrade process.

This technique also outlines a rollback strategy for docker upgrades. The separation of container and data (using volume mounts) can create tension about the stability of any upgrade. By retaining the old container and a copy of the old data at the point where the service was working, it is easier to recover from failure.
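To make the rollback concrete, here is a rough sketch of what reinstating the previous version might involve. It is a dry run: it prints the commands rather than running them. The function name ‘rollback_plan’ and the ‘.failed.’ names are hypothetical. It also assumes the container and the data backup carry the same timestamp, which may not be true if the updater script ran across a minute boundary.

```shell
#!/bin/sh
# Print (but do not run) the commands to reinstate a backed-up container
# and its data folder. STAMP is the suffix the updater script appended.
# Usage: rollback_plan STAMP
# 'rollback_plan' and the '.failed.' names are hypothetical.
rollback_plan() {
    stamp=$1
    mounts=/var/docker/mounts
    cat <<EOF
docker stop jenkins
docker rename jenkins jenkins.failed.$stamp
mv $mounts/jenkins_home $mounts/jenkins_home.failed.$stamp
cp -r $mounts/jenkins_home.bak.$stamp $mounts/jenkins_home
docker rename jenkins.bak.$stamp jenkins
docker start jenkins
EOF
}

# Example with a made-up timestamp.
rollback_plan 201710141418
```

The failed container and data are moved aside rather than deleted, so you can still inspect what went wrong afterwards.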

Database Upgrades and Docker

Database upgrades are a particular context in which these stability concerns are germane.

If you want to upgrade your database to a new version, you have to consider whether the upgrade requires a change to the data structures and storage of the database’s data. It’s not enough simply to run the new version’s image as a container and expect it to work.

It gets a bit more complicated if the database is ‘smart’ enough to know which version of the data it is ‘seeing’, and can perform the upgrade itself accordingly. In these cases, you might be more comfortable upgrading.

Many factors feed into your upgrade strategy. Your app might tolerate an ‘optimistic’ approach (as we see here in the Jenkins example) which assumes everything will be OK, and prepares for failure when (not if) it occurs. On the other hand, you might demand 100% uptime, and not tolerate failure of any kind at all. In such cases, a fully-tested upgrade plan and a deeper knowledge of the platform than running ‘docker pull’ is generally desired (with or without the involvement of Docker).

Although Docker does not remove the upgrade problem, the immutability of the versioned images can make it simpler to reason about them. Docker can also help you prepare for failure in two ways: backing up state in host volumes, and making it easier to test against predictable state. The hit you take in managing and understanding what Docker is doing can give you more control and certainty about the upgrade process.



This technique is taken from the upcoming second edition of my book Docker in Practice.