The Most Pointless Docker Command Ever

What?

This article will show you how you can undo the things Docker does for you in a Docker command. Clearer now?

OK, Docker relies on Linux namespaces to isolate effectively copy parts of the system so it ends up looking like you are on a separate machine.

For example, when you run a Docker container:

$ docker run -ti busybox ps -a
PID USER COMMAND
 1 root ps -a

it only ‘sees’ its own process IDs. This is because it has its own PID namespace.

Similarly, you have your own network namespace:

$ docker run -ti busybox netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path

You also have your own view of inter-process communication and the filesystem.

[adinserter block=”1″]

Go on then

This is possibly the most pointless possible docker command ever run, but here goes:

docker run -ti 
    --privileged 
    --net=host --pid=host --ipc=host 
    --volume /:/host 
    busybox 
    chroot /host

The three ‘=host’ flags bypass the network, pid and ipc namespaces. The volume flag mounts the root filesystem of the host to the ‘/host’ folder in the container (you can’t mount to ‘/’ in the container). The privileged flags gives the user full access to the root user’s capabilities.

All we need is the chroot command, so we use a small image (busybox) to chroot to the filesystem we mounted.

What we end up with is a Docker container that is running as root with full capabilities in the host’s filesystem, will full access to the network, process table and IPC constructs on the host. You can even ‘su’ to other users on the host.

If you can think of a legitimate use for this, please drop me a line!

Why?

Because you can!

Also, it’s quite instructive. And starting from this, you can imagine scenarios where you end up with something quite useful.

Imagine you have an image – called ‘filecheck’ – that runs a check on the filesystem for problematic files. Then you could run a command like this (which won’t work BTW – filecheck does not exist):

docker run --workdir /host -v /:/host:ro filecheck

This modified version of the pointless command dispenses with the chroot in favour of changing the workdir to ‘/host’, and – crucially – the mount now uses the ‘:ro’ suffix to mount the host’s filesystem read-only, preventing the image from doing damage to it.

So you can check your host’s filesystem relatively safely without installing anything.

You can imagine similar network or process checkers running for their namespaces.

Can you think of any other uses for modifications of this pointless command?
This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip

Advertisements

My Favourite Docker Tip

Currently co-authoring a book on Docker: Get 39% off with the code 39miell

dip

The Problem

To understand the problem we’re going to show you a simple scenario where not having this is just plain annoying.

Imagine you are experimenting in Docker containers, and in the midst of your work you do something interesting and reusable. Here’s it’s going to be a simple echo command, but it could be some long and complex concatenation of programs that result in a useful output.

docker run -ti --rm ubuntu /bin/bash
echo my amazing command
exit

Now you forget about this triumph, and after some time you want to recall the incredible echo command you ran earlier. Unfortunately you can’t recall it and you no longer have the terminal session on your screen to scroll to. Out of habit you try looking through your bash history on the host:

history | grep amazing

…but nothing comes back, as the bash history is kept within the now-removed container and not the host you were returned to.

The Solution – Manual

To share your bash history with the host, you can use a volume mount when running your docker images. Here’s an example:

docker run \
    -e HIST_FILE=/root/.bash_history \
    -v=$HOME/.bash_history:/root/.bash_history \
    -ti \
    ubuntu /bin/bash

The -e argument specifies the history file bash is using on the host.

The -v argument maps the container’s root’s bash history file to the host’s, saving its history to your user’s bash history on the host.

This is quite a handful to type every time, so to make this more user-friendly you can set an alias up by putting the above command as an alias into your ‘~/.bashrc’ file.

alias dockbash='docker run -e HIST_FILE=/root/.bash_history -v=$HOME/.bash_history:/root/.bash_history

Making it Seamless

This is still not seamless as you have to remember to type ‘dockbash’ if you really wanted to perform a ‘docker run’ command. For a more seamless experience you can add this to your ‘~/.bashrc’ file:

function basher() {
    if [[ $1 = 'run' ]]
    then
        shift
        /usr/bin/docker run -e \
            HIST_FILE=/root/.bash_history \
            -v $HOME/.bash_history:/root/.bash_history \
            "$@"
    else
        /usr/bin/docker "$@"
    fi
}
alias docker=basher

It sets up an alias for docker, which by default points to the ‘real’ docker executable in /usr/bin/docker. If the first argument is ‘run’ then it adds the bash magic.

Now when you next open a bash shell and run any ‘docker run’ command, the commands you run within that container will be added to your host’s bash history.

Conclusion

As a heavy Docker user, this change has reduced my frustrations considerably. No longer do I think to myself ‘I’m sure I did something like this a while ago’ without being able to recover my actions.

Convert Any Server to a Docker Container

DEPRECATED PAGE:

UPDATED HERE

 

OLD POST:


 

This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip

How and Why?

Let’s say you have a server that has been lovingly hand-crafted that you want to containerize.

Figuring out exactly what software is required on there and what config files need adjustment would be quite a task, but fortunately blueprint exists as a solution to that.

What I’ve done here is automate that process down to a few simple steps. Here’s how it works:

Blueprint_Server

You kick off a ShutIt script (as root) that automates the bash interactions required to get a blueprint copy of your server, then this in turn kicks off another ShutIt script which creates a Docker container that provisions the container with the right stuff, then commits it. Got it? Don’t worry, it’s automated and only a few lines of bash.

There are therefore 3 main steps to getting into your container:

– Install ShutIt on the server

– Run the ‘copyserver’ ShutIt script

– Run your copyserver Docker image as a container

Step 1

Install ShutIt as root:

sudo su -
(apt-get update && apt-get install -y python-pip git docker) || (yum update && yum install -y python-pip git docker which)
pip install shutit

The pre-requisites are python-pip, git and docker. The exact names of these in your package manager may vary slightly (eg docker-io or docker.io) depending on your distro.

You may need to make sure the docker server is running too, eg with ‘systemctl start docker’ or ‘service docker start’.

Step 2

Check out the copyserver script:

git clone https://github.com/ianmiell/shutit_copyserver.git

Step 3

Run the copy_server script:

cd shutit_copyserver/bin
./copy_server.sh

There are a couple of prompts – one to correct perms on a config file, and another to ask what docker base image you want to use. Make sure you use one as close to the original server as possible.

Note that this requires a version of docker that has the ‘docker exec’ option.

Step 4

Run the build server:

docker run -ti copyserver /bin/bash

You are now in a practical facsimile of your server within a docker container!

This is not the finished article, so if you need help dockerizing a server, let me know what the problem is, as improvements can still be made.

This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip

A Field Guide to Docker Security Measures

This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip

Introduction

If you’re unsure of how to secure Docker for your organisation (given that security wasn’t part of its design), I thought it would be useful to itemise some of the ways in which you can reduce or help manage the risk of running it.

The Two Sides

In this context there are two sides to security from the point of view of a sysadmin, ‘outsider’ and ‘insider’:

  • ‘Outsider’ – preventing an attacker doing damage once they have access to a container
  • ‘Insider’ – preventing a malicious user with access to the docker command from doing damage

‘Outsider’ will be a familiar scenario to anyone who’s thought about security.

‘Insider’ may be a new scenario to some. Since Docker gives you the root user on the host system (albeit within a container), there is the potential to wreak havoc on the host by accident or design. A simple example (don’t run this at home kids – I’ve put a dummy flag in anyway) is:

docker run --dontpastethis --privileged -v /usr:/usr busybox rm -rf /usr

Which will delete your host’s /usr folder. If you want people to be able to run docker, but not with the ability to do this level of damage, there are some steps you can take.

Some measures, naturally, will apply to both. Also some are as much organisational as technical.

Insiders and Outsiders

  • Run Docker Daemon with –selinux

If you run your Docker daemon with the –selinux flag it will do a great deal to prevent those in containers you from doing damage to the host system by creating its own security

This can be set in your docker config file, which usually lives in /etc under /etc/docker or /etc/sysconfig/docker

Defending Against Outsiders

  • Remove capabilities

Capabilities are a division of root into 32 categories. Many of these are disabled by default in Docker (for example, you can’t manipulate iptables rules in a Docker container by default)

To disable all of them you can run:

docker run -ti --cap-drop ALL debian /bin/bash

Or, if you want to be more fine-grained start with nothing, and then re-introduce capabilities as needed:

docker run -ti --cap-drop=CHOWN --cap-drop=DAC_OVERRIDE \
    --cap-drop=FSETID --cap-drop=FOWNER --cap-drop=KILL \
    --cap-drop=MKNOD --cap-drop=NET_RAW --cap-drop=SETGID \
    --cap-drop=SETUID --cap-drop=SETFCAP --cap-drop=SETPCAP \
    --cap-drop=NET_BIND_SERVICE --cap-drop=SYS_CHROOT \
    --cap-drop=AUDIT_WRITE \
    debian /bin/bash

Run ‘man capabilities’ for more information.

Defending Against Insiders

The main problem with giving users access to the docker runtime is that they could run with –privileged and wreak havoc, even if you have selinux enabled.

So if you’re sufficiently paranoid that you want to remove the ability for users to run Docker, some problems arise:

– How to prevent users from effectively running docker with privileges?

– How to allow users to build images?

udocker is a highly experimental and as-yet incomplete program which only allows you to run docker containers as your own (already logged-in) user id.

It’s small enough for security inspection (just a few lines of code: https://github.com/docker-in-practice/udocker/blob/master/udocker.go, forked from https://github.com/ewindisch/udocker) and potentially very useful where you want to lock down what can be run.

To run:

$ git clone https://github.com/docker-in-practice/udocker.git
$ apt-get install golang-go
$ go build
$ id
uid=1001(imiell) gid=1001(imiell) groups=1001(imiell),27(sudo),132(docker)
./udocker fedora:20 whoami
whoami: cannot find name for user ID 1001
$ ./udocker fedora:20 build-locale-archive
permission denied
FATA[0000] Error response from daemon: Cannot start container 6ba3db7094a20c9742a3289401dcf915e03a2906d4e44dbbed42e194de13fd44: [8] System error: permission denied

Compare normal docker:

$ docker run fedora:20 id
uid=0(root) gid=0(root) groups=0(root)

If you then lock down the docker runtime to be executable only by root, you disable much of docker’s attack surface.

  • Docker build on audited server (and private registry)

One solution to allow you to build without access to the docker runtime may be to allow people to submit Dockerfiles via a limited web service which takes care of building the image for you.

It’s relatively easy to knock up a server that takes a Dockerfile as a POST request, builds the image with a web framework such as python-flask, and then deposits the resulting image for post-processing. Or you could even use email as a transport, and email them back a tar file of the checked image build :)

You can also do your static Dockerfile and image checking here before allowing promotion to a privately-run registry. For example you could:

  • Enforce USERs in images

If you have a build server that takes a Dockerfile and produces an image, it becomes relatively easy to do tests.

The first static check I implemented was checking that the image had a valid :

– There is at least one USER line

– The last USER line is not root/uid0

  • Run in a VM

The Google approach. Give each user a locked-down VM on which they can run and do what they like, and define ingress and egress at that level.

This can be a pragmatic approach. Some will object that you lost a lot of the benefits of running Docker at scale, but for many developers running tests or Jenkins servers and slaves this will not matter.

Future Work

  • User namespaces

Support for the mapping of users from host to container is being discussed here:

https://github.com/docker/docker/issues/7906

Further Reading

There’s lots more going on in this space. Here’s some highlight links:

Comprehensive CIS Docker security guide

Docker’s security guide

GDS Docker security guidelines

Dan Walsh (aka Mr SELinux) talk on Docker security

This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip

Docker SELinux Experimentation with Reduced Pain

This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip

Introduction

As a Docker enthusiast that works for a corp that cares about security, SELinux is going to be a big deal. While SELinux is in principle simple, in practice it’s difficult to get to grips with. My initial attempts involved reading out of date blogs for tools that were deprecated, and confusing introductions that left me wondering where to go.

Fortunately, I came across this blog, which explained how to implement an SELinux policy for apache in Docker.

I tried to apply this to a Vagrant centos image with Docker on it, but kept getting into a state where something was not working, but I didn’t know what had happened, and then would have to re-provision the box, re-install the software, remember my steps etc etc..

So I wrote a ShutIt script to automate this process, reducing the iteration time to re-provision and re-try changes to this SELinux policy.

See it in action here

Overview

This diagram illustrates the way this script works.

docker-selinux

Once ShutIt is set up, you run it as root with:

# shutit build --delivery bash

The ‘build’ argument tells ShutIt to run the commands in the to the revelant delivery target. By default this is Docker, but here we’re using ShutIt to automate the process of delivery via bash. ssh is also an option.

Running is root is obviously a risk, so be warned if you experiment with the script.

The script is here. It’s essentially a dynamic shell script (readily comprehended in the build method), which can react to different outputs. For example:

# If the Vagrantfile exists, we assume we've already init'd appropriately.
if not shutit.file_exists('Vagrantfile'):
	shutit.send('vagrant init jdiprizio/centos-docker-io')

only calls ‘vagrant init’ if there’s no Vagrant file in the folder. Similarly, these lines:

# Query the status - if it's powered off or not created, bring it up.
if shutit.send_and_match_output('vagrant status',['.*poweroff.*','.*not created.*','.*aborted.*']):
    shutit.send('vagrant up')

send ‘vagrant status’ to the terminal and will call ‘vagrant up’ if the status returns anything that isn’t indicating it’s already up. So the script will only bring up the VM when needed.

And these lines:

vagrant_dir = shutit.cfg[self.module_id]['vagrant_dir']
setenforce  = shutit.cfg[self.module_id]['setenforce'

pick up the config items set in the get_config method, and uses them to determine where to deploy on the host system and whether to fully enforce SELinux on the host.

Crucially, it doesn’t destroy the vagrant environment, so you can re-use the VM with all the software on it pre-installed. It ensures that the environment is cleaned up in such a way that you don’t waste time waiting for a long re-provisioning of the VM.

By setting the vagrant directory (which defaults to /tmp/vagrant_dir, see below) you can wipe it completely with an ‘rm -rf’ if you ever want to be sure you’re starting afresh.

Options

Here’s the invocation with configuration options:

# shutit build -d bash \
    -s io.dockerinpractice.docker_selinux.docker_selinux setenforce no \
    -s io.dockerinpractice.docker_selinux.docker_selinux vagrant_dir /tmp/tmp_vagrant_dir

The -s options define the options available to the docker_selinux module. Here we specify that the VM should have setenforce set to off, and the vagrant directory to use is /tmp/tmp_vagrant_dir.

Setup

Instructions on setup are kept here

#install git
#install python-pip
#install docker
git clone https://github.com/ianmiell/shutit.git
cd shutit
pip install --user -r requirements.txt
echo "export PATH=$(pwd):${PATH}" >> ~/.bashrc
. ~/.bashrc

Then clone the docker-selinux repo and run the script:

git clone https://github.com/ianmiell/docker-selinux.git
cd docker-selinux
sudo su
shutit build --delivery bash

Troubleshooting

Note you may need to alter this line

docker_executable:docker

in the

~/.config/shutit

file to change ‘docker’ to ‘sudo docker’ or however you run docker on your host.

Conclusion

This has considerably sped up my experimentation with SELinux, and I now have a reliable and test-able set of steps to help others (you!) get to grips with SELinux and improve our understanding.

This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip

Storage Drivers and Docker

This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip

Storage Drivers?

If you don’t know, Docker has various options for how to store its data. Originally it used AUFS (a layered filesystem), but this was not beloved by all, so as the likes of RedHat got interested and now there are various options, including Devicemapper, VFS and Overlay(FS).

Here’s a deck from a great talk by Jérôme Petazzoni here on the subject.

OK, So What?

Docker is sexy, this is not.

But it’s going to be important to think about this if Docker is to be used in production. The selling point of Docker (and XaaSes in general) is more efficient use of resources. A bad decision on storage drivers (or no decision) could cost you in compute resources, or operational cost.

I’ve put together this high-level, incomplete, and probably wrong view of storage drivers here, as I couldn’t find such a table anywhere else. I’d welcome corrections and improvements, and hope to update as I go.

Driver-Feature
High-density
Big files?
Encryption?
SELinux?
Space limits?
Page cache sharing?
AUFS
Y
N
N(?)
Y
N
Y
DeviceMapper
Y
Y
N(?)
Y
Y
N
BTRFS
Y
Y
N(?)
N
N
N
OverlayFS
Y
N/A(?)
N(?)
N
N(?)
Y
VFS
N
N/A(?)
Y(?)
Y
N
N

Key:
High-density: is it designed to have lots of containers on the same disk (ie copy-on-write)?
Big Files: does it handle big files gracefully (ie block-level rather than file level)?
Encryption: does it support encryption of the files?
SELinux: is there SELinux support?
Space Limits: will the container hit space limits (before standard FS limits are hit)?
Page Cache Share: can the OS share page caches between different containers

Discussion

Page Cache Sharing

As someone that works for a corp with the capacity to run a private Docker environment, the column I find most interesting is the “page cache share” one. If you’re running hundreds of thousands of containers over your estate and you have a limited number of blessed images to work from, then the savings in memory from sharing page caches across containers will be compelling.

Big Files

I’ve experienced first hand the pain of having a system that copies large files on write. If you have a monolithic database running within a container (I’m talking several Gig), then it’s painful to wait for the copy of a single massive data file to update one row while your container is running.

VFS

As VFS is copy-on-copy, VFS may be useful if you are OK taking the filesystem hit when starting up your containers, and don’t care about disk space. In return, you get (presumably) near-native performance. I’ve not used this.
Space Limits
By default, Devicemapper has a 10G limit for containers. It’s surprisingly difficult to resize this out of the box, so can get operationally annoying if you’ve not seen this before

Maturity

The area of storage drivers is still not mature within Docker. While overlay(FS) looks promising (and is reputedly dog-fooded at Docker itself), it may not be the last word, or supported everywhere.

Feedback Wanted

Please send me feedback via twitter (@ianmiell) or if you want to mail me privately go via LinkedIn (Ian Miell)

This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip

Play With Kubernetes Quickly Using Docker

CODE UPDATE AVAILABLE HERE

 

This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip

Background

In case you don’t know, Kubernetes is a Google open source project that tackles the problem of how to orchestrate your Docker containers on a data centre.

In a sentence, it allows you to treat groups of Docker containers as single units with their own addressable IP across hosts, and scale them as you wish, allowing you to be declarative about services much in the same way as you can be declarative about configuration with Puppet or Chef, and let Kubernetes take care of the details.

Terminology

Kubernetes has some terminology it’s worth noting here:

  • Pods: groupings of containers
  • Controllers: entities that drive the state of the Kubernetes cluster towards the desired state
  • Service: a set of pods that work together
  • Label: a simple name-value pair
  • Hyperkube: an all-in-one binary that can run a server
  • Kubelet: an agent that runs on nodes and monitors containers, restarting them if necessary

Labels are central point of Kubernetes. By labelling Kubernetes entities, you can take actions across all relevant pods in your data centre. For example, you might want to ensure web server pods run only on specific nodes.

Play

I tried to follow Kubernetes’ Vagrant stand-up, but got frustrated with its slow pace and clunkiness, which I characterized uncharitably as ‘soviet’. Amazingly, a Twitter-whinge about this later and I got a message from Google’s Lead Engineer on Kubernetes saying they were ‘working on it’. Great, but this moved from great to awesome when I was presented with this, a Docker-only way to get Kubernetes running quickly.

NOTE: this code is not presented as stable, so if this walkthrough doesn’t work for you, check the central Kubernetes repo for the latest.

Step One: Start etcd

Kubernetes uses etcd to distribute information across the cluster, so as a core component we start that first:

docker run \
    --net=host \
    -d kubernetes/etcd:2.0.5.1 \
    /usr/local/bin/etcd \
        --addr=$(hostname -i):4001 \
        --bind-addr=0.0.0.0:4001 \
        --data-dir=/var/etcd/data

Step Two: Start the Master

docker run \
    --net=host \
    -d \
    -v /var/run/docker.sock:/var/run/docker.sock\
    gcr.io/google-containers/hyperkube:dev \
f    /hyperkube kubelet \
        --api_servers=http://localhost:8080 \
        --v=2 \
        --address=0.0.0.0 \
        --enable_server \
        --hostname_override=127.0.0.1 \
        --config=/etc/kubernetes/manifests

Kubernetes has a simple Master-Minion architecture (for now – I understand this may be changing). The master handle the APIs for running the pods on the Kubernetes nodes, the scheduler (which determines what should run where based on capacity and constraints), and the replication controller, which ensures the right number of nodes have replicated pods.

If you run it immediately, your docker ps should now look something like this:

imiell@rothko:~$ docker ps
CONTAINER ID IMAGE                              COMMAND              CREATED        STATUS        PORTS NAMES
98b25161f27f gcr.io/google-containers/hyperkube "/hyperkube kubelet  2 seconds ago  Up 1 seconds        drunk_rosalind 
57a0e18fce17 kubernetes/etcd:2.0.5.1            "/usr/local/bin/etcd 31 seconds ago Up 29 seconds       compassionate_sinoussi

One thing to note here is that this master is run from a hyperkube kubelet call, which in turn brings up the master’s containers as a pod. That’s a bit of a mouthful, so let’s break it down.

Hyperkube, as we noted above, is an all-in-one binary for Kubernetes. It will go off and enable the services for the Kubernetes master in a pod. We’ll see what these are below.

Now we have a running Kubernetes cluster, you can manage it from outside using the API by downloading the kubectl binary:

imiell@rothko:~$ wget http://storage.googleapis.com/kubernetes-release/release/v0.14.1/bin/linux/amd64/kubectl
imiell@rothko:~$ chmod +x kubelet
imiell@rothko:~$ ./kubectl version
Client Version: version.Info{Major:"0", Minor:"14", GitVersion:"v0.14.1", GitCommit:"77775a61b8e908acf6a0b08671ec1c53a3bc7fd2", GitTreeState:"clean"}
Server Version: version.Info{Major:"0", Minor:"14+", GitVersion:"v0.14.1-dirty", GitCommit:"77775a61b8e908acf6a0b08671ec1c53a3bc7fd2", GitTreeState:"dirty"}

Let’s see how many minions we’ve got using the get sub-command:

imiell@rothko:~$ ./kubectl get minions
NAME      LABELS STATUS
127.0.0.1  Ready

We have one, running on localhost. Note the LABELS column. Think how we could label this minion: we could mark this minion as “heavy_db_server=true” if it was running on the tin needed to run our db beastie, and direct db server pods there only.

What about these pods then?

imiell@rothko:~$ ./kubectl get pods
POD       IP CONTAINER(S)       IMAGE(S)                                   HOST                LABELS STATUS  CREATED
nginx-127    controller-manager gcr.io/google-containers/hyperkube:v0.14.1 127.0.0.1/127.0.0.1  Running 16 minutes
             apiserver          gcr.io/google-containers/hyperkube:v0.14.1 
             scheduler          gcr.io/google-containers/hyperkube:v0.14.1

This ‘nginx-127’ pod has got three containers from the same Docker image running the master services: the controller-manager, the apiserver, and the scheduler.

Now that we’ve waited a bit, we should be able to see the containers using a normal docker ps:

imiell@rothko:~$ docker ps -a
CONTAINER ID IMAGE                                      COMMAND              CREATED        STATUS        PORTS NAMES
25c781d7bb93 kubernetes/etcd:2.0.5.1                    "/usr/local/bin/etcd 4 minutes ago  Up 4 minutes        suspicious_newton 
8922d0ba9a75 gcr.io/google-containers/hyperkube:v0.14.1 "/hyperkube controll 40 seconds ago Up 39 seconds       k8s_controller-manager.bca40ef7_nginx-127_default_a8ae24cd98c73bd6d873bc54c030606b_c40c7396 
943498867bd6 gcr.io/google-containers/hyperkube:v0.14.1 "/hyperkube schedule 40 seconds ago Up 40 seconds       k8s_scheduler.b41bfb6e_nginx-127_default_a8ae24cd98c73bd6d873bc54c030606b_871c00e2 
354039df992d gcr.io/google-containers/hyperkube:v0.14.1 "/hyperkube apiserve 41 seconds ago Up 40 seconds       k8s_apiserver.c24716ae_nginx-127_default_a8ae24cd98c73bd6d873bc54c030606b_4b062320 
033edd18ff9c kubernetes/pause:latest                    "/pause"             41 seconds ago Up 41 seconds       k8s_POD.7c16d80d_nginx-127_default_a8ae24cd98c73bd6d873bc54c030606b_da72f541 
beddf250f4da gcr.io/google-containers/hyperkube:v0.14.1 "/hyperkube kubelet  43 seconds ago Up 42 seconds       kickass_ardinghelli

Step Three: Run the Service Proxy

The Kubernetes service proxy allows you to expose pods as services from a consistent address. We’ll see this in action later.

docker run \
    -d \
    --net=host \
    --privileged \
    gcr.io/google_containers/hyperkube:v0.14.1 \
    /hyperkube proxy \
        --master=http://127.0.0.1:8080 \
        --v=2

This is run separately as it requires privileged mode to manipulate iptables on your host.

A docker ps will show the proxy as being up:

imiell@rothko:~$ docker ps -a
CONTAINER ID IMAGE                                      COMMAND              CREATED        STATUS        PORTS NAMES
2c8a4efe0e01 gcr.io/google_containers/hyperkube:v0.14.1 "/hyperkube proxy -- 2 seconds ago  Up 1 seconds        loving_lumiere 
8922d0ba9a75 gcr.io/google-containers/hyperkube:v0.14.1 "/hyperkube controll 15 minutes ago Up 15 minutes       k8s_controller-manager.bca40ef7_nginx-127_default_a8ae24cd98c73bd6d873bc54c030606b_c40c7396 
943498867bd6 gcr.io/google-containers/hyperkube:v0.14.1 "/hyperkube schedule 15 minutes ago Up 15 minutes       k8s_scheduler.b41bfb6e_nginx-127_default_a8ae24cd98c73bd6d873bc54c030606b_871c00e2 
354039df992d gcr.io/google-containers/hyperkube:v0.14.1 "/hyperkube apiserve 16 minutes ago Up 15 minutes       k8s_apiserver.c24716ae_nginx-127_default_a8ae24cd98c73bd6d873bc54c030606b_4b062320 
033edd18ff9c kubernetes/pause:latest                    "/pause"             16 minutes ago Up 15 minutes       k8s_POD.7c16d80d_nginx-127_default_a8ae24cd98c73bd6d873bc54c030606b_da72f541 
beddf250f4da gcr.io/google-containers/hyperkube:v0.14.1 "/hyperkube kubelet  16 minutes ago Up 16 minutes       kickass_ardinghelli

Step Four: Run an Application

Now we have our Kubernetes cluster set up locally, let’s run an application with it.

imiell@rothko:~$ ./kubectl -s http://localhost:8080 run-container todopod --image=dockerinpractice/todo --port=8000
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
todopod todopod dockerinpractice/todo run-container=todopod 1

This creates a pod from a single image (a simple todo application)

imiell@rothko:~$ kubectl get pods
POD IP        CONTAINER(S)       IMAGE(S)                                   HOST        LABELS                 STATUS  CREATED
nginx-127     controller-manager gcr.io/google-containers/hyperkube:v0.14.1 127.0.0.1/                   Running About a minute
              apiserver          gcr.io/google-containers/hyperkube:v0.14.1 
              scheduler          gcr.io/google-containers/hyperkube:v0.14.1 
todopod-c8n0r todopod            dockerinpractice/todo                       run-container=todopod Pending About a minute

Lots of interesting stuff here – the HOST for our todopod (which has been given a unique name as a suffix) has not been set yet, because the provisioning is still Pending (it’s downloading the image from the Docker Hub).

Eventually you will see it’s running:

imiell@rothko:~$ kubectl get pods
POD           IP          CONTAINER(S)       IMAGE(S)                                   HOST                LABELS                STATUS  CREATED
nginx-127                 controller-manager gcr.io/google-containers/hyperkube:v0.14.1 127.0.0.1/127.0.0.1                 Running About a minute
                          apiserver          gcr.io/google-containers/hyperkube:v0.14.1 
                          scheduler          gcr.io/google-containers/hyperkube:v0.14.1 
todopod-c8n0r 172.17.0.43 todopod            dockerinpractice/todo                      127.0.0.1/127.0.0.1 run-container=todopod Running 5 seconds

and it has an ip address (172.17.0.43). A replication controller is also set up for it, to ensure it gets replicated:

imiell@rothko:~$ ./kubectl get rc
CONTROLLER   CONTAINER(S)   IMAGE(S)                SELECTOR                REPLICAS
todopod      todopod        dockerinpractice/todo   run-container=todopod   1

We can address this service directly using the pod ip:

imiell@rothko:~$ wget -qO- 172.17.0.43:8000 | head -1

Step Six: Set up a Service

But this is not enough – we want to expose these pods as a service to port 80 somewhere:

imiell@rothko:~$ ./kubectl expose rc todopod --target-port=8000 --port=80
NAME      LABELS    SELECTOR                IP          PORT
todopod       run-container=todopod   10.0.0.79   80

So now it’s available on 10.0.0.79:

imiell@rothko:~$ ./kubectl get service
NAME          LABELS                                  SELECTOR              IP        PORT
kubernetes    component=apiserver,provider=kubernetes                 10.0.0.2  443
kubernetes-ro component=apiserver,provider=kubernetes                 10.0.0.1  80
todopod                                         run-container=todopod 10.0.0.79 80

and we’ve successfully mapped port 8000 on the pod to a port 80.

Let’s make things interesting by killing off the todo container:

imiell@rothko:~$ docker ps | grep dockerinpractice/todo
3724233c6637 dockerinpractice/todo:latest "npm start" 13 minutes ago Up 13 minutes k8s_todopod.6d3006f8_todopod-c8n0r_default_439950e4-dc4d-11e4-be97-d850e6c2a11c_da1467a2
imiell@rothko:~$ docker kill 3724233c6637
3724233c6637

and then after a moment (to be sure, wait 20 seconds), call it again:

imiell@rothko:~$ wget -qO- 10.0.0.79 | head -1

The service is still there even though the container isn’t! The replication controller picked up that the container died, and restored service for us:

imiell@rothko:~$ docker ps -a | grep dockerinpractice/todo
b80728e90d3f dockerinpractice/todo:latest "npm start" About a minute ago Up About a minute k8s_todopod.6d3006f8_todopod-c8n0r_default_439950e4-dc4d-11e4-be97-d850e6c2a11c_00316aec 
3724233c6637 dockerinpractice/todo:latest "npm start" 15 minutes ago Exited (137) About a minute ago k8s_todopod.6d3006f8_todopod-c8n0r_default_439950e4-dc4d-11e4-be97-d850e6c2a11c_da1467a2

Step Seven: Make the Service Resilient

Management’s angry that the service was down momentarily. We’ve figured out this is because the container died (and the service was automatically recovered) and want to take steps to prevent a recurrence. So we decide to resize the todopod:

imiell@rothko:~$ ./kubectl resize rc todopod --replicas=2
resized

and there are now two pods running todo containers:

imiell@rothko:~$ kubectl get pods
POD           IP          CONTAINER(S)       IMAGE(S)                                   HOST                LABELS                STATUS  CREATED
nginx-127                 controller-manager gcr.io/google-containers/hyperkube:v0.14.1 127.0.0.1/127.0.0.1                 Running 28 minutes
                          apiserver          gcr.io/google-containers/hyperkube:v0.14.1 
                          scheduler          gcr.io/google-containers/hyperkube:v0.14.1 
todopod-c8n0r 172.17.0.43 todopod            dockerinpractice/todo                      127.0.0.1/127.0.0.1 run-container=todopod Running 27 minutes
todopod-pmpmt 172.17.0.44 todopod dockerinpractice/todo 127.0.0.1/127.0.0.1 run-container=todopod Running 3 minutes

and here’s the two containers:

imiell@rothko:~$ docker ps | grep dockerinpractice/todo
217feb6f25e8 dockerinpractice/todo:latest "npm start" 16 minutes ago Up 16 minutes k8s_todopod.6d3006f8_todopod-pmpmt_default_8e645492-dc50-11e4-be97-d850e6c2a11c_480f79b7 
b80728e90d3f dockerinpractice/todo:latest "npm start" 26 minutes ago Up 26 minutes k8s_todopod.6d3006f8_todopod-c8n0r_default_439950e4-dc4d-11e4-be97-d850e6c2a11c_00316aec

It’s not just the containers that are resilient – try running:

./kubectl delete pod

and see what happens!

It’s Not Magic

Management now thinks that the service is bullet-proof and perfect – but it’s wrong!

The service is still exposed to failure: if the machine that kubernetes is running on dies, the service goes down.

Perhaps more importantly they don’t understand that the todo app is per browser session only, so their todos will not be retained across sessions. Kubernetes does not magically make applications scalable, so some kind of persistent storage and authentication method is required in the application and  to make this work as they want.

Conclusion

This only scratches the surface of Kubernetes’ power. We’ve not looked at multi-container pods and some of the patterns that can be used there, or using labels, for example.

Kubernetes is changing fast, and is being incorporated into other products (such as OpenShift), so it’s worth getting to understand the concepts underlying it. Hyperkube’s a great way to do that fast.

This post is based on material from Docker in Practice, available on Manning’s Early Access Program. Get 39% off with the code: 39miell

dip