Anatomy of a Linux DNS Lookup – Part II


In Anatomy of a Linux DNS Lookup – Part I I covered:

  • nsswitch
  • /etc/hosts
  • /etc/resolv.conf
  • ping vs host style lookups

and determined that most programs reference /etc/resolv.conf along the way to figuring out which DNS server to query.

That stuff was more general Linux behaviour (*) but here we move firmly into distribution-specific territory. I use Ubuntu, but a lot of this will overlap with Debian and even CentOS-based distributions, and will also differ from earlier or later Ubuntu versions.

(*) in fact, it’s subject to a POSIX standard, so
is not limited to Linux (I learned this from
a fantastic comment on the previous post)

In other words: from here on, your host is more likely to differ in the specifics of its behaviour.

In Part II I’ll cover how resolv.conf can get updated, what happens when systemctl restart networking is run, and how dhclient gets involved.

1) Updating /etc/resolv.conf by hand

We know that /etc/resolv.conf is (highly likely to be) referenced, so surely you can just add a nameserver to that file, and then your host will use that nameserver in addition to the others, right?

If you try that:

$ echo nameserver >> /etc/resolv.conf

it all looks good:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
search home

until the network is restarted:

$ systemctl restart networking
$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
search home

our nameserver has gone!

This is where those comments we ignored in Part I come in…

2) resolvconf

You see the phrase generated by resolvconf in the /etc/resolv.conf file above? This is our clue.

If you dig into what systemctl restart networking does, among many other things, it ends up calling a script: /etc/network/if-up.d/000resolvconf. Within this script is a call to resolvconf:

/sbin/resolvconf -a "${IFACE}.${ADDRFAM}"

A little digging through the man pages reveals that the -a flag allows us to:

Add or overwrite the record IFACE.PROG then run the update scripts
if updating is enabled.

So maybe we can call this directly to add a nameserver:

echo 'nameserver' | /sbin/resolvconf -a enp0s8.inet

Turns out we can!

$ cat /etc/resolv.conf  | grep nameserver

So we’re done now, right? This is how /etc/resolv.conf gets updated? Calling resolvconf adds it to a database somewhere, and then updates (if configured, whatever that means) the resolv.conf file.

$ systemctl restart networking
root@linuxdns1:/etc# cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
search home

Argh! It’s gone again.

So systemctl restart networking does more than just run resolvconf. It must be getting the nameserver information from somewhere else. Where?

3) ifup/ifdown

Digging further into what systemctl restart networking does tells us a couple of things:

cat /lib/systemd/system/networking.service
ExecStartPre=-/bin/sh -c '[ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle'
ExecStart=/sbin/ifup -a --read-environment
ExecStop=/sbin/ifdown -a --read-environment --exclude=lo

First, the networking ‘service’ restart is actually a ‘oneshot’ script that runs these commands:

/sbin/ifdown -a --read-environment --exclude=lo
/bin/sh -c '[ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle'
/sbin/ifup -a --read-environment

The first line with ifdown brings down all the network interfaces (but excludes the loopback interface, lo). (*)

(*) I’m unclear why this doesn’t boot me out of my
vagrant session in my example code (anyone know?).

The second line makes sure the system has finished everything it needs to do to bring the network interfaces down (udevadm settle waits for pending udev events to be processed) before the third line brings them all back up with ifup. So the second thing we learn is that ifup and ifdown are what the networking service ‘actually’ runs.

The --read-environment flag is undocumented, and is there so that systemctl can play nice with it. A lot of people hate systemctl for this kind of thing.

Great. So what does ifup (and its twin, ifdown) do? To cut another long story short, it runs all the scripts in /etc/network/if-pre-up.d/ and /etc/network/if-up.d/. These in turn might run other scripts, and so on.

One of the things that happens along the way (and I’m still not quite sure how; maybe udev is involved?) is that dhclient gets run.
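To get a feel for what ifup will run on your own host, you can list the hook scripts yourself. The sketch below is a minimal Python illustration, not ifup’s actual code: hook_scripts is a hypothetical helper, and it assumes run-parts-style rules (names made up only of letters, digits, underscores and hyphens, run in lexical order), which is how I understand these directories to be processed on Debian-style systems.

```python
import os
import re

# Assumption: only run-parts-style names are picked up
# (ASCII letters, digits, underscores and hyphens).
_VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def hook_scripts(directory):
    """List the executable hook scripts in directory, in lexical order."""
    scripts = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if (
            _VALID_NAME.fullmatch(name)
            and os.path.isfile(path)
            and os.access(path, os.X_OK)
        ):
            scripts.append(name)
    return scripts
```

Calling hook_scripts("/etc/network/if-up.d") on a host like mine should list 000resolvconf near the top, since names starting with digits sort first.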

4) dhclient

dhclient is a program that interacts with DHCP servers to negotiate the details of what IP address the specified network interface should use. It can also receive a DNS nameserver to use, which then gets placed in /etc/resolv.conf.

Let’s cut to the chase and simulate what it does, but just on the enp0s3 interface on my example VM, having first removed the nameserver from the /etc/resolv.conf file:

$ sed -i '/nameserver.*/d' /run/resolvconf/resolv.conf
$ cat /etc/resolv.conf | grep nameserver
$ dhclient -r enp0s3 && dhclient -v enp0s3
Killed old client process
Internet Systems Consortium DHCP Client 4.3.3
Copyright 2004-2015 Internet Systems Consortium.
All rights reserved.
For info, please visit
Listening on LPF/enp0s8/08:00:27:1c:85:19
Sending on   LPF/enp0s8/08:00:27:1c:85:19
Sending on   Socket/fallback
DHCPDISCOVER on enp0s8 to port 67 interval 3 (xid=0xf2f2513e)
DHCPREQUEST of on enp0s8 to port 67 (xid=0x3e51f2f2)
DHCPACK of from
bound to -- renewal in 519 seconds.

$ cat /etc/resolv.conf | grep nameserver

So that’s where the nameserver comes from…

But hang on a sec – what’s that /run/resolvconf/resolv.conf doing there, when it should be /etc/resolv.conf?

Well, it turns out that /etc/resolv.conf isn’t always ‘just’ a file.

On my VM, it’s a symlink to the ‘real’ file stored in /run/resolvconf. This is a clue that the file is constructed at run time, and one of the reasons we’re told not to edit the file directly.

If the sed command above were run on /etc/resolv.conf directly, the behaviour would be different, and a warning would be thrown about /etc/resolv.conf not being a symlink (sed -i doesn’t handle symlinks cleverly: it just creates a fresh file).
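If you want to check whether your own /etc/resolv.conf is a symlink, and where it really lives, something like the following will do. It’s a minimal sketch: describe_resolv_conf is a name I made up for illustration, and on hosts without resolvconf the file may well be a plain file.

```python
import os


def describe_resolv_conf(path="/etc/resolv.conf"):
    """Return (is_symlink, real_path) for a resolv.conf-style path."""
    return os.path.islink(path), os.path.realpath(path)


if __name__ == "__main__":
    is_link, real_path = describe_resolv_conf()
    kind = "a symlink to" if is_link else "a regular path at"
    print(f"/etc/resolv.conf is {kind} {real_path}")
```

On my VM I’d expect this to point at /run/resolvconf/resolv.conf, but your host may differ.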

dhclient lets you override the DNS server given to you by DHCP: dig a bit deeper into the supersede setting in /etc/dhcp/dhclient.conf.

[Figure: a (roughly) accurate map of what’s going on]


End of Part II

That’s the end of Part II. Believe it or not that was a somewhat simplified version of what goes on, but I tried to keep it to the important and ‘useful to know’ stuff so you wouldn’t fall asleep. Most of that detail is around the twists and turns of the scripts that actually get run.

And we’re still not done yet. Part III will look at even more layers on top of these.

Let’s briefly list some of the things we’ve come across so far:

  • nsswitch
  • /etc/hosts
  • /etc/resolv.conf
  • /run/resolvconf/resolv.conf
  • systemd and its networking service
  • ifup and ifdown
  • dhclient
  • resolvconf

Or you might like Docker in Practice


Anatomy of a Linux DNS Lookup – Part I

Since I work a lot with clustered VMs, I’ve ended up spending a lot of time trying to figure out how DNS lookups work. I applied ‘fixes’ to my problems from StackOverflow without really understanding why they work (or don’t work) for some time.

Eventually I got fed up with this and decided to figure out how it all hangs together. I couldn’t find a complete guide for this anywhere online, and the colleagues I talked to didn’t know of one (or really know what happens in detail).

So I’m writing the guide myself.

Turns out there’s quite a bit in the phrase ‘Linux does a DNS lookup’…


“How hard can it be?”

These posts are intended to break down how a program decides how it gets an IP address on a Linux host, and the components that can get involved. Without understanding how these pieces fit together, debugging and fixing problems with (for example) dnsmasq, vagrant landrush, or resolvconf can be utterly bewildering.

It’s also a valuable illustration of how something so simple can get so very complex over time. I’ve looked at over a dozen different technologies and their archaeologies so far while trying to grok what’s going on.

I even wrote some automation code to allow me to experiment in a VM. Contributions/corrections are welcome.

Note that this is not a post on ‘how DNS works’. This is about everything up to the call to the actual DNS server that’s configured on a Linux host (assuming it even calls a DNS server – as you’ll see, it need not), and how it might find out which one to go to, or how it gets the IP some other way.

1) There is no such thing as a ‘DNS Lookup’ call


This is NOT how it works

The first thing to grasp is that there is no single method of getting a DNS lookup done on Linux. It’s not a core system call with a clean interface.

There is, however, a standard C library call which many programs use: getaddrinfo. But not all applications use this!
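If you’ve never called getaddrinfo yourself, here is roughly what it looks like from Python, whose socket.getaddrinfo wraps the C library call. This is only a sketch: resolve is a hypothetical helper name, and what comes back depends entirely on how your host is configured.

```python
import socket


def resolve(name):
    """Return the sorted, de-duplicated addresses getaddrinfo reports for name."""
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    infos = socket.getaddrinfo(name, None)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve("localhost"))
```

Any program that resolves names this way inherits whatever the C library decides to consult on your host.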

Let’s just take two simple standard programs: ping and host:

root@linuxdns1:~# ping -c1 | head -1
PING ( 56(84) bytes of data.
root@linuxdns1:~# host | head -1
has address

They both get the same result, so they must be doing the same thing, right?


Here are the files that ping looks at on my host that are relevant to DNS:

root@linuxdns1:~# strace -e trace=open -f ping -c1
open("/etc/", O_RDONLY|O_CLOEXEC) = 3
open("/lib/x86_64-linux-gnu/", O_RDONLY|O_CLOEXEC) = 3
open("/lib/x86_64-linux-gnu/", O_RDONLY|O_CLOEXEC) = 3
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 4
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 4
open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4
open("/etc/", O_RDONLY|O_CLOEXEC) = 4
open("/lib/x86_64-linux-gnu/", O_RDONLY|O_CLOEXEC) = 4
open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = 4
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 4
open("/etc/", O_RDONLY|O_CLOEXEC) = 4
open("/lib/x86_64-linux-gnu/", O_RDONLY|O_CLOEXEC) = 4
open("/lib/x86_64-linux-gnu/", O_RDONLY|O_CLOEXEC) = 4
PING ( 56(84) bytes of data.
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 4
64 bytes from ( icmp_seq=1 ttl=63 time=13.0 ms

and the same for host:

$ strace -e trace=open -f host
[pid  9869] open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  9869] open("/usr/share/locale/en/", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  9869] open("/usr/share/locale/en/LC_MESSAGES/", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  9869] open("/usr/lib/ssl/openssl.cnf", O_RDONLY) = 6
[pid  9869] open("/usr/lib/x86_64-linux-gnu/openssl-1.0.0/engines/", O_RDONLY|O_CLOEXEC) = 6
[pid  9869] open("/etc/resolv.conf", O_RDONLY) = 6
has address

You can see that while my ping looks at nsswitch.conf, host does not. And they both look at /etc/resolv.conf.

We’re going to take these two .conf files in turn.

2) NSSwitch, and /etc/nsswitch.conf

We’ve established that applications can do what they like when they decide which DNS server to go to. Many apps (like ping, above) can refer (depending on the implementation (*)) to NSSwitch via its config file /etc/nsswitch.conf.

(*) There’s a surprising degree of variation in
ping implementations. That’s a rabbit-hole I
didn’t want to get lost in.

NSSwitch is not just for DNS lookups. It’s also used for passwords and user lookup information (for example).

NSSwitch was originally created as part of the Solaris OS so that applications didn’t have to hard-code which file or service they looked these things up in, and could instead defer to a central, configurable place they didn’t have to worry about.

Here’s my nsswitch.conf:

passwd:         compat
group:          compat
shadow:         compat
gshadow:        files
hosts:          files dns myhostname
networks:       files
protocols:      db files
services:       db files
ethers:         db files
rpc:            db files
netgroup:       nis

The ‘hosts’ line is the one we’re interested in. We’ve shown that ping cares about nsswitch.conf so let’s fiddle with it and see how we can mess with ping.

  • Set nsswitch.conf to only look at ‘files’

If you set the hosts line in nsswitch.conf to be ‘just’ files:

hosts: files

Then a ping to will now fail:

$ ping -c1
ping: unknown host

but localhost still works:

$ ping -c1 localhost
PING localhost ( 56(84) bytes of data.
64 bytes from localhost ( icmp_seq=1 ttl=64 time=0.039 ms

and using host still works fine:

host has address

since, as we saw, it doesn’t care about nsswitch.conf

  • Set nsswitch.conf to only look at ‘dns’

If you set the hosts line in nsswitch.conf to be ‘just’ dns:

hosts: dns

Then a ping to will now succeed again:

$ ping -c1
PING ( 56(84) bytes of data.
64 bytes from ( icmp_seq=1 ttl=63 time=8.01 ms

But localhost is not found this time:

$ ping -c1 localhost
ping: unknown host localhost

Here’s a diagram of what’s going on with NSSwitch by default with respect to hosts lookup:

[Figure: my default ‘hosts:’ configuration in nsswitch.conf]
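The order of sources on the hosts line is what matters, so it can be handy to read it out programmatically when comparing machines. Here’s a minimal sketch in Python; hosts_sources is a name I’ve invented for illustration, and it simply drops bracketed action items such as [NOTFOUND=return] rather than interpreting them.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("hosts:"):
            # Drop bracketed action items; this sketch only reports sources.
            rest = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return rest.split()
    return []


if __name__ == "__main__":
    sample = "passwd:         compat\nhosts: files dns myhostname\n"
    print(hosts_sources(sample))  # prints ['files', 'dns', 'myhostname']
```

With my configuration that means /etc/hosts is consulted first, then DNS, then the machine’s own hostname.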


3) /etc/resolv.conf

We’ve seen now that host and ping both look at this /etc/resolv.conf file.

Here’s what my /etc/resolv.conf looks like:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)

Ignore the first two lines – we’ll come back to those (they are significant, but you’re not ready for that ball of wool yet).

The nameserver lines specify which DNS servers to query when looking up a host.

If you hash out that line:


and run:

$ ping -c1
ping: unknown host

it fails, because there’s no nameserver to go to (*).

* Another rabbit hole: host appears to fall back to if there’s no nameserver specified.
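To see which nameserver and search entries a resolv.conf actually carries, a few lines of Python are enough. This is a simplified sketch rather than what the resolver itself does: parse_resolv_conf is a made-up helper, it ignores every other option, and the address in the example comes from the reserved documentation range (it is not a real server).

```python
def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf-style text."""
    nameservers, search = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank lines and comments
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "search":
            search = fields[1:]  # a later search line replaces an earlier one
    return nameservers, search


if __name__ == "__main__":
    # 192.0.2.53 is a placeholder address, not a real nameserver.
    sample = "# generated file\nnameserver 192.0.2.53\nsearch home\n"
    print(parse_resolv_conf(sample))  # prints (['192.0.2.53'], ['home'])
```

If the first list comes back empty, you are in the ‘no nameserver to go to’ situation described above.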

This file takes other options too. For example, if you add this line to the resolv.conf file:

search com

and then ping google (sic)

$ ping google
PING ( 56(84) bytes of data.

it will try the .com domain automatically for you.
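Here’s a rough sketch of how I understand the search list to be applied. It is a simplification that assumes the resolver’s default of ndots 1 and ignores every other option; candidate_names is a hypothetical helper, not a real library call.

```python
def candidate_names(name, search, ndots=1):
    """Return the names a lookup would try, in order (simplified)."""
    if name.endswith("."):
        return [name]  # a trailing dot means 'use exactly this name'
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as given first
    return expanded + [name]  # otherwise try the search domains first


if __name__ == "__main__":
    print(candidate_names("google", ["com"]))  # prints ['google.com', 'google']
```

So with ‘search com’ in place, a bare ‘google’ gets tried as google.com before being tried as-is.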

End of Part I

That’s the end of Part I. The next part will start by looking at how that resolv.conf gets created and updated.

Here’s what we covered above:

  • There’s no ‘DNS lookup’ call in the OS
  • Different programs figure out the IP of an address in different ways
    • For example, ping uses nsswitch, which in turn uses (or can use) /etc/hosts, /etc/resolv.conf and its own hostname to get the result
  • /etc/resolv.conf helps decide:
    • which names actually get looked up (via the search option)
    • which DNS server gets queried

If you thought that was complicated, buckle up…


A Docker Image in Less Than 1000 Bytes

Here it is (base64-encoded):


How I Got There

A colleague of mine showed me a Docker image he was using to test Kubernetes clusters. It did nothing: it just started up as a pod and sat there until you killed it.

‘Look, it’s only 700kb! Really quick to download!’

This got me wondering what the smallest Docker image I could create was.

I wanted one I could base64 encode and send ‘anywhere’ with a cut and paste.

Since a Docker image is just a tar file, and a tar file is ‘just’ a file, this should be quite possible.

A Tiny Binary

The first thing I needed was a tiny Linux binary that does nothing.

There’s some prior art here, a couple of fantastic and instructive articles on creating small executables, which are well worth reading:

Smallest x86 ELF Hello World

A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux

I didn’t want a ‘Hello World’, but a program that just slept and that worked on x86_64.

I started with an example from the first article above:

	SECTION .data
msg:	db "Hi World",10
len:	equ $-msg

	SECTION .text

        global _start
_start:
	mov	edx,len
	mov	ecx,msg
	mov	ebx,1
	mov	eax,4
	int	0x80

	mov	ebx,0
	mov	eax,1
	int	0x80


nasm -f elf64 hw.asm -o hw.o
ld hw.o -o hw
strip -s hw

Produces a binary of 504 bytes.

But I don’t want a ‘hello world’.

First, I figured I didn’t need the .data or .text sections, nor did I need to load up the data. I figured the top half of the _start section was doing the printing so tried:


global _start
_start:
 mov ebx,0
 mov eax,1
 int 0x80

Which compiled at 352 bytes.

But that’s no good, because it just exits. I need it to sleep. After a little further digging I worked out that the mov eax instruction loads the CPU register with the relevant Linux syscall number, and int 0x80 makes the syscall itself. More info on this here.

I found a list of these here. Syscall 1 is ‘exit’, so what I wanted was syscall 29: pause. (These are the 32-bit syscall numbers, which is what int 0x80 uses even in a 64-bit binary.)

This made the program:

global _start
_start:
 mov eax, 29
 int 0x80

Which shaved 8 bytes off to compile at 344 bytes, and creates a binary that just sits there waiting for a signal, which is exactly what I want.


At this point I took out the chainsaw and started hacking away at the binary. To do this I used hexer which is essentially a vim you can use on binary files to edit the hex directly. After a lot of trial and error I got from this:

[Screenshot: the binary in hexer, before]

to this:

[Screenshot: the binary in hexer, after]

Which appeared to do the same thing. Notice how the strings are gone, as well as a lot of whitespace. Along the way I referenced this doc, but mostly it was trial and error.

That got me down to 136 bytes.

Sub-100 Bytes?

I wanted to see if I could get any smaller. Reading this suggested I could get down to 45 bytes, but alas, no. That worked for a 32-bit executable, but pulling the same stunts on a 64-bit one didn’t seem to fly at all.

The best I could do was lift a 64-bit version of the program in the above blog and sub in my syscall:

 bits 64
 org 0x400000
ehdr: ; Elf64_Ehdr
 db 0x7f, "ELF", 2, 1, 1, 0 ; e_ident
 times 8 db 0
 dw 2 ; e_type
 dw 0x3e ; e_machine
 dd 1 ; e_version
 dq _start ; e_entry
 dq phdr - $$ ; e_phoff
 dq 0 ; e_shoff
 dd 0 ; e_flags
 dw ehdrsize ; e_ehsize
 dw phdrsize ; e_phentsize
 dw 1 ; e_phnum
 dw 0 ; e_shentsize
 dw 0 ; e_shnum
 dw 0 ; e_shstrndx
 ehdrsize equ $ - ehdr
phdr: ; Elf64_Phdr
 dd 1 ; p_type
 dd 5 ; p_flags
 dq 0 ; p_offset
 dq $$ ; p_vaddr
 dq $$ ; p_paddr
 dq filesize ; p_filesz
 dq filesize ; p_memsz
 dq 0x1000 ; p_align
 phdrsize equ $ - phdr
_start:
 mov eax, 29
 int 0x80
filesize equ $ - $$

which gave me a binary of 127 bytes.
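The 127 bytes break down as a 64-byte ELF header, a 56-byte program header and 7 bytes of code. As a cross-check, here is a sketch that packs the same fields in Python using struct. It is not how I built the binary: build_elf is a name invented for illustration, the field values are copied from the listing above, and the machine code bytes are my own hand-encoding of the two instructions.

```python
import struct

BASE = 0x400000  # load address, matching 'org 0x400000'

# Hand-encoded machine code (an assumption, not assembler output):
CODE = bytes([
    0xB8, 0x1D, 0x00, 0x00, 0x00,  # mov eax, 29
    0xCD, 0x80,                    # int 0x80
])

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, little-endian


def build_elf():
    """Pack the header fields from the listing and append the code."""
    filesize = EHDR.size + PHDR.size + len(CODE)
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)  # e_ident
    ehdr = EHDR.pack(
        ident,
        2,                            # e_type
        0x3E,                         # e_machine
        1,                            # e_version
        BASE + EHDR.size + PHDR.size, # e_entry (_start)
        EHDR.size,                    # e_phoff
        0,                            # e_shoff
        0,                            # e_flags
        EHDR.size,                    # e_ehsize
        PHDR.size,                    # e_phentsize
        1,                            # e_phnum
        0, 0, 0,                      # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = PHDR.pack(1, 5, 0, BASE, BASE, filesize, filesize, 0x1000)
    return ehdr + phdr + CODE


if __name__ == "__main__":
    print(len(build_elf()))  # prints 127
```

Writing those bytes to a file and marking it executable should give an equivalent binary, though I’d treat that as something to verify rather than take on trust.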

I gave up reducing at this point, and am open to suggestions.

A Teensy Docker Image

Now that I had my ‘sleep’ executable, I needed to put it in a Docker image.

To squeeze out every byte possible, I gave the binary a one-byte filename, ‘t’, and put it in a Dockerfile built from scratch, a virtual 0-byte image:

FROM scratch
ADD t /t

Note there’s no CMD, as that increases the size of the Docker image. A command needs to be passed to the docker run command for this to run.

Using docker save to create a tar file, and then using maximum compression with gzip I got to a portable Docker image file that was less than 1000 bytes:

$ docker build -t t .
$ docker save t | gzip -9 - | wc -c

I tried to reduce the size of the tar file by fiddling with the Docker manifest file, but my efforts were in vain: due to the nature of the tar file format and the gzip compression algorithm, these attempts actually made the final gzip bigger!

I also tried other compression algorithms, but gzip did best on this small file.

Can You Get This Lower?

Keen to hear from you if you can…


The code is here.



Autotrace – Debug on Steroids

Autotrace is a tool that allows you to debug a process and:

  • View the output of multiple debug commands at once
  • Record this output for others to review
  • Replay the output

All of this is done in a single window.

You can think of it as a realtime sosreport or just a cool way to learn more about what’s going on when you run something.


Have you ever done something like this?

$ some_command
[1]+  Stopped                 some_command
imiell   17440 22741  0 10:47 pts/3    00:00:00 grep some_command
$ strace -p 17440 > strace.log 2>&1
$ fg
[some_command continues]

That is, you:

  • Ran a process
  • Realised you want to run some tracing on it
  • Suspended it
  • Found the pid
  • Ran your trace command(s), outputting to logfiles
  • Continued the process

Tedious, right?


autotrace can automate that. By default it finds the latest backgrounded pid, attaches to it, runs some default tracing commands on it, records the output, allows you to pause it, replay it elsewhere, and more.

Here’s what it looks like doing something similar to the above scenario:

If you remember you have autotrace before you run the command, you can specify all those commands automatically:


You can pause the session and scroll back and forth, then continue tracking. It suspends the processes while paused.

Other Features

Record and Replay

It also automatically records those sessions and can replay them – perfect for debugging or sharing debug information in real time.

Here’s the above example being tarred up and replayed somewhere else:

Zoom In and Out

You can zoom in and out by hitting the session number:

Move Windows in and Out

Got more than four commands you want to track output for? No problem, you can supply as many commands as you like.

This command runs five commands, for example:

autotrace \
  'ping' \
  'iostat 1' \
  "bash -c 'while true; do cat /proc/PID/status; sleep 2; done'" \
  'vmstat 1' \
  'bash -c "while true; do echo I was in a hidden window; sleep 1; done"'

The last is hidden at the start, but by hitting ‘m‘ you can move the panes around (the ‘main’ ping one is protected). Like this:


These examples work on Linux. To work on Mac, you need to find replacements for strace/pstree/whatever.

What does nmap get up to?

sudo autotrace 'nmap' 'strace -p PID' 'tcpdump -XXs 20000'

What about find?

sudo autotrace 'find /' 'strace -p PID'

A monster debug session on nmap involving lsof, pstree, and a tracking of /proc/interrupts (Linux only):

sudo autotrace \
    'nmap localhost' \
    'strace -p PID' \
    'tcpdump -XXs 20000' \
    'bash -c "while true;do free;sleep 5;done"' \
    'bash -c "while true;do lsof -p PID | tail -5;sleep 5;done"' \
    'bash -c "while true;do pstree -p PID | tail -5;sleep 5;done"' \
    'bash -c "while true;do cat /proc/interrupts; sleep 1;done"'



pip install autotrace



The code is available here.


It relies heavily on Thomas Ballinger’s wonderful ‘Terminal Whispering’ talk and curtsies Python library.



Beyond ‘Punk Rock Git’ in Eleven Steps

Punk Rock Git?

I’ve spent the last couple of years teaching Git, mostly to users that I like to call ‘punk rock’ Git users.

They have three commands, and they can get by with them.

They’re not terribly interested in Merkle trees or SHA-1 hashes, but are interested in what a ‘detached HEAD‘ is, or understanding what a rebase is so they don’t feel intimidated when someone drops it into a conversation.

In fact, a stated primary aim of the course I wrote and run is to get you to fully understand what a rebase is. At that point, you’re a punk rocker no longer.

These git students often say they are bewildered by where to start with expanding their git knowledge. So here, I briefly outline a set of Git concepts and commands, and – crucially – the order you should grasp them in.

I also noted in bold most of the ‘a-ha’ moments I’ve observed experienced but casual users getting when they follow the course.

My book ‘Learn Git the Hard Way’ (which is the basis
of my course, has a similar structure, and covers more
optional items) is available here.


1) Core Git Concepts

Here’s the baseline of what you want to know before you set off:

  • Understand that there are four distinct phases to git content
    • Local changes
    • Staging area
    • Committed
    • Pushed
  • Don’t memorise the above, just be aware of it
  • Branching is cheap compared to other source control tools
  • All repositories are equivalent in status – there’s no predefined client/server relationship
  • GitHub is treated as a server by convention by many users – but there’s nothing about Git that forces anything to be a ‘server’. GitHub could be used as a backup


2) Creating a Repository

It’s important to see what a git repository created from scratch looks like:

  • Creating a git repository is as simple as running git init in a folder
  • Use git add to add content
  • Use git status to see what’s going on
  • Use git commit to commit some content
  • Use git log to see what the history now says
  • Have a look at the .git folder and understand what the HEAD file is doing


3) Cloning Repositories

You’ve created a repository, now see what happens when you clone it:

  • Use git clone to clone the repository you created above
  • Look at the .git folder and figure out its relationship to the cloned repository
  • Delete stuff ‘accidentally’ from your clone and restore using git reset


4) Branching and Tagging

Now create a branch:

  • Create a branch in a repo and make sure you’ve moved to it
  • Understand that a branch is ‘just’ a pointer to a commit that moves with each commit
  • Understand what HEAD is doing
  • Create a tag
  • Understand that a tag is a pointer to a commit
  • Understand that HEAD, branches and tags are all references
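If the ‘pointer’ idea feels abstract, a toy model can help. The sketch below is purely illustrative and is not how Git stores anything (real commit IDs are hashes and references live under .git); ToyRepo and its methods are invented names. It only shows that committing moves the current branch, while tags and other branches stay where they are.

```python
class ToyRepo:
    """A toy model of commits, branches, tags and HEAD (not real Git)."""

    def __init__(self):
        self.parents = {}                 # commit id -> parent commit id
        self.branches = {"master": None}  # branch name -> commit id
        self.tags = {}                    # tag name -> commit id
        self.head = "master"              # HEAD names the current branch
        self._count = 0

    def commit(self):
        """Create a commit and move the current branch to point at it."""
        self._count += 1
        commit_id = f"c{self._count}"
        self.parents[commit_id] = self.branches[self.head]
        self.branches[self.head] = commit_id
        return commit_id

    def branch(self, name):
        """Create a branch pointing at the current commit."""
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        """Point HEAD at another branch."""
        self.head = name

    def tag(self, name):
        """Create a tag pointing at the current commit; it never moves."""
        self.tags[name] = self.branches[self.head]


if __name__ == "__main__":
    repo = ToyRepo()
    repo.commit()
    repo.tag("v1")
    repo.branch("feature")
    repo.checkout("feature")
    repo.commit()
    print(repo.branches, repo.tags)
```

After the second commit only ‘feature’ has moved on; ‘master’ and the ‘v1’ tag still point at the first commit.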


5) Merging

You’ve got branches, so learn how to merge them:

  • Create two conflicting changes on two branches
  • Try to git merge one with another
  • Resolve the conflict and git commit
  • Understand the diff format
    • There’s a great exposition here


6) Stashing

If you’re going to branch, you’ll probably want to stash:

  • Know why stashing is important/why it’s used
  • Use git stash and git stash pop
  • Understand that it’s in the git log --all output, and that it’s ‘just’ a branch, albeit a special one


7) The Reflog

Now you know what references are, and have seen the stash, the reflog should make sense quickly:

  • Use the git reflog command
  • Understand that the reflog (reference log) logs every change to where references such as HEAD and branch tips point
  • Understand that this comes in handy when things go wrong in git, and references have been moved around
  • Use git reset to revert a change in the reflog using the reference id

8) Cherry Picking

Cherry picking is a nice precursor to rebasing, as they’re (in principle) similar ideas:

  • Make a change on one of the branches you’ve created
  • git checkout a change in another branch on the same repository
  • Use git log to get the commit ID of the change (a long hexadecimal hash)
  • Use git cherry-pick and the ID you just found to port the same change to the other branch

9) Rebasing

Now you’re ready for rebasing!

All through the below you might want to use git log --all --graph --oneline to see the state of the repository and what’s changing:

  • Create a new branch (B) from an existing branch (A)
  • Move to that new branch (B)
  • Create a series of changes on B
  • Move to the old branch (A)
  • Create a series of changes on A
  • Go back to branch B
  • Use git rebase to update your branch so that its changes now come from the updated end of the A branch

See how all the changes are in a line now?

  • Go back to the A branch and rebase it to B. What happened?

10) Remotes

  • In your cloned repo, look at the output of git remote -v
  • Run git branch -a and see how the remote repository is actually stored in your current repository with a remote reference.
  • Look at the .git/config file and see how the cloned repository is referenced within this repository

11) Pulling

Now you’ve understood remotes, you can deconstruct git pull and understand what it does:

  • Stop using git pull!
  • Read about, use and learn git fetch and git merge
  • Understand that when you fetch, you’re talking to a different repo, and then by merging it in you’ve done a ‘pull’
  • For bonus points, grok that fast-forwards are just the HEAD moving its pointer along to the end of a series of changes without a merge


‘You Missed X!’

There’s plenty more that others might consider essential, eg:

  • Submodules
  • Bare repos
  • Advanced git logging

and so on. But the above is what I’ve found is required to get beyond punk rock git.


You might also be interested in my book Learn Git the Hard Way, which goes into these concepts and others in a more guided way.


Sandboxing Docker with Google’s gVisor



Someone pointed me at this press release from Google announcing a Docker / container sandbox for Linux.


I was intrigued enough to write a ‘quick look’ article on it here

What Does That Mean (tl;dr)?

It’s a way of achieving:

  • VM-like isolation while
  • using containers for app deployment and achieving
  • multi-tenancy, and
  • SELinux/Apparmor/Seccomp security control

What Does That Mean (Longer)?

There are quite a few ways to limit the access a container has to the OS API. They’re listed and discussed on the gVisor GitHub page.

It explains what it does to the container:

gVisor intercepts all system calls made by the application, and does the necessary work to service them.

At first I thought gVisor was ‘just’ a syscall intermediary that could filter Linux API calls, but the sandboxing goes further than that:

Importantly, gVisor does not simply redirect application system calls through to the host kernel. Instead, gVisor implements most kernel primitives (signals, file systems, futexes, pipes, mm, etc.) and has complete system call handlers built on top of these primitives.

From a security perspective, this is a key quote as well:

Since gVisor is itself a user-space application, it will make some host system calls to support its operation, but much like a VMM, it will not allow the application to directly control the system calls it makes.

What really made my jaw drop was this:

[gVisor’s] Sentry [process] implements its own network stack (also written in Go) called netstack.

So it even goes to the length of not touching the host’s network stack. This reduces the attack surface of a malign container significantly.

From reading this it seems to implement most of an OS in userspace, only going to the host OS when necessary and allowed.

This is in contrast to tools like SELinux and AppArmor, which rely on host Kernel features and a bunch of root-defined constraining rules to ensure nothing bad happens.

[Figure: rule-based execution]

SELinux is a fantastic technology that should be more used, but the reality is that it’s very hard for people to write and understand policies and feel comfortable with it since it’s so embedded in the kernel.

An alternative to achieve a similar thing might be to just use a VM:


But as the number of boxes above indicates, that’s a relatively heavy overhead to achieve isolation.

gVisor gives you the lightweight benefits of containers and the control of VMM and host-based kernel filters.


A Closer Look

I wrote a ShutIt script to create a re-usable Ubuntu VM in which I could build gVisor and run up sandboxed containers.

The build part of the script is here and the instructions to reproduce are here. If you get stuck contact me.

Here’s a video of the whole thing setting up in a VM using the above script:


gVisor is a Go binary that creates a runtime environment for the container instead of runc. It consists of two processes:

In order to provide defense-in-depth and limit the host system surface, the gVisor container runtime is normally split into two separate processes. First, the Sentry process includes the kernel and is responsible for executing user code and handling system calls. Second, file system operations that extend beyond the sandbox (not internal proc or tmp files, pipes, etc.) are sent to a proxy, called a Gofer, via a 9P connection.

I didn’t know what a 9P connection is. I assume it’s something to do with the Plan9 OS, but that’s just a guess.

If you set the Docker daemon up according to the docs, you get a set of debug files in /tmp/runsc:

-rw-r--r-- 1 root root 6435 May 5 07:53 runsc.log.20180505-075350.302600.create
-rw-r--r-- 1 root root 1862 May 5 07:53 runsc.log.20180505-075350.337120.state
-rw-r--r-- 1 root root 3180 May 5 07:53 runsc.log.20180505-075350.346384.start
-rw-r--r-- 1 root root 1862 May 5 07:53 runsc.log.20180505-075350.529798.state
-rw-r--r-- 1 root root 32705613 May 5 08:22 runsc.log.20180505-075350.312537.gofer
-rw-r--r-- 1 root root 226843210 May 5 08:22 runsc.log.20180505-075350.319600.boot
-rw-r--r-- 1 root root 1639 May 5 08:22 runsc.log.20180505-082250.158154.kill
-rw-r--r-- 1 root root 1858 May 5 08:22 runsc.log.20180505-082250.210046.state
-rw-r--r-- 1 root root 1639 May 5 08:22 runsc.log.20180505-082250.221802.kill
-rw-r--r-- 1 root root 1600 May 5 08:22 runsc.log.20180505-082250.233557.delete
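
The file names themselves tell you which runsc command produced each log. Here is a minimal Python sketch that tallies them. It assumes the naming pattern is `runsc.log.<date>-<time>.<microseconds>.<command>`, which is only inferred from the listing above; `parse_log_name` and `count_by_command` are made-up helper names.

```python
import re
from collections import Counter

# Pattern inferred from the listing above (an assumption, not a documented format):
# runsc.log.<YYYYMMDD>-<HHMMSS>.<microseconds>.<command>
LOG_NAME = re.compile(r"^runsc\.log\.(\d{8})-(\d{6})\.(\d+)\.(\w+)$")


def parse_log_name(name):
    """Return (date, time, command) for a runsc debug log name, or None."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    date, time, _micros, command = m.groups()
    return date, time, command


def count_by_command(names):
    """Count how many log files each runsc command produced."""
    parsed = (parse_log_name(n) for n in names)
    return Counter(p[2] for p in parsed if p is not None)


# A few of the names from the listing above.
names = [
    "runsc.log.20180505-075350.302600.create",
    "runsc.log.20180505-075350.337120.state",
    "runsc.log.20180505-075350.312537.gofer",
    "runsc.log.20180505-075350.319600.boot",
    "runsc.log.20180505-082250.158154.kill",
]
print(count_by_command(names))
```

In practice you would feed it `os.listdir("/tmp/runsc")` rather than a hard-coded list.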

The most interesting ones appear to be the .gofer log (which records calls made to the host OS) and the .boot log. When I noodled around in the .gofer log, the entries mostly appeared to be requests to write to the docker filesystem on the host (which needs to happen when you write in the container):

D0505 07:53:50.516882 10831 x:0] Open reusing control file, mode: ReadOnly, "/var/lib/docker/overlay2/a8eadcb9a8427fa170e485f72d5aee6ee85a9c7b9176a6f01a6965f2bcd7e219/merged/bin/bash"
D0505 07:53:50.516907 10831 x:0] send [FD 3] [Tag 000001] Rlopen{QID: QID{Type: 0, Version: 0, Path: 541783}, IoUnit: 0, File: &{{38}}}

or to look up files and symlinks:

D0505 07:53:50.518729 10831 x:0] send [FD 3] [Tag 000001] Rreadlink{Target: /lib/x86_64-linux-gnu/}
D0505 07:53:50.518869 10831 x:0] recv [FD 3] [Tag 000001] Twalkgetattr{FID: 1, NewFID: 15, Names: [lib]}
D0505 07:53:50.518927 10831 x:0] send [FD 3] [Tag 000001] Rwalkgetattr{Valid: AttrMask{with: Mode NLink UID GID RDev ATime MTime CTime Size Blocks}, Attr: Attr{Mode: 0o40755, UID: 0, GID: 0, NLink: 8, RDev: 0, Size: 4096, BlockSize: 4096, Blocks: 8, ATime: {Sec: 1525506464, NanoSec: 357515221}, MTime: {Sec: 1524777373, NanoSec: 0}, CTime: {Sec: 1525506464, NanoSec: 345515221}, BTime: {Sec: 0, NanoSec: 0}, Gen: 0, DataVersion: 0}, QIDs: [QID{Type: 128, Version: 0, Path: 542038}]}
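
To make lines like these easier to pick apart, here is a rough Python sketch that pulls out the fields. The field layout (log level plus date, time, pid, then direction, FD, tag and message) is inferred from the lines above rather than from any gVisor documentation, and `parse_9p_line` is a made-up helper name. The one piece of protocol knowledge it relies on is the 9P naming convention: message types starting with T are requests and those starting with R are replies.

```python
import re

# Layout inferred from the log excerpts above (an assumption, not a spec).
LINE = re.compile(
    r"^(?P<level>[DIWE])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) +"
    r"(?P<pid>\d+) \S+\] (?P<direction>send|recv) \[FD (?P<fd>\d+)\] "
    r"\[Tag (?P<tag>\d+)\] (?P<msg_type>\w+)\{(?P<body>.*)\}$"
)


def parse_9p_line(line):
    """Parse one send/recv log line into a dict, or return None if it is not one."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["fd"] = int(rec["fd"])
    # 9P convention: T-messages are requests, R-messages are replies.
    rec["kind"] = "request" if rec["msg_type"].startswith("T") else "reply"
    return rec


sample = (
    "D0505 07:53:50.518869 10831 x:0] recv [FD 3] [Tag 000001] "
    "Twalkgetattr{FID: 1, NewFID: 15, Names: [lib]}"
)
print(parse_9p_line(sample))
```

Lines that are not 9P traffic (such as the "Open reusing control file" line above) simply come back as None.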

The .boot file is the strace log from the container, which combined with the .gofer log can tell you what’s going on in and out of the container’s userspace.

Matching up the timestamps of the file operations above, I see this in the .boot log:

D0505 07:53:50.518797 10835 x:0] recv [FD 4] [Tag 000001] Rreadlink{Target: /lib/x86_64-linux-gnu/}
D0505 07:53:50.518824 10835 x:0] send [FD 4] [Tag 000001] Twalkgetattr{FID: 1, NewFID: 15, Names: [lib]}
D0505 07:53:50.519041 10835 x:0] recv [FD 4] [Tag 000001] Rwalkgetattr{Valid: AttrMask{with: Mode NLink UID GID RDev ATime MTime CTime Size Blocks}, Attr: Attr{Mode: 0o40755, UID: 0, GID: 0, NLink: 8, RDev: 0, Size: 4096, BlockSize: 4096, Blocks: 8, ATime: {Sec: 1525506464, NanoSec: 357515221}, MTime: {Sec: 1524777373, NanoSec: 0}, CTime: {Sec: 1525506464, NanoSec: 345515221}, BTime: {Sec: 0, NanoSec: 0}, Gen: 0, DataVersion: 0}, QIDs: [QID{Type: 128, Version: 0, Path: 542038}]}
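
Since both logs are timestamped on the same host, you can pair each message sent by one process with the matching receive in the other. The Python sketch below does that for a couple of the lines above. It is a simplification under stated assumptions: it keys on the tag plus the full message text, keeps only the last occurrence of each key, and ignores the date. `messages` and `transit_times` are made-up names.

```python
import re
from datetime import datetime, timedelta

# Same inferred line layout as the excerpts above (an assumption, not a spec).
MSG = re.compile(
    r"^\w\d{4} (?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) +\d+ \S+\] "
    r"(?P<direction>send|recv) \[FD \d+\] \[Tag (?P<tag>\d+)\] (?P<msg>\w+\{.*\})$"
)


def messages(lines, direction):
    """Map (tag, message text) -> timestamp for lines going in one direction."""
    out = {}
    for line in lines:
        m = MSG.match(line.strip())
        if m and m["direction"] == direction:
            out[(m["tag"], m["msg"])] = datetime.strptime(m["time"], "%H:%M:%S.%f")
    return out


def transit_times(sender_lines, receiver_lines):
    """Microseconds between a send in one log and the matching recv in the other."""
    sent = messages(sender_lines, "send")
    received = messages(receiver_lines, "recv")
    result = {}
    for key, t_sent in sent.items():
        if key in received:
            msg_type = key[1].split("{")[0]
            result[msg_type] = (received[key] - t_sent) // timedelta(microseconds=1)
    return result


gofer = [
    "D0505 07:53:50.518729 10831 x:0] send [FD 3] [Tag 000001] Rreadlink{Target: /lib/x86_64-linux-gnu/}",
    "D0505 07:53:50.518869 10831 x:0] recv [FD 3] [Tag 000001] Twalkgetattr{FID: 1, NewFID: 15, Names: [lib]}",
]
boot = [
    "D0505 07:53:50.518797 10835 x:0] recv [FD 4] [Tag 000001] Rreadlink{Target: /lib/x86_64-linux-gnu/}",
    "D0505 07:53:50.518824 10835 x:0] send [FD 4] [Tag 000001] Twalkgetattr{FID: 1, NewFID: 15, Names: [lib]}",
]
print(transit_times(gofer, boot))  # replies travelling gofer -> boot
print(transit_times(boot, gofer))  # requests travelling boot -> gofer
```

On the excerpts above this shows the Rreadlink reply arriving in the .boot log 68 microseconds after the gofer sent it, and the Twalkgetattr request taking 45 microseconds in the other direction.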

Boot also has intriguing stuff like this in it:

D0505 07:53:50.514800 10835 x:0] urpc: unmarshal success.
W0505 07:53:50.514848 10835 x:0] *** SECCOMP WARNING: console is enabled: syscall filters less restrictive!
I0505 07:53:50.514881 10835 x:0] Installing seccomp filters for 63 syscalls (kill=false)
I0505 07:53:50.514890 10835 x:0] syscall filter: 0
I0505 07:53:50.514901 10835 x:0] syscall filter: 1
I0505 07:53:50.514916 10835 x:0] syscall filter: 3
I0505 07:53:50.514928 10835 x:0] syscall filter: 5
I0505 07:53:50.514952 10835 x:0] syscall filter: 7
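
If those numbers are raw syscall numbers (I'm assuming they are, and that the host is x86_64), you can translate them into names. A small Python sketch with a deliberately partial lookup table covering only syscalls 0 to 7; `allowed_syscalls` is a made-up helper name.

```python
import re

# Partial x86_64 syscall table. Assumption: the log was produced on an x86_64
# host and the logged numbers are raw syscall numbers.
X86_64_SYSCALLS = {
    0: "read",
    1: "write",
    2: "open",
    3: "close",
    4: "stat",
    5: "fstat",
    6: "lstat",
    7: "poll",
}

FILTER_LINE = re.compile(r"syscall filter: (\d+)$")


def allowed_syscalls(lines):
    """Return the syscall names for every 'syscall filter: N' line, in order."""
    names = []
    for line in lines:
        m = FILTER_LINE.search(line.strip())
        if m:
            nr = int(m.group(1))
            names.append(X86_64_SYSCALLS.get(nr, f"unknown({nr})"))
    return names


boot_lines = [
    "I0505 07:53:50.514881 10835 x:0] Installing seccomp filters for 63 syscalls (kill=false)",
    "I0505 07:53:50.514890 10835 x:0] syscall filter: 0",
    "I0505 07:53:50.514901 10835 x:0] syscall filter: 1",
    "I0505 07:53:50.514916 10835 x:0] syscall filter: 3",
    "I0505 07:53:50.514928 10835 x:0] syscall filter: 5",
    "I0505 07:53:50.514952 10835 x:0] syscall filter: 7",
]
print(allowed_syscalls(boot_lines))
```

Under that assumption the five filters shown would correspond to read, write, close, fstat and poll.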

Still Very Beta

I had lots of problems doing basic things in this sandbox, so believe them when they say this is a work in progress.

For example, I ran an apt install and got this error:

E: Can not write log (Is /dev/pts mounted?) - posix_openpt (2: No such file or directory)

which I’d never seen before.

Also, when I pinged:

root@a2e899f2e8af:/# ping
root@a2e899f2e8af:/# ping

It returned immediately but I got no output at all.

I also saw errors with simple commands when apt installing. Running these commands by hand, I got some kind of race condition that couldn’t be escaped:

root@59dc5700406d:/# /usr/sbin/groupadd -g 101 systemd-journal
groupadd: /etc/group.604: lock file already used
groupadd: cannot lock /etc/group; try again later.




Unprivileged Docker Builds – A Proof of Concept

I work at a very ‘locked-down’ enterprise, where direct access to Docker is effectively verboten.

This, fundamentally, is because access to Docker is effectively giving users root. From Docker’s own pages:

First of all, only trusted users should be allowed to control your Docker daemon.

Most home users get permissions in their account (at least on Linux) by adding themselves to the docker group, which may as well be root. On a Mac, installing Docker also gives you root-like power if you know what you’re doing.

Platform Proxies

Many Docker platforms (like OpenShift) work around this by putting an API between the user and the Docker socket.

However, for untrusted users this creates a potentially painful dev experience that contrasts badly with their experience at home:

  • Push change to git repo
  • Wait for OpenShift to detect the change
  • Wait for OpenShift to pull the repo
  • Wait for OpenShift to build the image
    • The last step can take a long time if the build is not cached – which can easily happen when you have lots of build nodes and you miss the cache

vs ‘at home’

  • Hit build on your local machine
  • See if the build works

What we really want is the capability to build images when you are not root, or privileged in any way.

Thinking about it, you don’t need privileges to create a Docker image. It’s just a bunch of files in a tar file conforming to a spec. But constructing one that conforms to spec is harder than simply building a tar, as you need root-like privileges to do most useful things, like installing rpms or apt packages.
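
To illustrate the 'bunch of files in a tar file' point, here is a minimal Python sketch that assembles an image-shaped tarball as an ordinary user. The layout (a manifest.json, a config JSON and a layer.tar) is modelled on what `docker save` produces, but I have not checked this against `docker load`, so treat it as an approximation; `make_layer`, `add_bytes` and `make_image` are made-up names. Note that marking files as owned by uid 0 needs no privileges, because ownership is only metadata in the tar header.

```python
import hashlib
import io
import json
import tarfile


def make_layer(files):
    """Build an uncompressed layer tar in memory from a {path: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mode = 0o644
            info.uid = info.gid = 0  # "root-owned" is just a header field here
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def add_bytes(tar, name, data):
    """Add an in-memory blob to an open tar archive under the given name."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def make_image(files, repo_tag, out_path):
    """Write a single-layer image-like tarball; returns the layer's sha256."""
    layer = make_layer(files)
    diff_id = hashlib.sha256(layer).hexdigest()
    config = {
        "architecture": "amd64",
        "os": "linux",
        "config": {"Cmd": ["/bin/sh"]},
        "rootfs": {"type": "layers", "diff_ids": ["sha256:" + diff_id]},
    }
    config_bytes = json.dumps(config, sort_keys=True).encode()
    config_name = hashlib.sha256(config_bytes).hexdigest() + ".json"
    layer_name = diff_id + "/layer.tar"
    manifest = [{"Config": config_name, "RepoTags": [repo_tag], "Layers": [layer_name]}]
    with tarfile.open(out_path, mode="w") as tar:
        add_bytes(tar, config_name, config_bytes)
        add_bytes(tar, layer_name, layer)
        add_bytes(tar, "manifest.json", json.dumps(manifest).encode())
    return diff_id
```

Calling `make_image({"hello.txt": b"hi\n"}, "demo:latest", "demo-image.tar")` writes the archive without touching Docker or root. What it cannot do is the hard part described above: actually running `yum` or `apt` inside the image's filesystem to produce the files in the first place.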


I was excited when I saw Google announced kaniko, which claimed:

…we’re excited to introduce kaniko, an open-source tool for building container images from a Dockerfile even without privileged root access.


Since it doesn’t require any special privileges or permissions, you can run kaniko in a standard Kubernetes cluster, Google Kubernetes Engine, or in any environment that can’t have access to privileges or a Docker daemon.

I took that to mean it could take a Dockerfile and produce a tar file with the image as a non-privileged user on a standard VM. Don’t you?

I also got lots of pings from people across my org asking when they could have it, which meant I had to take time out to look at it.


Pretty quickly I discovered that kaniko does nothing of the sort. You still need access to Docker to build it (which was a WTF moment). Even if you --force it not to (which you are told not to), you still can’t do anything useful without root.


I’m still not sure why Kaniko exists as a ‘new’ technology when OpenShift already allows users to build images in a controlled way.

Rootless Containers

After complaining about it on GitHub I got some good advice.

There’s a really useful page here that outlines the state of the art in rootless containers for building, shipping and running.

It’s not for the faint hearted as the range of the technology required is somewhat bewildering, but there’s an enthusiastic mini community of people all trying to make this happen.

A Proof of Concept Build

I managed to get a build of a simple yum install on a centos:7 base image built as a completely unprivileged user using Vagrant and ShutIt to automate the build.

It shows the build of this Dockerfile as an unprivileged user (person) who does not have access to the docker socket (Docker does not even need to be installed for the run – it’s only used to build the proot binary, which could be done elsewhere):

FROM centos:7
RUN yum install -y httpd
CMD echo Hello host && sleep infinity

A video of this is available here and the code for this reproducible build is here. The interesting stuff is here.


Technologies Involved

I can’t claim deep knowledge of the technologies here (so please correct me where I’m wrong/incomplete), but here’s a quick run-down of the technologies used to achieve a useful rootless build.

  • runC

This allows us to run containers that conform to the OCI spec. What Docker runs (for example) is a superset of the OCI container spec.

Although we use ‘runrootless’ to run the containers, runc is still needed to back it (I think – at least I had to install it to get this to work).

  • skopeo

Skopeo gives us the ability to take an OCI image and turn it into a Docker one later. It’s also used by orca-build (below) to copy images around while it’s building from Dockerfiles.

  • umoci

umoci modifies container images. It’s also used by orca-build to unpack and repack the image created at each stage.

  • orca-build

orca-build is a wrapper around runC, and has some support for rootless builds. It uses the technologies described above (runC, skopeo and umoci).

It also takes Dockerfiles as input (with a subset of Docker commands).

  • runrootless

runrootless allows us to run OCI containers as part of each stage of the build, and in turn uses proot to allow root-like commands.

  • proot

proot allows you to create a root filesystem (such as is in a Docker container) without having root privileges. I suspected that proot was root by the back door using a setuid flag, but it appears not to be the case (which is good news).

This is required to do standard build tasks like yum or apt commands.

  • User namespaces

These must be switched on in the kernel, so that the build can believe it’s root while actually being an unprivileged user from the point of view of the kernel.

This is currently switched off by default in CentOS, but is easily switched on.
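
A quick way to see whether user namespaces are usable is to read the kernel's per-user limit. The Python sketch below does that. It assumes the kernel exposes `/proc/sys/user/max_user_namespaces` (some kernels don't, and some distributions may need a boot parameter as well), and the function names are made up for illustration.

```python
from pathlib import Path

DEFAULT_PATH = "/proc/sys/user/max_user_namespaces"


def max_user_namespaces(path=DEFAULT_PATH):
    """Return the kernel's user namespace limit, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def user_namespaces_enabled(path=DEFAULT_PATH):
    """True only if the limit is readable and greater than zero."""
    limit = max_user_namespaces(path)
    return limit is not None and limit > 0


if __name__ == "__main__":
    limit = max_user_namespaces()
    if limit is None:
        print("Could not read the limit; this kernel may not expose it.")
    elif limit == 0:
        print("User namespaces appear to be switched off (limit is 0).")
    else:
        print(f"User namespaces appear to be available (limit {limit}).")
```

A limit of 0 is the usual sign that rootless tools will fail until the setting is raised.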

Try it

A reproducible VM is available here.

The kaniko work is available here.

See also:

Jessie Frazelle’s post on building securely on K8s

Special thanks to @lordcyphar for his great work generally in this area, and specifically helping me get this working.
