Five Things I Wish I’d Known About Git

30 October 2018 by imiell

Git can be utterly bewildering to someone who uses it casually, or is not interested in things like directed acyclic graphs.

For such users, the best thing you can do is buy my book (free sample available), which guides you through the usage of git in a practical way that embeds the concepts ready for daily use.

The second best thing you can do is read on. Here I briefly go through five things I wish someone had explained to me before I started using git.

1) The Four Stages

Having come from using CVS as a source control (an older example of a Version Control System (VCS)), one of the most baffling things about git was its different approach to the state of content.

CVS had two states of data:

uncommitted
committed

and this results in these kinds of workflows:

traditional_vcs

Whereas git has four states:

Local changes
Staged/added changes
Committed
Pushed to remote

Here’s a diagram that illustrates the four stages:

1.1.3.mermaid

If, like me, you use git commit -am "checkin message" to commit your work, then the second ‘adding/staging’ state is more or less invisible to you, since the -a does it for you. It’s for this reason that I encourage new users to drop the -a flag and git add by hand, so that they understand these distinctions.

One subtlety is that the -a flag doesn’t add new files to the content tracked by git – it just adds changes made.

These states exist so that people can work independently and offline, syncing later. This was the driving force behind the development of git.

From this comes another key point: all git repositories are created equal. My clone of your repository is not dependent on yours for its existence. Each repository stands on its own, and is only related to others if you configure it so. This is another key difference between git and more traditional (nay, obsolete) client/server models of content history management.

This results in a workflow that looks more like this:

distributed_vcs

which is a far more flexible (and potentially more complicated) workflow.

2) What is a Reference?

Git docs and blogs keep talking about references, but what is a reference?

A reference is just this: a pointer to a commit. And a commit is a unique reference to a new state of the content.

Once this is understood, a few other concepts make more sense.

HEAD is a reference to ‘where you are’ in the content history. It’s the content you’re currently looking at in your git repo.

When you git commit, the HEAD moves to the new commit.

A git tag reference is one that can have arbitrary text, and does not move when a new commit is seen.

A git branch is a reference that moves with the HEAD whenever you commit a new change.

A couple of other confusing things then become clearer. For example, a detached HEAD is nothing to panic about despite its scary name – it just means that your HEAD is not pointed at a branch.

To help cement the above, look at this diagram:

$1.5.4.tex$

It represents a series of commits.

Confusingly, with git diagrams, the arrows go backwards in time. A is the first commit, then B, and so on to the latest commit (H).

There are three references – master (which is pointed at C), experimental, which is pointed at H, and HEAD, which is also pointed at H. HEAD, remember is ‘where we are’.

3) What’s a Fast-Forward?

Now that you understand what a HEAD reference is, understanding what a fast-forward is pretty simple.

Usually, when you merge two branches together, you get a new commit: $1.5.2.tex$

In the above diagram, I is a commit that represents the merging of H and G from its common ancestor (D). The changes made on both branches are applied together from D and the resulting state of the content after the commit is stored in a new state (I).

But consider the diagram we saw above:

$1.5.4.tex$

There we have two branches, but no changes were made on one of them. Let’s say we want to merge the changes on experimental (E and H) into master – we’ve experimented, and the experiment was successful.

In this case, merging E and H into master requires no changes from H, since there’s no F and G changes that need to be merged together with E and H. They are all in one line of changes.

Such a merge only requires that the master reference is picked up and moved from C to H. This is a ‘fast-forward’ – the reference just needed moving along, and no content needed to be reconciled.

4) What’s a Rebase?

My manual page for git rebase says:

Reapply commits on top of another base tip

this is much more comprehensible than previous versions of this man page, but will still confuse many people.

A visual example makes it much clearer.

Consider this example:

$2.5.3.tex$

You could merge feature1 into the master branch, and you’d end up with a new commit (G), which makes the tree look like this:

$2.5.4.tex$

You can see that you’ve retained the chronology, as both branches keep their history and order of commits.

A git rebasetakes a different approach. It ‘picks up’ the changes on our branch (commit D on feature1 in this case) and applies it to the end of the branch we are on (HEAD is at master).

$2.5.5.tex$

It’s as though we just checked out master and then made a change (D) on a new branch (feature1), rather than branched off from master some time ago at C and did our feature1 work there.

This looks a lot neater, doesn’t it? master can now be ‘fast-forwarded’ to where feature1 is by moving master‘s pointer along to D.

The downside is that we’ve lost something from the history by doing this. It doesn’t reflect the order things happened in anymore chronologically. Do you care about this?

5) The power of git log

The above concepts are all very well, but how do you grasp these in the course of your day-to-day work?

For this I highly recommend getting to grips with git’s native log command. While there are many GUIs that can display history, they all have their own opinions on how things should be displayed, and moreover are not available everywhere. As a source of truth, git log is unimpeachable and transparent.

I wrote about this in more depth here, but to give yourself a flavour, try these two commands on a repo of your choice. They cover 90% of my git log usage day-to-day:

$ git log --oneline --graph

$ git log --oneline --graph --simplify-by-decoration --all

Concepts explained here are taught in my book Learn Git the Hard Way.

learngitthehardway

Eleven bash Tips You Might Want to Know

12 October 2018 by imiell

Here are some tips that might help you be more productive with bash.

1) ^x^y^

A gem I use all the time.

Ever typed anything like this?

$ grp somestring somefile
-bash: grp: command not found

Sigh. Hit ‘up’, ‘left’ until at the ‘p’ and type ‘e’ and return.

Or do this:

$ ^rp^rep^
grep 'somestring' somefile
$

One subtlety you may want to note though is:

$ grp rp somefile
$ ^rp^rep^
$ grep rp somefile

If you wanted rep to be searched for, then you’ll need to dig into the man page and use a more powerful history command:

$ grp rp somefile
$ !!:gs/rp/rep
grep rep somefile
$

2) pushd / popd vs ‘cd -‘

This one comes in very handy for scripts, especially when operating within a loop.

Let’s say you’re in a for loop moving in and out of folders like this:

for d1 in $(ls -d */)
do
  # Store original working directory.
  original_wd="$(pwd)"
  cd "$d1"
  for d2 in $(ls -d */)
  do
    pushd "$d2"
    # Do something
    popd
  done
  # Return to original working directory
  cd "${original_wd}"
done

NOTE: I’m well aware the above code is unsafe – see here.
The code above is intended to illustrate pushd/popd without distraction
for a relative beginner.
There’s a post in the fact that people like me use $(ls -d */) all
the time without deleterious consequences 99% of the time, but
that can wait. That said, it’s well worth knowing that this
kind of issue exists in bash as it can trip you up.

You can rewrite the above using the pushd stack like this:

for d1 in $(ls -d *)
do
  pushd "$d1"
  for d2 in $(ls  -d */)
  do
    pushd "$d2"
    # Do something
    popd
  done
  popd
done

Which tracks the folders you’ve pushed and popped as you go.

Note that if there’s an error in a pushd you may lose track of the stack and popd too many time. You probably want to set -e in your script as well (see previous post)

There’s also cd -, but that doesn’t ‘stack’ – it just returns you to the previous folder:

cd ~
cd /tmp
cd blah
cd - # Back to /tmp
cd - # Back to 'blah'
cd - # Back to /tmp
cd - # Back to 'blah' ...

Material here based on material from my book
Learn Bash the Hard Way.
Free preview available here.

hero

3) `shopt` vs `set`

This one bothered me for a while.

What’s the difference between set and shopt?

sets we saw before, but shopts look very similar. Just inputting shopt shows a bunch of options:

$ shopt
cdable_vars    off
cdspell        on
checkhash      off
checkwinsize   on
cmdhist        on
compat31       off
dotglob        off

I found a set of answers here.

Essentially, it looks like it’s a consequence of bash (and other shells) being built on sh, and adding shopt as another way to set extra shell options.

But I’m still unsure… if you know the answer, let me know.

4) Here Docs and Here Strings

‘Here docs’ are files created inline in the shell.

The ‘trick’ is simple. Define a closing word, and the lines between that word and when it appears alone on a line become a file.

Type this:

$ cat > afile << SOMEENDSTRING
> here is a doc
> it has three lines
> SOMEENDSTRING alone on a line will save the doc
> SOMEENDSTRING
$ cat afile
here is a doc
it has three lines
SOMEENDSTRING alone on a line will save the doc
$

Notice that:

the string could be included in the file if it was not ‘alone’ on the line
the string SOMEENDSTRING is more normally END, but that is just convention

Lesser known is the ‘here string’:

$ cat > asd <<< 'This file has one line'

5) String Variable Manipulation

You may have written code like this before, where you use tools like sed to manipulate strings:

$ VAR='HEADERMy voice is my passwordFOOTER'
$ PASS="$(echo $VAR | sed 's/^HEADER(.*)FOOTER/1/')"
$ echo $PASS

But you may not be aware that this is possible natively in bash.

This means that you can dispense with lots of sed and awk shenanigans.

One way to rewrite the above is:

$ VAR='HEADERMy voice is my passwordFOOTER'
$ PASS="${VAR#HEADER}"
$ PASS="${PASS%FOOTER}"
$ echo $PASS

The # means ‘match and remove the following pattern from the start of the string’
The % means ‘match and remove the following pattern from the end of the string

The second method is twice as fast as the first on my machine. And (to my surprise), it was roughly the same speed as a similar python script.

If you want to use glob patterns that are greedy (see globbing here) then you double up:

$ VAR='HEADERMy voice is my passwordFOOTER'
$ echo ${VAR##HEADER*}
$ echo ${VAR%%*FOOTER}

6) Variable Defaults

These are very handy when you’re knocking up scripts quickly.

If you have a variable that’s not set, you can ‘default’ them by using this. Create a file called default.sh with these contents

#!/bin/bash
FIRST_ARG="${1:-no_first_arg}"
SECOND_ARG="${2:-no_second_arg}"
THIRD_ARG="${3:-no_third_arg}"
echo ${FIRST_ARG}
echo ${SECOND_ARG}
echo ${THIRD_ARG}

Now run chmod +x default.sh and run the script with ./default.sh first second.

Observer how the third argument’s default has been assigned, but not the first two.

You can also assign directly with ${VAR:=defaultval} (equals sign, not dash) but note that this won’t work with positional variables in scripts or functions. Try changing the above script to see how it fails.

7) Traps

The trap builtin can be used to ‘catch’ when a signal is sent to your script.

Here’s an example I use in my own cheapci script:

function cleanup() {
    rm -rf "${BUILD_DIR}"
    rm -f "${LOCK_FILE}"
    # get rid of /tmp detritus, leaving anything accessed 2 days ago+
    find "${BUILD_DIR_BASE}"/* -type d -atime +1 | rm -rf
    echo "cleanup done"                                                                                                                          
} 
trap cleanup TERM INT QUIT

Any attempt to CTRL-C, CTRL- or terminate the program using the TERM signal will result in cleanup being called first.

Be aware:

Trap logic can get very tricky (eg handling signal race conditions)
The KILL signal can’t be trapped in this way

But mostly I’ve used this for ‘cleanups’ like the above, which serve their purpose.

8) Shell Variables

It’s well worth getting to know the standard shell variables available to you. Here are some of my favourites:

RANDOM

Don’t rely on this for your cryptography stack, but you can generate random numbers eg to create temporary files in scripts:

$ echo ${RANDOM}
16313
$ # Not enough digits?
$ echo ${RANDOM}${RANDOM}
113610703
$ NEWFILE=/tmp/newfile_${RANDOM}
$ touch $NEWFILE

REPLY

No need to give a variable name for read…

$ read
my input
$ echo ${REPLY}

LINENO and SECONDS

Handy for debugging

$ echo ${LINENO}
115
$ echo ${SECONDS}; sleep 1; echo ${SECONDS}; echo $LINENO
174380
174381
116

Note that there are two ‘lines’ above, even though you used ; to separate the commands.

TMOUT

You can timeout reads, which can be really handy in some scripts

#!/bin/bash
TMOUT=5
echo You have 5 seconds to respond...
read
echo ${REPLY:-noreply}

9) Extglobs

If you’re really knee-deep in bash, then you might want to power up your globbing. You can do this by setting the extglob shell option. Here’s the setup:

shopt -s extglob
A="12345678901234567890"
B="  ${A}  "

Now see if you can figure out what each of these does:

echo "B      |${B}|"
echo "B#+( ) |${B#+( )}|"
echo "B#?( ) |${B#?( )}|"
echo "B#*( ) |${B#*( )}|"
echo "B##+( )|${B##+( )}|"
echo "B##*( )|${B##*( )}|"
echo "B##?( )|${B##?( )}|"

Now, potentially useful as it is, it’s hard to think of a situation where you’d absolutely want to do it this way. Normally you’d use a tool better suited to the task (like sed) or just drop bash and go to a ‘proper’ programming language like python.

10) Associative Arrays

Talking of moving to other languages, a rule of thumb I use is that if I need arrays then I drop bash to go to python (I even created a Docker container for a tool to help with this here).

What I didn’t know until I read up on it was that you can have associative arrays in bash.

Type this out for a demo:

$ declare -A MYAA=([one]=1 [two]=2 [three]=3)
$ MYAA[one]="1"
$ MYAA[two]="2"
$ echo $MYAA
$ echo ${MYAA[one]}
$ MYAA[one]="1"
$ WANT=two
$ echo ${MYAA[$WANT]}

Note that this is only available in bashes 4.x+.

11) source vs ‘.’

This one confused me for a long time.

You can type:

$ cat > somescript.sh << END
A=11
END
$ source somescript.sh
$ echo $A

which will run the script somescript.sh and do so while retaining the environment changes in the script in your environment.

Try this to compare:

$ cat > somescript.sh << END
A=12
END
$ chmod +x somescript.sh
$ ./somescript.sh
$ echo $A

The dot (‘.‘) command does something similar, but what’s the difference? Why does it exist?

The answer is simple: in bash they are exactly the same. The ‘.‘ was the original command, and is more portable, since it works in the sh shell as well as bash.

You may also be wondering what the difference between the dots in:

./somescript.sh

and

. ./somescript.sh

is. In the . ./somescript.sh invocation, the first dot acts as an equivalent of the source command, while the ./ after indicates that the script will be found in this folder, the dot there representing the local folder (try running cd . to see what happens).

If you didn’t use the ./, and . wasn’t in your PATH environment variable, then somescript.sh might not be found. Simple, right?

If you like this, you might like one of my books:

Learn Bash the Hard Way

Learn Git the Hard Way

Learn Terraform the Hard Way

If you liked this post, you might also like these:

Ten Things I Wish I’d Known About bash

Centralise Your Bash History

How (and Why) I Run My Own DNS Servers

My Favourite Secret Weapon – strace

A Complete Chef Infrastructure on Your Laptop

Learn Bash Debugging Techniques the Hard Way

9 October 2018 by imiell

In this article I’m going to give you a hands-on introduction to standard bash debugging techniques.

In addition, you’ll learn some techniques to make your bash scripts more robust to failure.

This article uses the hard way method, which emphasises hands-on-keyboard work to embed the learning. You’re going to have to think and type to learn.

Syntax Checking Options

Start by creating this simple script:

$ mkdir -p lbthw_debugging
$ cd lbthw_debugging
$ cat > debug_script.sh << 'END'
#!/bin/bash
A=some value
echo "${A}
echo "${B}"
END
$ chmod +x debug_script.sh

Now run it with the -n flag like this:

$ bash -n debug_script.sh -n

This flag only parses the script, rather than actually running it. It’s useful for detecting basic syntax errors.

You’ll see it’s broken. Fix it. Then run it again.

If you’re not sure how to fix, contact me.

Verbose and Trace Flags

Now run with -v to see the verbose output.

$ bash -v debug_script.sh

and then run with -x to trace the output:

$ bash -x debug_script.sh

What do you notice about the output of the commands? Read them carefully.

Do you see the problem?

Using these flags together can help debug scripts where there is an elementary error, or even just working out what’s going on when a script runs. I used -x only yesterday to figure out why a systemctl service wasn’t running or logging.

Material here based on the ‘advanced’ section of my book
Learn Bash the Hard Way.
Free preview available here.

hero

Managing Variables

Variables are a core part of most serious bash scripts (and even one-liners!), so managing them is another important way to reduce the possibility of your script breaking.

Change your script to add the ‘set’ line immediately after the first line and see what happens:

#!/bin/bash
set -o nounset
A="some value"
echo "${A}"
echo "${B}"

Now research what the nounset option does. Which set flag does this correspond to?

Now, without running it, try and figure out what this script will do. Will it run?

#!/bin/bash
set -o nounset
A="some value"
B=
echo "${A}"
echo "${B}"

I always set nounset on my scripts as a habit. It can catch many problems before they become serious.

Tracing Variables

If you are working with a particularly complex script, then you can get to the point where you are unsure what happened to a variable.

Try running this script and see what happens:

#!/bin/bash 
set -o nounset 
declare A="some value" 
function a { 
  echo "${BASH_SOURCE}>A A=${A} LINENO:${1}" 
} 
trap "a $LINENO" DEBUG 
B=value 
echo "${A}" 
A="another value" 
echo "${A}" 
echo "${B}"

There’s a problem with this code. The output is slightly wrong. Can you work out what is going on? If so, try and fix it.

You may need to refer to the bash man page, and make sure you understand quoting in bash properly.

It’s quite a tricky one to fix ‘properly’, so if you can’t fix it, or work out what’s wrong with it, then ask me directly and I will help.

Profiling Bash Scripts

Returning to the xtrace (or set -x flag), we can exploit its use of a PS variable to implement the profiling of a script:

#!/bin/bash
set -o nounset
set -o xtrace
declare A="some value"
PS4='$(date "+%s%N => ")'
B=
echo "${A}"
A="another value"
echo "${A}"
echo "${B}"
ls
pwd
curl -q bbc.co.uk

From this you should be able to tell what PS4 does. Have a play with it, and read up and experiment with the other PS variables to get familiar with what they do.

NOTE: If you are on a Mac, then you might only get
second-level granularity on the date!

Linting with Shellcheck

Finally, here is a very useful tip for understanding bash more deeply and improving any bash scripts you come across.

Shellcheck is a website and a package available on most platforms that gives you advice to help fix and improve your shell scripts. Very often, its advice has prompted me to research more deeply and understand bash better.

Here is some example output from a script I found on my laptop:

$ shellcheck shrinkpdf.sh
In shrinkpdf.sh line 44:
          -dColorImageResolution=$3             \
                                 ^-- SC2086: Double quote to prevent globbing and word splitting.
In shrinkpdf.sh line 46:
          -dGrayImageResolution=$3              \
                                ^-- SC2086: Double quote to prevent globbing and word splitting.
In shrinkpdf.sh line 48:
          -dMonoImageResolution=$3              \
                                ^-- SC2086: Double quote to prevent globbing and word splitting.
In shrinkpdf.sh line 57:
        if [ ! -f "$1" -o ! -f "$2" ]; then
                      ^-- SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.
In shrinkpdf.sh line 60:
        ISIZE="$(echo $(wc -c "$1") | cut -f1 -d\ )"
                      ^-- SC2046: Quote this to prevent word splitting.
                      ^-- SC2005: Useless echo? Instead of 'echo $(cmd)', just use 'cmd'.
In shrinkpdf.sh line 61:
        OSIZE="$(echo $(wc -c "$2") | cut -f1 -d\ )"
                      ^-- SC2046: Quote this to prevent word splitting.
                      ^-- SC2005: Useless echo? Instead of 'echo $(cmd)', just use 'cmd'.

The most common reminders are regarding potential quoting issues, but you can see other useful tips in the above output, such as preferred arguments to the test construct, and advice on “useless” echos.

Exercise

1) Find a large bash script on a social coding site such as GitHub, and run shellcheck over it. Contribute back any improvements you find.

If you like this, you might like one of my books:

Learn Git the Hard Way

Learn Terraform the Hard Way,

Learn Bash the Hard Way

Why Are Enterprises So Slow?

2 October 2018 by imiell

tl;dr

In this article I want to explain a few things about enterprises and their software, based on my experiences, and also describe what things need to be in place to make change come about. Have you ever found yourself saying things like:

Why are enterprises so slow?
How do they decide what to buy?
Why is it so hard to deliver things in an enterprise?

I worked for a large ‘enterprise’ organisation for a few years trying to deliver infrastructure software change, and found myself having to explain these things to developers who worked there, salespeople, external open source engineers, software engineers who worked for enterprise vendors, and even many, many people within that organisation. A few of those people suggested I write these explanations up so that they could pass it on to their fellow salespeople/engineers etc..

The polygon of Enterprise despair

Background

Before the enterprise, I worked for a startup that grew from a single room to 700+ people over 15 years. ‘Enterprise’ was a word often thrown at us when rejecting our software, usually in the sentence “your software isn’t enterprise enough”. I had no idea what that meant, but I have a much better idea now. It didn’t help that the people saying that were usually pretty clueless about software engineering. Like many other software developers whose experience was in an unregulated startup environment, I had little respect for the concept of enterprise software. Seems I wasn’t alone. When I finally got sick of the startup life I took a job at a huge organisation in financial services over 200 times as large. You don’t get much more ‘enterprise’ than that, but even within that context I was working in the ‘infrastructure team’, the part of the group that got beaten up for being (supposedly) slow to deliver, and then delivering less usable software than was desired. So it was like being in the enterprise, squared. Over the time that I worked there, I got a great insight into the constraints on delivery that cause client frustration to happen, and – worse luck – I was responsible for helping to deliver change within it. This is quite a long post, so I’ve broken this up into several parts to make it easier to digest:

Thought Experiment
- What would happen if an enterprise acted like a startup?
Reducing Risk
- Some ways enterprises reduce risk
- The principle underlying these methods
Cumulative Constraints
- Consequences of the culture of risk reduction
A New Hope?
- What can be done?

1) Thought Experiment

Before we start, let’s imagine a counterfactual situation – imagine an enterprise acted like a startup. Showing how this doesn’t work (and therefore why it generally doesn’t happen) will help illustrate why some of the constraints that cause the slowdowns we see in large organisations exist. First, let’s look at what a small team might do to change some software. We’ll make it a really simple example, and one you might well do routinely at home – upgrading a Linux distribution. In both cases, the relationship is:

IT person
- Manager

Here’s how the conversation might go at a really small startup:

‘Lean’ OS Upgrade – Small Company

Shall we upgrade the OS?
- Yes, ok.
Oh, I’ve hit a problem. One of the falanges have stopped working.
- OK, do some work to fix the transpondster.
Might take me a few hours
- OK
… OK, done. Can you test?
- Yup, looks good.
Great.

‘Lean’ OS Upgrade – Enterprise

Shall we upgrade the OS?
- Yes.
OK, done.
- Um, you brought down the payments system.
Whoops. I’ll roll back
- OK.
Done. We’ll look into it.
- Hi. The regulator called. They saw something on the news about payments being down. They want to know what happened.
Um, OK. I’ll write something up.
- Thanks.
- …
- They read your write-up and have asked for evidence of who decided what when. They want a timeline.
I’ll check the emails.
- By the way, you’re going to be audited in a couple of months. We’ll have to cancel all projects until then?
But we’ve got so much technical debt!
- If we don’t get this right, they’ll shut us down and we’ll be fired.
- …
- OK, we have the results of the audit.
- Audit has uncovered 59 other problems you need to solve.
OK…
- We’ll have to drop other projects, and maybe lose some people.
Um, OK…
- Oh, and my boss is being hauled in front of the regulator to justify what happened. If it doesn’t go well he’s out of a job and his boss might go to prison if they think something fishy is going on.

Now that’s a bad release… That’s a worst-case scenario, but let’s unpick what these regulated enterprises do to mitigate both the risk and the consequences of the above scenario. Specifically:

‘Who owns this?’
‘How is this maintained?’
‘Who buys it?’
‘Who’s signed off the deployment?’

2) Reducing Risk

‘Who Owns This?’ / ‘One Throat to Choke’

This is a big one. One of the most commonly-asked questions when architecting a solution within an enterprise is: ‘Who is responsible for that component/service/system?‘ In our enterprise ‘Lean OS Upgrade’ scenario above one of the first questions that will be asked is: ‘Who owns the operating system?’ That group will be identifiable through some internal system which tracks ownership of tools and technologies. Those identified as responsible will be responsible for some or all of the lifecycle management for that technology. This might include:

Upgrade management
Support (directly or via a vendor)
Security patching
Deciding who can and can’t use it
Overall policy on usage (expand/deprecate/continue usage)

This ownership results in ‘one throat to choke’ for audit functions. Much like the police will go after the drug dealer rather than the casual user, the audit functions of an enterprise will go after the formally responsible person or team than the (potentially thousands of) teams using an outdated version of a particular technology. There’s richer pickings there. From ownership comes responsibility. A lot of the political footwork in an enterprise revolves around trying to not own technologies. Who wants to be responsible for Java usage across a technology function of dozens of thousands of staff, any of whom might be doing crazy stuff? You first, mate.

Enterprises and Vendors

This also explains enterprises’ love of vendor software over pure open source. If you’ve paid someone to maintain and support a technical stack, then they become responsible for that whole stack. That doesn’t solve all your problems (you still will need to integrate their software with your IT infrastructure, and things get fuzzier the closer you look at the resulting solution), but from a governance point of view you’ve successfully passed the buck.

What is governance? IT Governance is a term that covers all the processes and structures that ensures the IT is appropriately managed in a way that satisfies those that govern the organisation. Being ‘out of governance’ (ie not conforming to standards) is considered a dangerous place to be, because you may be forced to spend money to get back ‘in’ to governance.

‘How is this maintained?’

Another aspect of managing software in an enterprise context is its maintenance. In our idealised startup above ‘Dev’ and ‘Ops’ were the same thing (ie, one person). Lo and behold you have DevOps! Unfortunately, the DevOps slogan ‘you built it, you run it’ doesn’t usually work in an Enterprise context for a few reasons. Partly it’s historical ie ‘it’s the way things have been done’ for decades, so there is a strong institutional bias towards not changing this. Jobs and heavily-invested-in processes depend on its persistence. But further bolstering this conservatism is the regulatory framework that governs how software is managed.

Regulations

Regulations are rules created by regulators, who in turn are groups of people with power ultimately derived from government or other controlling authorities. So, effectively, they have the force of law as far as your business is concerned. Regulators are not inclined to embrace fashionable new software deployment methods, and their paradigms are rooted in the experiences of software built in previous decades. What does this mean? If your software is regulated, then it’s likely that your engineering (dev) and operations teams (ops) will be separate groups of people specialising in those roles, and one of the drivers of this is the regulations, which demand a separation to ensure that changes are under some kind of control and oversight. Now, there is (arguably) a loophole here that some have exploited: regulations often talk about ‘separation of roles’ between engineering and operations, and don’t explicitly say that these roles need to be fulfilled by different people. But if you’re a really big enterprise, that might be technically correct but effectively irrelevant. Why? Because, to ‘simplify’ things, these large enterprise often create a set of rules that cover all the regulations that may ever apply to their business across all jurisdictions. And those rules are generally the strictest you can imagine. Added to that, those rules develop a life and culture of their own within the organisation independent of the regulator such that they can’t easily be brought into question.

Resistance is futile. Dev and Ops must be separate because that’s what we wrote down years ago.

So you can end up in a situation where you are forced to work in a way prescribed years ago by your internal regulations, which are in turn based on interpretations of regulations which were written years before that! And if you want to change that, it will itself likely take years and agreement from multiple parties who are unlikely to want to risk losing their job so you can deliver your app slightly faster. Obviously, this separation slows things down as engineering must make the code more tolerant to mistakes and failure so that another team can pick it up and carry it through to production. Or you just throw it over the wall and hope for the best. Either way, parties become more resistant to change.

Change Control

That’s not the only way in which the speed of change is reduced in an enterprise. In order to ensure that changes to systems can be attributed to responsible individuals, there is usually some kind of system that tracks and audits changes. One person will raise a ‘change record’, which will usually involve filling out an enormous form, and then this change must be ‘signed off’ by one or more other person to ensure that changes don’t happen without due oversight. In theory, the person signing off must carefully examine the change to ensure it is sensible and valid. In reality, most of the time trust relationships build up between change raiser and change validator which can speed things up. If the change is large and significant, then it is more likely to be closely scrutinised. There might also exist ‘standard changes’ or ‘templated changes’, which codify more routine and lower-risk updates and are pre-authorised. These must also be signed off before being deployed (usually at a higher level of responsibility, making it harder to achieve). While in theory the change can be signed off in minutes, in reality change requests can take months as obscure fields in forms are filled out wrongly (‘you put the wrong code in field 44B! Start again.’), sign-off deadlines expire, change freezes come and go, and so on. All this makes the effort of making changes far more onerous than it is elsewhere.

Security ‘Sign-Off’

If you’re working on something significant, such as a new product, or major release of a large-scale product, then it may become necessary to get what most people informally call ‘security sign-off’. Processes around this vary from place to place, but essentially, one or more security experts descend at some point on your project and audit it. I had imagined such reviews to be a very scientific process, but in reality it’s more like a medieval trial by ordeal. You get poked and prodded in various ways while questions are asked to determine weaknesses in your story. This might involve a penetration test, a look at your code and documentation, or an interview with the engineers. There will likely be references to various ‘security standards’ you may or may not have read, which in turn are enforced with differing degrees of severity. The outcome of this is usually some kind of report and a set of risks that have been identified. These risks (depending on their severity – I’ve never heard of there being none) may need to be ‘signed off’ by someone senior so that responsibility lies with them if there is a breach. That process in itself is arduous (especially when the senior doesn’t fully understand the risk) and can be repeated on a regular basis until it is sufficiently ‘mitigated’ through further engineering effort or process controls. After which it’s then re-reviewed. None of this is quick.

Summary: Corporate, not Individual Responsibility

If there’s a common thread to these factors in reducing risk, it is to shift responsibility and power from the individual to the corporate entity. If you’re a regulated, systemically-significant enterprise, then the last thing you or the public wants is for one person to wield too much power, either through knowledge of a system, or ability to alter that system in their own interests. The corollary of this is that it is very hard for one person to make change by themselves. And, as we all know, if a task is given to multiple people to achieve together, then things get complicated and change slows up pretty fast as everyone must keep each other informed as to what everyone else is doing. Once this principle of corporate responsibility is understood, then many other processes start to make sense. An example of one of these is sourcing (aka procurement: the process of buying software or other IT services).

Example – Sourcing

Working for such an enterprise, and before I stopped answering, I would get phoned up by salespeople all the time who seemed to imagine that I had a chequebook ready to sign for any technology I happened to like. The reality could not have been further from the truth. What many people don’t expect is that to prevent a situation where one person could get too much power it can be the case that technical people have no direct control over the negotiation (or ‘sourcing process’) at all. What often happens is something close to this:

You go to senior person to get sign-off for a budget for purpose X
They agree
You document at least two options for products that fulfil that purpose
The ‘sourcing team’ take that document and negotiate with the suppliers
Some magic happens
You get told which supplier ‘won’

You can see why this process helps reduce the risk that someone takes a bribe to push a particular vendor solution (there’s also often strict rules around accepting so much as a coffee from a potential supplier), which is a good thing. On the other hand, this process can take months or even years. And might need to be repeated if the process takes so long that funding has disappeared or teams have been disbanded. To complicate matters further, sourcing might have its own ‘preferred supplier lists‘ of companies that have been vetted and audited in the past. If your preferred supplier isn’t on that list (and hasn’t made a deal with one on those list), the process could take even longer.

3) Cumulative Constraints

What we have learned so far is that enterprises are fundamentally slowed down by attempts to reduce the power and individual responsibility in favour of corporate responsibility. This usually results in:

More onerous change control
Higher bars for change planning
Higher bars for buying solutions
Higher bars for security requirements
Separation of engineering and ops functions

all of which slow down delivery. It’s like entropy. You can fight it, but in the end physics wins. Now we’ll take a step outside these individual constraints to look at what happens when you structure a large scale enterprise organisation where its component groups are all fighting these same challenges.

Dependency Constraints

When you try and deliver in an enterprise you will find that your team has dependencies on other teams to provide you IT services. The classic example of this is firewall changes. You, as a developer decide – in classic agile microservices/’all the shiny’ fashion – to create a new service running on a particular port on a set of hosts. You gulp Coke Zero all night and daub the code together to get a working prototype. To allow connectivity, you need to open up some ports on the firewall. You raise a change, and discover the process involves updating a spreadsheet by hand and then raising a change request which requires at least a week’s notice. Your one night’s development is now going to elapse a week before you can try it out. And that’s hoping you filled everything out correctly and didn’t miss anything. If you did, then you have to go round again… One of the joyous things about working in an unregulated startup is that if you see a problem in one of your dependencies you have the option of taking it over and running it yourself. Don’t like your cloud provider? Switch. Think your app might work better in erlang? Rewrite. Fed up with the firewall process? Write a script to do that, and move to gitops. So why not do the same in the enterprise? Why not just ‘find your dependencies and eliminate them‘? Some do indeed take this approach, and it costs them dearly. Either they have to spend great sums of money managing the processes required to maintain and stay ‘within governance’ for the technology they’ve decided to own, or they get hit with an audit sooner or later and get found out. At that point, they might go cap in hand to the infrastructure team, whose sympathy to their plight is in proportion to the amount of funding infrastructure is being offered to solve the problems for them… The reality is that – as I said above – taking responsibility and owning a technology or layer of your stack brings with it real costs and risks that you may not be able to bear and stay in business. So however great you are as a team, you’re delivery cadence is constrained to a local maxima based on your external dependencies, which are (effectively) non-negotiable. This is a scaling up of the same constraints on individuals in favour of corporate power and responsibility. Just as it is significantly harder for you to make that much difference, it is harder for your team to make much difference, for the same structural reasons.

If you like this, you might like one of my books: Learn Git the Hard Way, Learn Terraform the Hard Way, Learn Bash the Hard Way

Cultural Constraints

Now that the ingredients for slow delivery are already there in a static, structural sense, let’s look at what happens when you ‘bake’ that structure over decades into the organisation and then try to make change within it.

Calcified Paradigms

Since reasoning about technology on a corporate scale is hard, creating change within it can only work at all if there are collective paradigms around which processes and functions can reason. These paradigms become ingrained, and surfacing and reshaping these conceptual frameworks can be an effort that must repeated over and over across an organisation if you are to successfully make change. The two big examples of this I’ve been aware of are the ‘machine paradigm’ and charging models, but one might add ‘secrets are used manually’ or many others that may also be bubbling under my conscious awareness.

The ‘machine paradigm’

Since von Neumann outlined the architecture of the computer, the view of the fundamental unit of computation as being a single discrete physical entity has held sway. Yes, you can share workloads on a single machine (mainframes still exist, for example, and two applications might use the same physical device), but for the broad mass of applications, the idea of needing a separate physical machine to run on (for performance or security reasons) has underpinned assumptions of applications’ design, build, test, and deploy phases. Recently (mostly in the last 10 years), this paradigm has been modified by virtual machines, multiples of which sit on one larger machine that runs a hypervisor. Ironically, this has reinforced the ‘machine paradigm’, since for backward compatibility each VM has all the trappings of a physical machine, such as network interfaces, mac addresses, numbers of CPUs and so on. Whether you fill out a form and wait for a physical machine or a virtual machine to be provisioned makes little difference – you’re still in the machine paradigm. Recently, aPaaSes, Kubernetes, and cloud computing have overthrown the idea that an application need sit on a ‘machine’, but the penetration of this novel (or old, if you used mainframes) idea, like the future, is unevenly distributed.

Charging models

Another paradigm that’s very hard to get traction on changing is charging models. How money moves around within an enterprise is a huge subject in itself, and has all sorts of secondary effects that are of no small interest to IT. To grossly generalise, IT is moving from a ‘capex’ model to an ‘opex’ model. Instead of buying kit and software and then running it until it wears out (capex), the ‘new’ model is to rent software and services which can be easily scaled up and down as business demand requires. Now, if you think IT in an enterprise is conservative, then prepare to deal with those that manage and handle the money! For good reason, they are as a rule very disinclined to change payment models within an organisation, since any change in process will result in bugs (old and new) being surfaced, institutional upheaval, and who knows what else. The end result is that moving to these new models can be painful. Trying to cross-charge within an organisation of any size can result in surreal conversations about ‘wooden dollars’ (ie non-existent money exchanged in lieu of real money) or services being charged out to other parts of the business, but never paid for due to conversations that may or may not have been had outside your control.

Learned Helplessness

After decades of these habits of thoughts, you end up with several consequences:

Those who don’t like the way of working leave
Those that remain calcify into whole generations of employees
Those that remain tend to prize and prefer those that agree with their views

Suggestions of change to these groups of people result in entire generations, nay armies of employees that resist change. The irony is that they are completely right. Most efforts to change do fail, and therefore most efforts to do so are wasted. The reasons are arguably circular, ie that change is resisted because it won’t work, and it won’t work because it’s resisted. But it’s also quite rational, since the reasons it won’t work are based on the external constraints that exist we have discussed above. But it’s simple game theory to follow the logic. This has previously been described as the ‘square of despair‘: square_despair

Although I’d prefer to call it the ‘polygon of despair‘, since these four are fairly arbitrary. You could add to this list, for example:

Internal charging models
Change control
Institutional inertia
Audit
Regulation
Outdated paradigms

all of which have been discussed above.

The Decagon of Despair

4) A New Hope?

Is it all a lost cause? Is there really no hope for change? Does it always end up looking like this, at best a mass of compromises that feel like failure? manifesto

Well, no. But it is bloody hard. Here’s the things I think will stack the deck in your favour:

Senior Leadership Support

I think this is the big one. If you’re looking to swim against habits of thought, then stiff resolve is required. If senior management aren’t willing to sacrifice, aren’t united in favour of it, then all sorts of primary decisions and (equally important) second-guessed decisions made by underlings from different branches of the management tree that have conflicting aims. People don’t like to talk about it, but it helps if people get fired for not constructively working with the changes. That tends to focus the mind. The classic precedent of this is point 6 of Jeff Bezos’s ‘API Mandate’: mandate

You senior leadership will also need buckets of patience as the work to do this is very front-loaded, the pain being felt far earlier benefits being felt far later than the pain.

Reduce Complexity

Talking of pain, you will do yourself favours if you fight tooth and nail to reduce complexity. This may involve taking some risks as you call out that the entire effort may be ruined by compromises that defeat the purpose, or create bureaucratic or technical quicksand that your project will flounder in later. Calling those dangers may get you a reputation, or even cost you your job. As the title of A Seat at the Table (a book I highly recommend on the subject) implies, it’s very close to a poker game.

Cross-functional Team

It might sound obvious to those that work in smaller companies, but it’s much easier to achieve change if you have a team of people that span the functions of your organisation working together. The collaboration not only benefits from seeing how things need be designed to fulfil requirements at an earlier stage, but more creative solutions are found by people who understand their function’s needs better, and the requirements of the project. If you want to go the skunkworks route, then the representatives of the other functions can tell you where your MVP shortcuts are going to bite you later on. The alternative – and this is almost invariably much, much slower – is to ‘build, then check’. So you might spent several months building your solution before you find it’s fundamentally flawed based on some corporate rule or principle that can’t be questioned.

Use Your Cynical Old Hands

The flip side of those that constitute the ‘institutional inertia’ I described above is that many of those people know the organisation inside out. These people often lose heart regarding change not because they no longer care, but because they believe that when push comes to shove the changes won’t get support. These people can be your biggest asset. The key is to persuade them that it’s possible, and that you need their help. That can be hard for both sides, as your enthusiasm for change hits their brick wall, cemented by their hard-won (or lost) experience. They may give you messages that are hard to hear about how hard it will be. But don’t underestimate the loyalty and resilience you get if they are heard.

If you liked this post, you might also like:

Five Things I Did to Change a Team’s Culture

My 20-Year Experience of Software Development Methodologies

Things I Learned Managing Site Reliability for Some of the World’s Busiest Gambling Sites

A Checklist for Docker in the Enterprise (Updated)

Or one of my books: Learn Git the Hard Way Learn Terraform the Hard Way Learn Bash the Hard Way

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Anatomy of a Linux DNS Lookup – Part V – Two Debug Nightmares

13 September 2018 by imiell

In part V of this series of posts I take a slight detour to show you a couple of ‘debug nightmares’ that DNS threw at me.

Previous posts were:

Anatomy of a Linux DNS Lookup – Part I

Anatomy of a Linux DNS Lookup – Part II

Anatomy of a Linux DNS Lookup – Part III

Anatomy of a Linux DNS Lookup – Part IV

Both bugs threw up surprises not seen in previous posts…

Landrush and VMs

I’m a heavy Vagrant user, which I use mostly for testing (mostly) Kubernetes clusters of various kinds.

It’s key for these setups to work that DNS lookup between the various VMs works smoothly. By default, each VM is addressable only by IP address.

To solve this, I use a vagrant plugin called landrush. This works by creating a tiny ruby DNS server on the host that runs the VMs. This DNS server runs on port 10053, and keeps and returns records for the VMs that are running. So, for example, you might have two VMs running (vm1 and vm2), and landrush will ensure that your DNS lookups for these hosts (eg vm1.vagrant.test and vm2.vagrant.test) will point to the right local IP address for that VM.

It does this by creating IPTables rules on the VM host and the VMs themselves. These IPTables rules divert DNS requests to the DNS server running on port 10053, and if there’s no match, it will re-route the request to the original DNS server specified in that context.

Here’s a diagram that might help visualise this:

landrush

vagrant-landrush DNS server

Above is a diagram that represents how Landrush DNS works with Vagrant. The box represents a host that’s running two Vagrant VMs (vm1 and vm2). These have the ipaddresses 1.2.3.4 and 1.2.3.5 respectively.

A DNS request on either vm is redirected from the host’s resolver (in this case systemd-resolved) to the host’s Landrush DNS server. This is achieved using an IPTables rule on the VM.

The Landrush DNS server keeps a small database of the host mappings to IPs given out by Vagrant and responds to any requests for vm1.vagrant.test or vm2.vagrant.test with the appropriate IP local address. If the request is for another address the request is forwarded on to the host’s configured DNS server (in this case DNSMasq).

Host lookups use the same IPTables mechanism to send DNS requests to the Landrush DNS server.

The Problem

Usually I use Ubuntu 16.04 machines for this, but when I tried 18.04 machines networking was failing on them:

$ curl google.com
curl: (6) Could not resolve host: google.com

At first I assumed the VM images themselves were faulty, but taking Landrush out of the equation restored networking fully.

Trying another tack, deleting the IPTables rule on the VM meant that networking worked also. So, mysteriously, the IPTables rule was not working. I tried stracing the two curl calls (working and not-working) to see what the difference was. There was a difference, but I had no idea why it might be happening.

As a next step I tried to take systemd-resolved and Landrush out of the equation (since that was new between 16.04 and 18.04). I did this by using different IPTables rules:

Direct requests to google’s 8.8.8.8 DNS server rather than the Landrush failure (FAILED)
- Showed that Landrush wasn’t the problem
Direct /etc/resolv.conf to a different address (changed 127.0.0.53 to 9.8.7.6), and wire IPTables to Google’s DNS server (WORKED)
Direct /etc/resolv.conf to a different address (changed 127.0.0.53 to 127.0.0.54), and wire IPTables to Google’s DNS server (FAILED)
- Showed systemd-resolved not necessarily the problem

The fact that using 9.8.7.6 instead of 127.0.0.53 as a DNS server IP address led me to think that the fact that /etc/resolv.conf was pointed to a localhost address (ie one in the 127.0.0.* range) might be the problem.

A quick google led me here, which suggested that the problem was a sysctl setting:

sysctl -w net.ipv4.conf.all.route_localnet=1

And all was fixed.

The gory detail of the debugging is here.

Takeaway

sysctl settings are yet another thing that can affect and break DNS lookup!

These and more such settings are listed here.

DNSMasq, UDP=>TCP and Large DNS Responses

The second bug threw up another surprise.

We had an issue in production where DNS lookups were taking a very long time within an OpenShift Kubernetes cluster.

Strangely, it only affected some lookups and not others. Also, the time taken to do the lookup was consistent. This suggested that there was some kind of timeout on the first DNS server requested, after which it fell back to a ‘working’ one.

We did manual requests using dig to the local DNSMasq server on one of the hosts that was ‘failing’. The DNS request returned instantly, so we were scratching our heads. Then a colleague pointed out that the DNS response was rather longer than normal, which rang a bell.

Soon enough, he came back with this rfc, (RFC5966), which states:

   In the absence of EDNS0 (Extension Mechanisms for DNS 0) (see below),
   the normal behaviour of any DNS server needing to send a UDP response
   that would exceed the 512-byte limit is for the server to truncate
   the response so that it fits within that limit and then set the TC
   flag in the response header.  When the client receives such a
   response, it takes the TC flag as an indication that it should retry
   over TCP instead.

which, to summarise, means that if the DNS response is over 512 bytes, then the DNS server will send back a truncated response, and should make another request over TCP rather than UDP.

We never fixed the root cause here, but suspected that DNSMasq was not correctly returning the TCP to the client requesting. We found a setting that specified which interface DNSMasq would run against. By limiting this to one interface, requests worked again.

From this, we reasoned there was a bug in DNSMasq where if it was listening on more than one interface, and the upstream DNS request resulted in a response bigger than 512, then the response never reaches the original requester.

Takeaway

Another DNS surprise – DNS can stop working if the DNS response is over 512 bytes and the DNS client request program doesn’t handle this correctly.

Summary

DNS in Linux has even more surprises in store and things to check when things don’t go your way.

Here we saw how sysctl settings and plain old-fashioned bugs in seemingly battle-hardened code can affect your setup.

And we haven’t covered caching yet…

If you like this, you might like one of my books:

Learn Git the Hard Way, Learn Terraform the Hard Way, Learn Bash the Hard Way

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Get 39% off Docker in Practice with the code: 39miell2

Anatomy of a Linux DNS Lookup – Part IV

6 August 2018 by imiell

In Anatomy of a Linux DNS Lookup – Part I, Part II, and Part III I covered:

nsswitch
/etc/hosts
/etc/resolv.conf
ping vs host style lookups
systemd and its networking service
ifup and ifdown
dhclient
resolvconf
NetworkManager
dnsmasq

In Part IV I’ll cover how containers do DNS. Yes, that’s not simple either…

Other posts in the series:

Anatomy of a Linux DNS Lookup – Part I

Anatomy of a Linux DNS Lookup – Part II

Anatomy of a Linux DNS Lookup – Part III

Anatomy of a Linux DNS Lookup – Part V – Two Debug Nightmares

1) Docker and DNS

In part III we looked at DNSMasq, and learned that it works by directing DNS queries to the localhost address 127.0.0.1, and a process listening on port 53 there will accept the request.

So when you run up a Docker container, on a host set up like this, what do you expect to see in its /etc/resolv.conf?

Have a think, and try and guess what it will be.

Here’s the default output if you run a default Docker setup:

$  docker run  ubuntu cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

search home
nameserver 8.8.8.8
nameserver 8.8.4.4

Hmmm.

Where did the addresses `8.8.8.8` and `8.8.4.4` come from?

When I pondered this question, my first thought was that the container would inherit the /etc/resolv.conf settings from the host. But a little thought shows that that won’t always work.

If you have DNSmasq set up on the host, the /etc/resolv.conf file will be pointed at the 127.0.0.1 loopback address. If this were passed through to the container, the container would look up DNS addresses from within its own networking context, and there’s no DNS server available within the container context, so the DNS lookups would fail.

‘A-ha!’ you might think: we can always use the host’s DNS server by using the host’s IP address, available from within the container as the default route:

root@79a95170e679:/# ip route                                                               
default via 172.17.0.1 dev eth0                                               
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.2

Use the host?

From that we can work out that the ‘host’ is on the ip address: 172.17.0.1, so we could try manually pointing DNS at that using dig (you could also update the /etc/resolv.conf and then run ping, this just seems like a good time to introduce dig and its @ flag, which points the request at the ip address you specify):

root@79a95170e679:/# dig @172.17.0.1 google.com | grep -A1 ANSWER.SECTION
;; ANSWER SECTION:
google.com.             112     IN      A       172.217.23.14

However: that might work if you use DNSMasq, but if you don’t it won’t, as there’s no DNS server on the host to look up.

So Docker’s solution to this quandary is to bypass all that complexity and point your DNS lookups to Google’s DNS servers at 8.8.8.8 and 8.8.4.4, ignoring whatever the host context is.

Anecdote: This was the source of my first problem with Docker back in 2013. Our corporate network blocked access to those IP addresses, so my containers couldn’t resolve URLs.

So that’s Docker containers, but container orchestrators such as Kubernetes can do different things again…

2) Kubernetes and DNS

The unit of container deployment in Kubernetes is a Pod. A pod is a set of co-located containers that (among other things) share the same IP address.

An extra challenge with Kubernetes is to forward requests for Kubernetes services to the right resolver (eg myservice.kubernetes.io) to the private network allocated to those service addresses. These addresses are said to be on the ‘cluster domain’. This cluster domain is configurable by the administrator, so it might be cluster.local or myorg.badger depending on the configuration you set up.

In Kubernetes you have four options for configuring how DNS lookup works within your pod.

Default

This (misleadingly-named) option takes the same DNS resolution path as the host the pod runs on, as in the ‘naive’ DNS lookup described earlier. It’s misleadingly named because it’s not the default! ClusterFirst is.

If you want to override the /etc/resolv.conf entries, you can in your config for the kubelet.

ClusterFirst

ClusterFirst does selective forwarding on the DNS request. This is achieved in one of two ways based on the configuration.

In the first, older and simpler setup, a rule was followed where if the cluster domain was not found in the request, then it was forwarded to the host.

In the second, newer approach, you can configure selective forwarding on an internal DNS

Here’s what the config looks like and a diagram lifted from the Kubernetes docs which shows the flow:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"acme.local": ["1.2.3.4"]}
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]

The stubDomains entry defines specific DNS servers to use for specific domains. The upstream servers are the servers we defer to when nothing else has picked up the DNS request.

This is achieved with our old friend DNSMasq running in a pod.

kubedns

The other two options are more niche:

ClusterFirstWithHostNet

This applies if you use host network for your pods, ie you bypass the Docker networking setup to use the same network as you would directly on the host the pod is running on.

None

None does nothing to DNS but forces you to specify the DNS settings in the dnsConfig field in the pod specification.

CoreDNS Coming

And if that wasn’t enough, this is set to change again as CoreDNS comes to Kubernetes, replacing kube-dns. CoreDNS will offer a few benefits over kube-dns, being more configurabe and more efficient.

Find out more here.

If you’re interested in OpenShift networking, I wrote a post on that here. But that was for 3.6 so is likely out of date now.

End of Part IV

That’s part IV done. In it we covered.

Docker DNS lookups
Kubernetes DNS lookups
Selective forwarding (stub domains)
kube-dns

If you like this, you might like one of my books:

Learn Git the Hard Way, Learn Terraform the Hard Way, Learn Bash the Hard Way

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Get 39% off Docker in Practice with the code: 39miell2

Anatomy of a Linux DNS Lookup – Part III

6 July 2018 by imiell

In Anatomy of a Linux DNS Lookup – Part I I covered:

nsswitch
/etc/hosts
/etc/resolv.conf
ping vs host style lookups

and in Anatomy of a Linux DNS Lookup – Part II I covered:

systemd and its networking service
ifup and ifdown
dhclient
resolvconf

and ended up here:

linux-dns-2 (2)

A (roughly) accurate map of what’s going on

Unfortunately, that’s not the end of the story. There’s still more things that can get involved. In Part III, I’m going to cover NetworkManager and dnsmasq and briefly show how they play a part.

Other posts in the series:

Anatomy of a Linux DNS Lookup – Part I

Anatomy of a Linux DNS Lookup – Part II

Anatomy of a Linux DNS Lookup – Part IV

Anatomy of a Linux DNS Lookup – Part V – Two Debug Nightmares

1) NetworkManager

As mentioned in Part II, we are now well away from POSIX standards and into Linux distribution-specific areas of DNS resolution management.

In my preferred distribution (Ubuntu), there is a service that’s available and often installed for me as a dependency of some other package I install called NetworkManager. It’s actually a service developed by RedHat in 2004 to help manage network interfaces for you.

What does this have to do with DNS? Install it to find out:

$ apt-get install -y network-manager

In my distribution, I get a config file.

$ cat /etc/NetworkManager/NetworkManager.conf
[main]
plugins=ifupdown,keyfile,ofono
dns=dnsmasq

[ifupdown]
managed=false

See that dns=dnsmasq there? That means that NetworkManager will use dnsmasq to manage DNS on the host.

2) dnsmasq

The dnsmasq program is that now-familiar thing: yet another level of indirection for /etc/resolv.conf.

Technically, dnsmasq can do a few things, but is primarily it acts as a DNS server that can cache requests to other DNS servers. It runs on port 53 (the standard DNS port), on all local network interfaces.

So where is dnsmasq running? NetworkManager is running:

$ ps -ef | grep NetworkManager
root     15048     1  0 16:39 ?        00:00:00 /usr/sbin/NetworkManager --no-daemon

But no dnsmasq process exists:

$ ps -ef | grep dnsmasq
$

Although it’s configured to be used, confusingly it’s not actually installed! So you’re going to install it.

Before you install it though, let’s check the state of /etc/resolv.conf.

$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.0.2.2
search home

It’s not been changed by NetworkManager.

If dnsmasq is installed:

$ apt-get install -y dnsmasq

Then dnsmasq is up and running:

$ ps -ef | grep dnsmasq
dnsmasq  15286     1  0 16:54 ?        00:00:00 /usr/sbin/dnsmasq -x /var/run/dnsmasq/dnsmasq.pid -u dnsmasq -r /var/run/dnsmasq/resolv.conf -7 /etc/dnsmasq.d,.dpkg-dist,.dpkg-old,.dpkg-new --local-service --trust-anchor=.,19036,8,2,49AAC11D7B6F6446702E54A1607371607A1A41855200FD2CE1CDDE32F24E8FB5

And /etc/resolv.conf has changed again!

root@linuxdns1:~# cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.0.1
search home

And netstat shows dnsmasq is serving on all interfaces at port 53:

$ netstat -nlp4
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address     Foreign Address State   PID/Program name
tcp        0      0 127.0.0.1:53      0.0.0.0:*       LISTEN  15286/dnsmasq 
tcp        0      0 10.0.2.15:53      0.0.0.0:*       LISTEN  15286/dnsmasq
tcp        0      0 172.28.128.11:53  0.0.0.0:*       LISTEN  15286/dnsmasq
tcp        0      0 0.0.0.0:22        0.0.0.0:*       LISTEN  1237/sshd
udp        0      0 127.0.0.1:53      0.0.0.0:*               15286/dnsmasq
udp        0      0 10.0.2.15:53      0.0.0.0:*               15286/dnsmasq  
udp        0      0 172.28.128.11:53  0.0.0.0:*               15286/dnsmasq  
udp        0      0 0.0.0.0:68        0.0.0.0:*               10758/dhclient
udp        0      0 0.0.0.0:68        0.0.0.0:*               10530/dhclient
udp        0      0 0.0.0.0:68        0.0.0.0:*               10185/dhclient

3) Unpicking dnsmasq

Now we are in a situation where all DNS queries are going to 127.0.0.1:53 and from there what happens?

We can get a clue from looking again at the /var/run folder. The resolv.conf in resolvconf has been changed to point to where dnsmasq is being served:

$ cat /var/run/resolvconf/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.0.1
search home

while there’s a new dnsmasq folder with its own resolv.conf.

$ cat /run/dnsmasq/resolv.conf 
nameserver 10.0.2.2

which has the nameserver given to us by DHCP.

We can reason about this without looking too deeply, but what if we really want to know what’s going on?

4) Debugging Dnsmasq

Frequently I’ve found myself wondering what dnsmasq’s state is. Fortunately, you can get a good amount of information out of it if you set change this line in /etc/dnsmasq.conf:

#log-queries

to:

log-queries

and restart dnsmasq

Now, if you do a simple:

$ ping -c1 bbc.co.uk

you will see something like this in /var/log/syslog (the [...] indicates that the line’s start is the same as the previous one):

Jul  3 19:56:07 ubuntu-xenial dnsmasq[15372]: query[A] bbc.co.uk from 127.0.0.1
[...] forwarded bbc.co.uk to 10.0.2.2
[...] reply bbc.co.uk is 151.101.192.81
[...] reply bbc.co.uk is 151.101.0.81
[...] reply bbc.co.uk is 151.101.64.81
[...] reply bbc.co.uk is 151.101.128.81
[...] query[PTR] 81.192.101.151.in-addr.arpa from 127.0.0.1
[...] forwarded 81.192.101.151.in-addr.arpa to 10.0.2.2
[...] reply 151.101.192.81 is NXDOMAIN

which shows what dnsmasq received, where the query was forwarded to, and what reply was received.

If the query is returned from the cache (or, more exactly, the local ‘time-to-live’ for the query has not expired), then it looks like this in the logs:

[...] query[A] bbc.co.uk from 127.0.0.1
[...] cached bbc.co.uk is 151.101.64.81
[...] cached bbc.co.uk is 151.101.128.81
[...] cached bbc.co.uk is 151.101.192.81
[...] cached bbc.co.uk is 151.101.0.81
[...] query[PTR] 81.64.101.151.in-addr.arpa from 127.0.0.1

and if you ever want to know what’s in your cache, you can provoke dnsmasq into sending it to the same log file by sending the USR1 signal to the dnsmasq process id:

$ kill -SIGUSR1 <(cat /run/dnsmasq/dnsmasq.pid)

and the output of the dump looks like this:

Jul  3 15:08:08 ubuntu-xenial dnsmasq[15697]: time 1530630488                                                                                                                                 
[...] cache size 150, 0/5 cache insertions re-used unexpired cache entries.                                                                           
[...] queries forwarded 2, queries answered locally 0                                                                                                 
[...] queries for authoritative zones 0                                                                                                               
[...] server 10.0.2.2#53: queries sent 2, retried or failed 0                                                                                         
[...] Host             Address         Flags      Expires                             
[...] linuxdns1        172.28.128.8    4FRI   H                                                                
[...] ip6-localhost    ::1             6FRI   H                                                                
[...] ip6-allhosts     ff02::3         6FRI   H                                                                
[...] ip6-localnet     fe00::          6FRI   H                                                                
[...] ip6-mcastprefix  ff00::          6FRI   H                                                                
[...] ip6-loopback     :               6F I   H                                                                
[...] ip6-allnodes     ff02:           6FRI   H                                                                
[...] bbc.co.uk        151.101.64.81   4F         Tue Jul  3 15:11:41 2018                                     
[...] bbc.co.uk        151.101.192.81  4F         Tue Jul  3 15:11:41 2018                                     
[...] bbc.co.uk        151.101.0.81    4F         Tue Jul  3 15:11:41 2018                                     
[...] bbc.co.uk        151.101.128.81  4F         Tue Jul  3 15:11:41 2018                                     
[...]                  151.101.64.81   4 R  NX    Tue Jul  3 15:34:17 2018                                     
[...] localhost        127.0.0.1       4FRI   H                                                                
[...] <Root>           19036   8   2   SF I                                                                    
[...] ip6-allrouters   ff02::2         6FRI   H

In the above output, I believe (but don’t know, and ‘?’ indicates a relatively wild guess on my part) that:

‘4’ means IPv4
‘6’ means IPv6
‘H’ means address was read from an /etc/hosts file
‘I’ ? ‘Immortal’ DNS value? (ie no time-to-live value?)
‘F’ ?
‘R’ ?
‘S’?
‘N’?
‘X’

Alternatives to dnsmasq

dnsmasq is not the only option that can be passed to dns in NetworkManager. There’s none which does nothing to /etc/resolv,conf, default, which claims to ‘update resolv.conf to reflect currently active connections’, and unbound, which communicates with the unbound service and dnssec-triggerd, which is concerned with DNS security and is not covered here.

End of Part III

That’s the end of Part III, where we covered the NetworkManager service, and its dns=dnsmasq setting.

Let’s briefly list some of the things we’ve come across so far:

nsswitch
/etc/hosts
/etc/resolv.conf
/run/resolvconf/resolv.conf
systemd and its networking service
ifup and ifdown
dhclient
resolvconf
NetworkManager
dnsmasq

If you like this, you might like one of my books:

Learn Git the Hard Way

Learn Terraform the Hard Way

Learn Bash the Hard Way

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Get 39% off Docker in Practice with the code: 39miell2

Anatomy of a Linux DNS Lookup – Part II

18 June 2018 by imiell

In Anatomy of a Linux DNS Lookup – Part I I covered:

nsswitch
/etc/hosts
/etc/resolv.conf
ping vs host style lookups

and determined that most programs reference /etc/resolv.conf along the way to figuring out which DNS server to look up.

That stuff was more general linux behaviour (*) but here we move firmly into distribution-specific territory. I use ubuntu, but a lot of this will overlap with Debian and even CentOS-based distributions, and also differ from earlier or later Ubuntu versions.

(*) in fact, it’s subject to a POSIX standard, so
is not limited to Linux (I learned this from
a fantastic comment on the previous post)

In other words: your host is more likely to differ in its behaviour in specifics from here.

In Part II I’ll cover how resolv.conf can get updated, what happens when systemctl restart networking is run, and how dhclient gets involved.

Other posts in the series:

Anatomy of a Linux DNS Lookup – Part I

Anatomy of a Linux DNS Lookup – Part III

Anatomy of a Linux DNS Lookup – Part IV

Anatomy of a Linux DNS Lookup – Part V – Two Debug Nightmares

1) Updating /etc/resolv.conf by hand

We know that /etc/resolv.conf is (highly likely to be) referenced, so surely you can just add a nameserver to that file, and then your host will use that nameserver in addition to the others, right?

If you try that:

$ echo nameserver 10.10.10.10 >> /etc/resolv.conf

it all looks good:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.0.2.3
search home
nameserver 10.10.10.10

until the network is restarted:

$ systemctl restart networking
$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.0.2.3
search home

our 10.10.10.10 nameserver has gone!

This is where those comments we ignored in Part I come in…

2) resolvconf

You see the phrase generated by resolvconf in the /etc/resolv.conf file above? This is our clue.

If you dig into what systemctl restart networking does, among many other things, it ends up calling a script: /etc/network/if-up.d/000resolvconf. Within this script is a call to resolvconf:

/sbin/resolvconf -a "${IFACE}.${ADDRFAM}"

A little digging through the man pages reveals that the -a flag allows us to:

Add or overwrite the record IFACE.PROG then run the update scripts
if updating is enabled.

So maybe we can call this directly to add a nameserver:

echo 'nameserver 10.10.10.10' | /sbin/resolvconf -a enp0s8.inet

Turns out we can!

$ cat /etc/resolv.conf  | grep nameserver
nameserver 10.0.2.3
nameserver 10.10.10.10

So we’re done now, right? This is how /etc/resolv.conf gets updated? Calling resolvconf adds it to a database somewhere, and then updates (if configured, whatever that means) the resolv.conf file

No.

$ systemctl restart networking
root@linuxdns1:/etc# cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.0.2.3
search home

Argh! It’s gone again.

So systemctl restart networking does more than just run resolvconf. It must be getting the nameserver information from somewhere else. Where?

3) ifup/ifdown

Digging further into what systemctl restart networking does tells us a couple of things:

cat /lib/systemd/system/networking.service
[...]
[Service]
Type=oneshot
EnvironmentFile=-/etc/default/networking
ExecStartPre=-/bin/sh -c '[ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle'
ExecStart=/sbin/ifup -a --read-environment
ExecStop=/sbin/ifdown -a --read-environment --exclude=lo
[...]

First, the networking ‘service’ restart is actually a ‘oneshot’ script that runs these commands:

/sbin/ifdown -a --read-environment --exclude=lo
/bin/sh -c '[ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle'
/sbin/ifup -a --read-environment

The first line with ifdown brings down all the network interfaces (but excludes the local interface). (*)

(*) I’m unclear why this doesn’t boot me out of my
vagrant session in my example code (anyone know?).

The second line makes sure the system has done all it needs to do regarding the bringing of network interfaces down before going ahead and bringing them all back up with ifup in the third line. So the second thing we learn is that ifup and ifdown are what the networking service ‘actually’ runs.

The --read-environment flag is undocumented, and is there so that systemctl can play nice with it. A lot of people hate systemctl for this kind of thing.

Great. So what does ifup (and its twin, ifdown) do? To cut another long story short, it runs all the scripts in etc/network/if-pre-up.d/ and /etc/network/if-up.d/. These in turn might run other scripts, and so on.

One of the things it does (and I’m still not quite sure how – maybe udev is involved?) dhclient gets run.

4) dhclient

dhclient is a program that interacts with DHCP servers to negotiate the details of what IP address the network interface specified should use. It also can receive a DNS nameserver to use, which then gets placed in the /etc/resolv.conf.

Let’s cut to the chase and simulate what it does, but just on the enp0s3 interface on my example VM, having first removed the nameserver from the /etc/resolv.conf file:

$ sed -i '/nameserver.*/d' /run/resolvconf/resolv.conf
$ cat /etc/resolv.conf | grep nameserver
$ dhclient -r enp0s3 && dhclient -v enp0s3
Killed old client process
Internet Systems Consortium DHCP Client 4.3.3
Copyright 2004-2015 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Listening on LPF/enp0s8/08:00:27:1c:85:19
Sending on   LPF/enp0s8/08:00:27:1c:85:19
Sending on   Socket/fallback
DHCPDISCOVER on enp0s8 to 255.255.255.255 port 67 interval 3 (xid=0xf2f2513e)
DHCPREQUEST of 172.28.128.3 on enp0s8 to 255.255.255.255 port 67 (xid=0x3e51f2f2)
DHCPOFFER of 172.28.128.3 from 172.28.128.2
DHCPACK of 172.28.128.3 from 172.28.128.2
bound to 172.28.128.3 -- renewal in 519 seconds.

$ cat /etc/resolv.conf | grep nameserver
nameserver 10.0.2.3

So that’s where the nameserver comes from…

But hang on a sec – what’s that /run/resolvconf/resolv.conf doing there, when it should be /etc/resolv.conf?

Well, it turns out that /etc/resolv.conf isn’t always ‘just’ a file.

On my VM, it’s a symlink to the ‘real’ file stored in /run/resolvconf. This is a clue that the file is constructed at run time, and one of the reasons we’re told not to edit the file directly.

If the sed command above were to be run on the /etc/resolv.conf file directly then the behaviour above would be different and a warning thrown about /etc/resolv.conf not being a symlink (sed -i doesn’t handle symlinks cleverly – it just creates a fresh file).

dhclient offers the capability to override the DNS server given to you by DHCP if you dig a bit deeper into the supersede setting in /etc/dhcp/dhclient.conf…

linux-dns-2 (2)

A (roughly) accurate map of what’s going on

End of Part II

That’s the end of Part II. Believe it or not that was a somewhat simplified version of what goes on, but I tried to keep it to the important and ‘useful to know’ stuff so you wouldn’t fall asleep. Most of that detail is around the twists and turns of the scripts that actually get run.

And we’re still not done yet. Part III will look at even more layers on top of these.

Let’s briefly list some of the things we’ve come across so far:

nsswitch
/etc/hosts
/etc/resolv.conf
/run/resolvconf/resolv.conf
systemd and its networking service
ifup and ifdown
dhclient
resolvconf

If you like this, you might like one of my books:

Learn Git the Hard Way

Learn Terraform the Hard Way

Learn Bash the Hard Way

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Get 39% off Docker in Practice with the code: 39miell2

Anatomy of a Linux DNS Lookup – Part I

8 June 2018 by imiell

Since I work a lot with clustered VMs, I’ve ended up spending a lot of time trying to figure out how DNS lookups work. I applied ‘fixes’ to my problems from StackOverflow without really understanding why they work (or don’t work) for some time.

Eventually I got fed up with this and decided to figure out how it all hangs together. I couldn’t find a complete guide for this anywhere online, and talking to colleagues they didn’t know of any (or really what happens in detail)

So I’m writing the guide myself.

Turns out there’s quite a bit in the phrase ‘Linux does a DNS lookup’…

Other posts in the series:

Anatomy of a Linux DNS Lookup – Part II

Anatomy of a Linux DNS Lookup – Part III

Anatomy of a Linux DNS Lookup – Part IV

Anatomy of a Linux DNS Lookup – Part V – Two Debug Nightmares

linux-dns-0

“How hard can it be?”

These posts are intended to break down how a program decides how it gets an IP address on a Linux host, and the components that can get involved. Without understanding how these pieces fit together, debugging and fixing problems with (for example) dnsmasq, vagrant landrush, or resolvconf can be utterly bewildering.

It’s also a valuable illustration of how something so simple can get so very complex over time. I’ve looked at over a dozen different technologies and their archaeologies so far while trying to grok what’s going on.

I even wrote some automation code to allow me to experiment in a VM. Contributions/corrections are welcome.

Note that this is not a post on ‘how DNS works’. This is about everything up to the call to the actual DNS server that’s configured on a linux host (assuming it even calls a DNS server – as you’ll see, it need not), and how it might find out which one to go to, or how it gets the IP some other way.

1) There is no such thing as a ‘DNS Lookup’ call

linux-dns-1

This is NOT how it works

The first thing to grasp is that there is no single method of getting a DNS lookup done on Linux. It’s not a core system call with a clean interface.

There is, however, a standard C library call called which many programs use: getaddrinfo. But not all applications use this!

Let’s just take two simple standard programs: ping and host:

root@linuxdns1:~# ping -c1 bbc.co.uk | head -1
PING bbc.co.uk (151.101.192.81) 56(84) bytes of data.

root@linuxdns1:~# host bbc.co.uk | head -1
bbc.co.uk has address 151.101.192.81

They both get the same result, so they must be doing the same thing, right?

Wrong.

Here’s the files that ping looks at on my host that are relevant to DNS:

root@linuxdns1:~# strace -e trace=open -f ping -c1 google.com
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib/x86_64-linux-gnu/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 4
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 4
open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
open("/lib/x86_64-linux-gnu/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 4
open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = 4
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 4
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
open("/lib/x86_64-linux-gnu/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = 4
open("/lib/x86_64-linux-gnu/libresolv.so.2", O_RDONLY|O_CLOEXEC) = 4
PING google.com (216.58.204.46) 56(84) bytes of data.
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 4
64 bytes from lhr25s12-in-f14.1e100.net (216.58.204.46): icmp_seq=1 ttl=63 time=13.0 ms
[...]

and the same for host:

$ strace -e trace=open -f host google.com
[...]
[pid  9869] open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libdst.cat", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  9869] open("/usr/share/locale/en/libdst.cat", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  9869] open("/usr/share/locale/en/LC_MESSAGES/libdst.cat", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  9869] open("/usr/lib/ssl/openssl.cnf", O_RDONLY) = 6
[pid  9869] open("/usr/lib/x86_64-linux-gnu/openssl-1.0.0/engines/libgost.so", O_RDONLY|O_CLOEXEC) = 6[pid  9869] open("/etc/resolv.conf", O_RDONLY) = 6
google.com has address 216.58.204.46
[...]

You can see that while my ping looks at nsswitch.conf, host does not. And they both look at /etc/resolv.conf.

We’re going to take these two .conf files in turn.

2) NSSwitch, and `/etc/nsswitch.conf`

We’ve established that applications can do what they like when they decide which DNS server to go to. Many apps (like ping) above can refer (depending on the implementation (*)) to NSSwitch via its config file /etc/nsswitch.conf.

() There’s a surprising degree of variation in
ping implementations. That’s a rabbit-hole I
didn’t* want to get lost in.

NSSwitch is not just for DNS lookups. It’s also used for passwords and user lookup information (for example).

NSSwitch was originally created as part of the Solaris OS to allow applications to not have to hard-code which file or service they look these things up on, but defer them to this other configurable centralised place they didn’t have to worry about.

Here’s my nsswitch.conf:

passwd:         compat
group:          compat
shadow:         compat
gshadow:        files
hosts: files dns myhostname
networks:       files
protocols:      db files
services:       db files
ethers:         db files
rpc:            db files
netgroup:       nis

The ‘hosts’ line is the one we’re interested in. We’ve shown that ping cares about nsswitch.conf so let’s fiddle with it and see how we can mess with ping.

Set nsswitch.conf to only look at ‘files’

If you set the hosts line in nsswitch.conf to be ‘just’ files:

hosts: files

Then a ping to google.com will now fail:

$ ping -c1 google.com
ping: unknown host google.com

but localhost still works:

$ ping -c1 localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.039 ms

and using host still works fine:

$ host google.com
google.com has address 216.58.206.110

since, as we saw, it doesn’t care about nsswitch.conf

Set nsswitch.conf to only look at ‘dns’

If you set the hosts line in nsswitch.conf to be ‘just’ dns:

hosts: dns

Then a ping to google.com will now succeed again:

$ ping -c1 google.com
PING google.com (216.58.198.174) 56(84) bytes of data.
64 bytes from lhr25s10-in-f174.1e100.net (216.58.198.174): icmp_seq=1 ttl=63 time=8.01 ms

But localhost is not found this time:

$ ping -c1 localhost
ping: unknown host localhost

Here’s a diagram of what’s going on with NSSwitch by default wrt hosts lookup:

linux-dns-2 (1)

My default ‘hosts:‘ configuration in nsswitch.conf

3) `/etc/resolv.conf`

We’ve seen now that host and ping both look at this /etc/resolv.conf file.

Here’s what my /etc/resolv.conf looks like:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.0.2.3

Ignore the first two lines – we’ll come back to those (they are significant, but you’re not ready for that ball of wool yet).

The nameserver lines specify the DNS servers to look up the host for.

If you hash out that line:

#nameserver 10.0.2.3

and run:

$ ping -c1 google.com
ping: unknown host google.com

it fails, because there’s no nameserver to go to (*).

* Another rabbit hole: `host` appears to fall back to
127.0.0.1:53 if there’s no nameserver specified.

This file takes other options too. For example, if you add this line to the resolv.conf file:

search com

and then ping google (sic)

$ ping google
PING google.com (216.58.204.14) 56(84) bytes of data.

it will try the .com domain automatically for you.

End of Part I

That’s the end of Part I. The next part will start by looking at how that resolv.conf gets created and updated.

Here’s what you covered above:

There’s no ‘DNS lookup’ call in the OS
Different programs figure out the IP of an address in different ways
- For example, ping uses nsswitch, which in turn uses (or can use) /etc/hosts, /etc/resolv.conf and its own hostname to get the result
/etc/resolv.conf helps decide:
- which addresses get called
- which DNS server to look up

If you thought that was complicated, buckle up…

If you like this, you might like one of my books:

Learn Git the Hard Way

Learn Terraform the Hard Way

Learn Bash the Hard Way

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Get 39% off Docker in Practice with the code: 39miell2

A Docker Image in Less Than 1000 Bytes

22 May 2018 by imiell

Here it is (base64-encoded):

H4sICIa2A1sCA2IA7Vrrbts2FFYL7M9+7QUGGNyfDYhtkuJFFLAhWZOhBYJmaLMOWBAEFC+xVlkyJLpYEBjdY+0l+k6jfGvqtkEWp2qD8TMg8vAqnsNzDg9lQhhmEjHDhY4zgWJBBUQJ5ZnCGAubMUQMyhJqoRRMJxYbo7Q2CedYxlQO/myqMroeEEHICIngApspxohEKI4h5DHmGEUQQw7jqAejDjBtnKz9q2w7zubi7gkugazVKHdGuWltQArkWDMCdoCqSpufg/QSPK4aV8pxW+nL96uxzMu39G+NqRe5PeekGj13Oi9BamXRmCtl1dS9X2jqel147C7W+aOJKd8dZ04dlcqsSw7KVyA9Ab/uHT/+cTht6mFRKVkMmywv0yv0mnxbMc8sSP8Apzvg0ViDtJwWxQ54Mpbny5W9qIrp2DSrmt+r+mVenu/ny+UelK6+mFR56VYtjsqfp3mxHupQZqZYdp/NGeo850x99r9j7QloyWEz8kvpK//47vuymvzQ29vf79m8MKnIaIa8bUmwRdByw6TKREIoIzE3xBrjrY7MGDUilomQ3GrNrFaIKqSZ4lkvL3tD12sn/IQCrI10xtcC7C1kH9I+xseQpYilRAwoZ5AI9IcfWFfqpRfzK1M3eeUZDRAfQDGAfc/jHTDKG1fVXiInlzcfctnwLPP9Vszs9VXvUzFy5jlZV5WzTbtN3cWkZWkhL/yS2gXm1p7lumkl24wkpv51FbYcU0EZy7SV0ucEZowkiCjvLbAVikCaGUqhyjT0c0Lj/YrElmmSWANOZ7MooHPwRCiLRaJEzBXKFGTCy49lUHNKjEigVdD6H4uTzPj9wzDCSawU0TQT2ujhjVwjgZzSj/n/eX7D/xPm/T8N/v/Ll/+Lg2fPnxw93eL85xFvyB9Rn4TzXwdAAxiMYLD/t9f/7eM/xDja1P+YBf3vKP7L2+PnttsA/IfjcQiE7nkgdH18Ey4O7pjdH7ygmX0p9n8eFA5aG3pb+0/eP/9jzFmw/13AdTBHK3/OPx7/Ic4X8qecQ9K244QG/98JXh8c/vLwwYM1/TD6KWqpv6LdOb37gT67URKterTpVxu1V9PXq3lW1d8skn++9Y83f4cDeEBAQMBnwliWuTWNu8l33G38/3X3fzGk79wFQ4S4Lwr+vwOcXIJHy4ANkLv4L4APcJ6ZSXUsz+efh1xaSOf3VxstHS6+H/nSu4s6wOns9OugxrdG7WXV5K6qc9NEn0n/ESab+s9o0P+O7v9ce1WzVNI7uAiczYI6BgQEBNwD/AvqV/+XACoAAA==

How I Got There

A colleague of mine showed me a Docker image he was using to test Kubernetes clusters. It did nothing, just starts up as a pod and sits there until you kill it.

‘Look, it’s only 700kb! Really quick to download!’

This got me wondering what the smallest Docker image I could create was.

I wanted one I could base64 encode and send ‘anywhere’ with a cut and paste.

Since a Docker image is just a tar file, and a tar file is ‘just’ a file, this should be quite possible.

A Tiny Binary

The first thing I needed was a tiny Linux binary that does nothing.

There’s some prior art here, a couple of fantastic and instructive articles on creating small executables, which are well worth reading:

Smallest x86 ELF Hello World

A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux

I didn’t want a ‘Hello World’, but a program that just slept and that worked on x86_64.

I started with an example from the first article above:

	SECTION .data
msg:	db "Hi World",10
len:	equ $-msg

	SECTION .text

        global _start
_start:
	mov	edx,len
	mov	ecx,msg
	mov	ebx,1
	mov	eax,4
	int	0x80

	mov	ebx,0
	mov	eax,1
	int	0x80

Running:

nasm -f elf64 hw.asm -o hw.o
ld hw.o -o hw
strip -s hw

Produces a binary of 504 bytes.

But I don’t want a ‘hello world’.

First, I figured I didn’t need the .data or .text sections, nor did I need to load up the data. I figured the top half of the _start section was doing the printing so tried:

global _start
_start:
 mov ebx,0
 mov eax,1
 int 0x80

Which compiled at 352 bytes.

But that’s no good, because it just exits. I need it to sleep. So a little further digging and I worked out that the mov eax command loads up the CPU register with the relevant Linux syscall number, and int 0x80 makes the syscall itself call. More info on this here.

I found a list of these here. Syscall 1 is ‘exit’, so what I wanted was syscall 29: pause.

This made the program:

global _start
_start:
 mov eax, 29
 int 0x80

Which shaved 8 bytes off to compile at 344 bytes, and creates a binary that just sits there waiting for a signal, which is exactly what I want.

Hexering

At this point I took out the chainsaw and started hacking away at the binary. To do this I used hexer which is essentially a vim you can use on binary files to edit the hex directly. After a lot of trial and error I got from this:

Screen Shot 2018-05-22 at 07.03.23

to this:

Screen Shot 2018-05-22 at 07.06.30

Which appeared to do the same thing. Notice how the strings are gone, as well as a lot of whitespace. Along the way I referenced this doc, but mostly it was trial and error.

That got me down to 136 bytes.

Sub-100 Bytes?

I wanted to see if I could get any smaller. Reading this suggested I could get down to 45 bytes, but alas, no. That worked for a 32-bit executable, but pulling the same stunts on a 64-bit one didn’t seem to fly at all.

The best I could do was lift a 64-bit version of the program in the above blog and sub in my syscall:

BITS 64
 org 0x400000
 
ehdr: ; Elf64_Ehdr
 db 0x7f, "ELF", 2, 1, 1, 0 ; e_ident
 times 8 db 0
 dw 2 ; e_type
 dw 0x3e ; e_machine
 dd 1 ; e_version
 dq _start ; e_entry
 dq phdr - $$ ; e_phoff
 dq 0 ; e_shoff
 dd 0 ; e_flags
 dw ehdrsize ; e_ehsize
 dw phdrsize ; e_phentsize
 dw 1 ; e_phnum
 dw 0 ; e_shentsize
 dw 0 ; e_shnum
 dw 0 ; e_shstrndx
 ehdrsize equ $ - ehdr
 
phdr: ; Elf64_Phdr
 dd 1 ; p_type
 dd 5 ; p_flags
 dq 0 ; p_offset
 dq $$ ; p_vaddr
 dq $$ ; p_paddr
 dq filesize ; p_filesz
 dq filesize ; p_memsz
 dq 0x1000 ; p_align
 phdrsize equ $ - phdr
 
_start:
 mov eax, 29
 int 0x80
 
filesize equ $ - $$

which gave me an image of 127 bytes.

I gave up reducing at this point, and am open to suggestions.

A Teensy Docker Image

Now I have my ‘sleep’ executable, I needed to put this in a Docker image.

To try and squeeze every byte possible, I created a binary with a filename one byte long called ‘t‘ and put it in a Dockerfile from scratch, a virtual 0-byte image:

FROM scratch
ADD t /t

Note there’s no CMD, as that increases the size of the Docker image. A command needs to be passed to the docker run command for this to run.

Using docker save to create a tar file, and then using maximum compression with gzip I got to a portable Docker image file that was less than 1000 bytes:

$ docker build -t t .
$ docker save t | gzip -9 - | wc -c
976

I tried in vain to reduce the size of the tar file by fiddling with the Docker manifest file, but my efforts were in vain – due to the nature of the tar file format and the gzip compression algorithm, these attempts actually made the final gzip bigger!

I also tried other compression algorithms, but gzip did best on this small file.

Can You Get This Lower?

Keen to hear from you if you can…

Code

is here.

If you like this post, you might like Docker in Practice

Get 39% off Docker in Practice with the code: 39miell2

Or Learn Git the Hard Way, Learn Bash the Hard Way

1) The Four Stages

2) What is a Reference?

3) What’s a Fast-Forward?

4) What’s a Rebase?

5) The power of git log

1) ^x^y^

2) pushd / popd vs ‘cd -‘

3) shopt vs set

4) Here Docs and Here Strings

5) String Variable Manipulation

6) ​Variable Defaults

7) Traps

8) Shell Variables

RANDOM

REPLY

LINENO and SECONDS

TMOUT

9) Extglobs

10) Associative Arrays

11) source vs ‘.’

Syntax Checking Options

Verbose and Trace Flags

Managing Variables

Tracing Variables

Profiling Bash Scripts

Linting with Shellcheck

Exercise

tl;dr

Background

1) Thought Experiment

‘Lean’ OS Upgrade – Small Company

‘Lean’ OS Upgrade – Enterprise

2) Reducing Risk

‘Who Owns This?’ / ‘One Throat to Choke’

Enterprises and Vendors

‘How is this maintained?’

Regulations

Change Control

Security ‘Sign-Off’

Summary: Corporate, not Individual Responsibility

Example – Sourcing

3) Cumulative Constraints

Dependency Constraints

Cultural Constraints

Calcified Paradigms

The ‘machine paradigm’

Charging models

Learned Helplessness

4) A New Hope?

Senior Leadership Support

Reduce Complexity

Cross-functional Team

Use Your Cynical Old Hands

If you liked this post, you might also like:

Landrush and VMs

vagrant-landrush DNS server

The Problem

Takeaway

DNSMasq, UDP=>TCP and Large DNS Responses

Takeaway

Summary

1) Docker and DNS

Where did the addresses 8.8.8.8 and 8.8.4.4 come from?

Use the host?

2) Kubernetes and DNS

CoreDNS Coming

End of Part IV

1) NetworkManager

2) dnsmasq

3) Unpicking dnsmasq

4) Debugging Dnsmasq

Alternatives to dnsmasq

End of Part III

(*) in fact, it’s subject to a POSIX standard, so is not limited to Linux (I learned this from a fantastic comment on the previous post)

1) Updating /etc/resolv.conf by hand

2) resolvconf

3) ifup/ifdown

(*) I’m unclear why this doesn’t boot me out of my vagrant session in my example code (anyone know?).

4) dhclient

End of Part II

3) `shopt` vs `set`

6) Variable Defaults

Where did the addresses `8.8.8.8` and `8.8.4.4` come from?

(*) in fact, it’s subject to a POSIX standard, so
is not limited to Linux (I learned this from
a fantastic comment on the previous post)

(*) I’m unclear why this doesn’t boot me out of my
vagrant session in my example code (anyone know?).

2) NSSwitch, and `/etc/nsswitch.conf`

() There’s a surprising degree of variation in
ping implementations. That’s a rabbit-hole I
didn’t* want to get lost in.

Set `nsswitch.conf` to only look at ‘files’

Set `nsswitch.conf` to only look at ‘dns’

3) `/etc/resolv.conf`

* Another rabbit hole: `host` appears to fall back to
127.0.0.1:53 if there’s no nameserver specified.