Surgically Busting the Docker Cache

What is ‘Busting the Cache’?

If you’ve ever spent any time building Docker images, you will know that Docker caches layers as they are built, and as long as the Dockerfile lines that produce them don’t change, Docker treats the resulting layer as identical and reuses it.

There’s a problem here. If you go to the network to pick up an artefact, for example with:

RUN curl https://myartefactserver.local/myjar.jar > myjar.jar

then Docker will treat that command as cache-able, even if the artefact has changed.

Solution 1: --no-cache

The sledgehammer solution to this is to add a --no-cache flag to your build. This removes the caching behaviour, meaning your build will run fully every time, no matter whether the lines of your Dockerfile change or not.

Problem solved? Well… not really. If your build is installing a bunch of other more stable artefacts, like this:

FROM ubuntu
RUN apt-get update -y && apt-get install -y many packages you want to install
# ...
# more commands
# ...
RUN curl https://myartefactserver.local/myjar.jar > myjar.jar
CMD ./run.sh

Then every time you want to do a build, the cycle time is slow as you wait for the image to fully rebuild. This can get very tedious.

Solution 2: Manually Change the Line

You can get round this problem by dropping the --no-cache flag and manually changing the line every time you build. Open up your editor, and change the line like this:

RUN [command]  # sdfjasdgjhadfa

Then the build will treat that line as changed and rerun it, along with everything after it.

Solution 3: Automate the Line Change

But this can get tedious too. So here’s a one-liner that you can put in an alias, or your makefile to ensure the cache is busted at the right point.

First change the line to this:

RUN [command] # bustcache: 

and then change your build command to:

perl -p -i -e "s/(.bustcache:).*/\1 $RANDOM/" Dockerfile && docker build -t tag .

The perl command will ensure that the text after bustcache: is replaced with a random number generated by the shell.

There’s a 1 in 32,768 chance that the number will repeat itself in two consecutive runs (bash’s $RANDOM only produces values from 0 to 32767), but I’m going to ignore that…
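If the small chance of a repeat bothers you, the same idea can be sketched in Python with a much larger random token. This is only an illustration under assumptions: the function names are hypothetical, and it assumes your Dockerfile uses the same `# bustcache:` marker comment as above.

```python
import re
import secrets
from pathlib import Path
from typing import Optional

# Matches the marker comment and whatever currently follows it on that line.
MARKER = re.compile(r"(#\s*bustcache:).*")


def bust_cache(dockerfile_text: str, token: Optional[str] = None) -> str:
    """Return the Dockerfile text with a fresh token after each bustcache marker."""
    if token is None:
        token = secrets.token_hex(8)  # 64 random bits, so repeats are very unlikely
    return MARKER.sub(lambda m: f"{m.group(1)} {token}", dockerfile_text)


def bust_cache_in_file(path: str = "Dockerfile") -> None:
    """Rewrite the file in place, like the perl -p -i one-liner does."""
    dockerfile = Path(path)
    dockerfile.write_text(bust_cache(dockerfile.read_text()))
```

You would call bust_cache_in_file() just before running docker build, in the same place the perl command sits in the one-liner.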


If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Get 39% off Docker in Practice with the code: 39miell2


Software Security Field Guide for the Bewildered

If you have worked in software for a number of years and you’re not a security specialist, you might occasionally be confronted by someone from ‘security’ who generally says ‘no’ to things you deliver.

For a long time I was in this position and was pretty bewildered by how to interpret what they were saying, or understand how they thought.

Without being trained or working in the field, it can be difficult to discern the underlying principles and distinctions that mark out a security magus from a muggle.

While it’s easy to find specific advice on what practices to avoid, if you’ve ever been locked in a battle with a security consultant to get something accepted then it can be hard to figure out what rules they are working to.

So here I try and help out anyone in a similar position by attempting to lay out clearly (for the layperson) some of the principles (starting with the big ones) of security analysis before moving onto more detailed matters of definition and technology.

Principles

‘There’s no such thing as a secure system’

The broadest thing to point out that is not immediately obvious to everyone is that security is not a science, it’s an art. There is no such thing as a secure system, so to ask a security consultant ‘is that secure?’ is to invite them to think of you as naive.

Any system that contains information that is in any way private is vulnerable, whether to a simple social engineering attack, or a state-funded attempt to infiltrate your systems that uses multiple ways to attack your system. What security consultants generally try to do is establish both where these weaknesses may be, and how concerned to be about them.

IT Security Is An Art, Not A Science

This makes IT security an art, not a science, which took me some time to catch onto. There’s usually no magic answer to getting your design accepted; often you reach a position where some kind of tradeoff between security and risk is evaluated, and that may get you to acceptance.

Anecdote: I was once in a position where a ‘secrets store’ that used base64 encoding was deemed acceptable for an aPaaS platform because the number of users was deemed low enough for the risk to be acceptable. A marker was put down to review that stance after some time, in case the usage of the platform spread, and a risk item added to ensure that encryption at rest was addressed by no later than two years.

A corollary of security being an art is that ‘layer 8’ of the stack (politics and religion) can get in the way of your design, especially if it’s in any way novel. Security processes tend to be an accretion of: specific directions derived from regulations; the vestigial scars of past breaches; personal prejudice; and plain superstition.

Trust Has to Begin Somewhere

Often when you are discussing security with people you get into a ‘turtles all the way down’ scenario, where you wonder how anything can be done because nothing is ever trusted.

Anecdote: I have witnessed a discussion with a (junior) security consultant where a demand was made to encrypt a public key, based on a general injunction that ‘all data must be encrypted’. ‘Using what key?’ was the natural question, but an answer was not forthcoming…

The plain fact is that everyone has to trust something at some point in order to move information around anything. Examples of things you might (or might not) trust are:

  • The veracity of the output of dmesg on a Linux VM
  • The Chef server keys stored on your hardened VM image
  • Responses to calls to the metadata IP address when running on AWS (viz: http://169.254.169.254)
  • That Alice in Accounts will not publish her password on Twitter
  • That whatever is in RAM has not been tampered with or stolen
  • The root public keys shipped with your browser

Determine Your Points of Trust

Very often determining what you are allowed to trust is the key to unlocking various security conundrums when designing systems. When you find a point of trust, exploit it (in a good way) as much as you can in your designs. If you’ve created a new point of trust as part of your designs, then prepare to be challenged.

Responsibility Has to End Somewhere

When you trust something, usually someone or something must be held responsible when it fails to honour that trust. If Alice publishes her password on Twitter, and the company accounts are leaked to the press, then Alice is held responsible for that failure of trust. Establishing and making clear where responsibility would lie in the event of a failure of trust is also a way of getting your design accepted in the real world.

What level of trust it is acceptable to place in Alice will depend on what her password gives her access to. Often there are data classification levels which determine minimum requirements before trust can be given for access to that data. At the extreme end of “secret”, root private keys can be subject to complex ceremonies that attempt to ensure that no one person can hijack the process for their own ends.

Consequences of Failure Determine Level of Paranoia

Another principle that follows from the ‘security is an art, not a science’ principle is that the extent to which you fret about security will depend on the consequences of failure. The loss of a password that allows someone to read some publicly-available data stored on a company server will not in itself demand much scrutiny from security.

The loss of a root private key, however, is about as bad as it can get from a security standpoint, as that can potentially give access to all data across the entire domain of that key hierarchy.

If you want to reduce the level of scrutiny your design gets put under, reduce the consequences of a breach.



Key Distinctions

If you want to keep pace with a security consultant as they explain their concerns to you, then there are certain key distinctions that they may frequently refer to, and assume you understand.

Getting these distinctions and concepts under your belt will help you convince the security folks that you know what you’re doing.

Encryption vs Encoding

This is a 101 distinction you should grasp.

Encoding is converting some data into some other format. Anyone who understands the encoding can convert the data back into readable form. ASCII and UTF-8 are examples of encodings that convert numbers into characters. If you give someone some encoded data, it won’t take them long to figure out what the data is, unless the encoding is extremely complex or obscure.
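To make the point concrete, here is a minimal Python sketch using base64 (the encoding from the ‘secrets store’ anecdote earlier). The function names and the sample string are made up for illustration; the thing to notice is that decoding needs no key or secret of any kind.

```python
import base64


def encode(text: str) -> str:
    """Encode text as base64. This is a change of format, not protection."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode(encoded: str) -> str:
    """Reverse the encoding. Note that no key or secret is required."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    stored = encode("password")
    print(stored)          # cGFzc3dvcmQ=
    print(decode(stored))  # password
```

Anyone who recognises the format can run the second function, which is why an encoded ‘secret’ is not really a secret.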

Encryption involves needing some secret or secure process to get access to the data, like a private ‘key’ that you store in your ~/.ssh folder. A key is just a number that’s very difficult to guess, like your house key’s (probably) unique shape. Without access to that secret key, you can’t work out what that data is without a lot of resources (sometimes more than all the world’s current computing power) to overcome the mathematical challenge.

Hashing vs Encryption

Hashing and encryption are also easily confused. Hashing is the process of turning one set of data into another through a reproducible algorithm. The key point about hashing is that the data goes one-way. If you have the hash value (say, ae5690f1aff) then you can’t easily reverse that to the original.

Hashing has a weakness. Let’s say you ‘md5sum’ an insecure password like password. You will always get the value 5f4dcc3b5aa765d61d8327deb882cf99 from the hash.

If you store that hashed password in a database, then anyone can google it to find out what your password really is, even though it’s a hash. Try it with other commonly-used passwords to see what happens.

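This is why it’s important to ‘salt’ your hash: you add a random value, unique to each password, to the input before hashing. The same password then no longer always produces the same hash, so knowledge of the hash algorithm (or a lookup of precomputed hashes) isn’t enough to crack a lot of passwords.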
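The difference can be sketched in a few lines of Python. This is an illustration rather than a recommendation of specific parameters: the function names, the 16-byte salt and the iteration count are assumptions, and PBKDF2 is simply one salted scheme available in the standard library.

```python
import hashlib
import secrets
from typing import Optional, Tuple


def unsalted_md5(password: str) -> str:
    """Same input, same output, every time: easy to look up."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Hash with a random per-password salt (stored alongside the hash)."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest.hex()


def verify(password: str, salt: bytes, expected_hex: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, candidate = salted_hash(password, salt)
    return secrets.compare_digest(candidate, expected_hex)


if __name__ == "__main__":
    print(unsalted_md5("password"))  # 5f4dcc3b5aa765d61d8327deb882cf99
    salt, stored = salted_hash("password")
    print(verify("password", salt, stored))  # True
```

Two users with the same password end up with different stored hashes, so googling one hash tells you nothing about the other.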

Authentication vs Authorization

Sometimes shortened to ‘authn’ and ‘authz’, this distinction is another standard one that gets slipped into security discussions.

Authentication

Authentication is the process of determining what your identity is. The one we’re all familiar with is photo id. You have a document with a name and a photo on it that’s hard to fake (and therefore ‘trusted’), and when asked to prove who you are you produce this document and it’s examined before law enforcement or customs accepts your claimed identity.

There have been many interesting ways to identify authenticity of identity. My favourite is the scene in Big where the Tom Hanks character has to persuade his friend that he is who he says he is, even though he’s trapped in the body of a man:

[Clip: Shared Secret Authentication]

To achieve this he uses a shared secret: a song (and associated dance data) that only they both know. Of course it’s possible that the song was overheard or some government agency had listened in to their conversations for years to fake the authentication, but the chances of this are minimal, and would raise the question of: why would they bother?

What would justify that level of resources just to trick a boy into believing something so ludicrous? This is another key question that can be asked when evaluating the security of a design.

The other example I like is the classic spy trope of using two halves of a torn postcard, giving one half to each side of a communication, making a ‘symmetric key’ that is difficult to forge unless you have access to one side of it:

[Figure: Symmetric Key Encryption]

Symmetric vs Asymmetric Keys

This also exemplifies nicely what a symmetric key is. It’s a key that is ‘the same’ one used on both sides of the communication. A torn postcard is not ‘the same’ on both sides, but it can be argued that if you have one part of it, it’s relatively easy to fake the other. This could be complicated if the back of the postcard had some other message known only to both sides written on it. Such a message would be harder to fake since you’d have to know the message in both people’s minds.

An asymmetric key is one where access to the key used to encrypt the message does not imply access to decrypt the message. Public key encryption is an example of this: anyone can encrypt a message with the public key, but the private key is kept secret by the receiver. Anyone can know the public key (and write a message using it), but only the holder of the private key can read the message.

No authentication process is completely secure (remember, nothing is secure, right?), but you can say that you have prohibitively raised the cost of cheating security by demanding evidence of authenticity (such as a passport or a driver’s license) that is costly to fake, to the point where it’s reasonable to say acceptably few parties would bother.

If the identification object itself contains no information (like a bearer token), then there is an additional level of security, as you have to both own the object and know what it’s for. So even if the key is lost, more has to happen before there is a compromise of the system.

Authorization

Authorization is the process of determining whether you are allowed to do something or not. While authentication is a binary fact about one piece of information (you are either who you say you are, or you are not), authorization will depend on both who you are and what you are asking to do.

In other words: Dave is still Dave. But Dave can’t open the bay doors anymore. Sorry Dave.

Concepts

RBAC

Following on from Authentication and Authorization, Role-Based Access Control gives permission to a more abstract entity called a role.

Rather than giving access to that user directly, you give the user access to the role, and then that role has the access permissions set for it. This abstraction allows you to manage large sets of users more easily. If you have thousands of users that have access to the same role, then changing that role is easier than going through thousands of users one-by-one and changing their permissions.

To take a concrete example, you might think of a police officer as having access to the ‘police officer’ role in society, and so having permission to stop someone acting suspiciously in addition to their ‘civilian’ role permissions. If they quit, that role is taken away from them, but they’re still the same person.
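As a rough sketch of that idea in Python: permissions hang off roles, users hang off roles, and the authorization check never looks at the user’s permissions directly. The tables, role names and function names here are all hypothetical.

```python
# Hypothetical role and user tables, for illustration only.
ROLE_PERMISSIONS = {
    "civilian": {"walk_the_streets"},
    "police_officer": {"stop_suspicious_person"},
}
USER_ROLES = {"dave": {"civilian", "police_officer"}}


def is_authorized(user: str, action: str) -> bool:
    """A user may do something only if one of their roles grants it."""
    roles = USER_ROLES.get(user, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def revoke_role(user: str, role: str) -> None:
    """Take a role away; the user's identity is untouched."""
    USER_ROLES.get(user, set()).discard(role)


if __name__ == "__main__":
    print(is_authorized("dave", "stop_suspicious_person"))  # True
    revoke_role("dave", "police_officer")
    print(is_authorized("dave", "stop_suspicious_person"))  # False
    print(is_authorized("dave", "walk_the_streets"))        # True
```

Changing what a police officer may do means editing one entry in the role table, however many users hold the role.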

Security Through Obscurity

Security through obscurity is security that depends on the design of a system staying secret. In other words, if the design of your system were to become public then it would be easy to break.

Placing your house key under a plant next to the door, or under the doormat would be the classic example. Anyone aware of this security ‘design’ (keeping the key in some easy-to-remember place near the door) would have no trouble breaking into that house.

By contrast, the fact that you know that I use public key encryption for my ssh connections, and even the specifics of the algorithms and ciphers used in those communications, does not give you any advantage in breaking in. The security of the system depends on maths: in the case of RSA keys, on the difficulty of factoring a specific class of large numbers.

If there are weaknesses in these algorithms then they’re not publicly known. That doesn’t preclude the possibility that someone, somewhere can break them (state security agencies are often well ahead of their time in cryptography, and don’t share their knowledge, for obvious reasons).

[Clip: ‘Anybody wanna shut down the Federal Reserve?’]

It’s a cliche to say that security through obscurity is bad, but it can be quite effective at slowing an attacker down. What’s bad about it is when you depend on security through obscurity for the integrity of your system.

An example of security through obscurity being ‘acceptable’ might be if you run an ssh server on (say) port 8732 rather than 22. You depend on ssh security, but the security through obscurity of running on a non-standard port prevents casual attackers from ‘seeing’ that your port 22 is open, and as a secondary effect also can prevent your ssh logs from getting overloaded (perhaps exposing you to other kinds of attack). But any cracker worth her salt wouldn’t be put off by this security measure alone.

If you really want to impress your security consultant, then casually mention Kerckhoffs’s Principle, which is a more formal way of saying ‘security through obscurity is not sufficient’.

Principle of Least Privilege

The principle of least privilege states that any process, user or program should have only the privileges it needs to do its job.

Authentication works the same way, but authorization is only allowed for a minimal set of functions. This reduces the blast radius of compromise.

Blast radius is a metaphor from nuclear weapons technology.
IT people use it in various contexts to make what they do sound significant.

A simple example might be a process, like an http server, that starts as root (because it needs access to a low-numbered port) but then drops down to a less privileged user. This ensures that if the server is compromised after that initial startup then the consequences would be far less than before. It is then up for debate whether that level of security is sufficient.

Anecdote: I once worked somewhere where the standard http server had this temporary root access removed. Users had to run on a higher-numbered port and low-numbered ports were run on more restricted servers.

In certain NSA-type situations, you can even get data stores that users can write to, but not read back! For example, if a junior security agent submits a report to a senior, they then get no access to that document once submitted. This gives the junior the minimal level of privilege they need to do their job. If they could read the data back, then that increases the risk of compromise as the data would potentially be in multiple places instead of just one.

Blast Radius

There are other ways of reducing the blast radius of compromise. One way is to use tokens for authentication and authorization that have very limited scope.

At an extreme, an admin user of a server might receive a token to log into it (from a highly secured ‘login server’) that:

  • can only be used once
  • limits the session to two minutes
  • expires in five minutes
  • can only perform a very limited action (eg change a single file)
  • can only be used from a specific subnet

If that token is somehow lost (or copied) in transit then it could only be misused within five minutes and before the intended recipient uses it, for a session of at most two minutes, and the damage should be limited to a specific file if (and only if) the user misusing the token already has access to the specified network.
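Here is a minimal Python sketch of how such a token’s checks might combine. It is not how Vault or any other product implements this; the class, field names and example values are hypothetical, and the two-minute session limit is left out to keep it short.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class ScopedToken:
    issued_at: float              # seconds, from any consistent clock
    allowed_action: str           # the single action this token permits
    allowed_subnet: str           # e.g. "10.0.0.0/24"
    expires_after: float = 300.0  # five minutes
    used: bool = False

    def authorize(self, action: str, source_ip: str, now: float) -> bool:
        """Allow the request only if every restriction is satisfied."""
        if self.used:
            return False  # can only be used once
        if now - self.issued_at > self.expires_after:
            return False  # expired
        if action != self.allowed_action:
            return False  # outside the token's very limited scope
        if ip_address(source_ip) not in ip_network(self.allowed_subnet):
            return False  # wrong network
        self.used = True
        return True
```

Each check narrows what a stolen copy of the token could be used for, which is the whole point: the attacker has to satisfy all of them at once.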

By limiting the privileges and access that that token has the cost of failure is far reduced. Of course, this focusses a large amount of risk onto the login server. If the login server itself were compromised then the blast radius would be huge, but it’s often easier for organisations to manage that risk centrally as a single cost rather than spreading it across a wide set of systems. In the end, you’ve got to trust something.

Features like these are available in HashiCorp’s Vault product, which centralises secrets management with open source code. It’s the most well-known, but other products are available.

N-Factor Authentication

You might have noticed in the ‘Too Many Secrets’ clip from the film Sneakers above that access to all the systems was granted simply by being able to decrypt the communications. You could call this one-factor authentication, since it was assumed that the identity of the user was ‘admin’ just by virtue of having the key to the system.

Of course, in the real world that situation would not exist today. I would hope that the Federal Reserve money transfer system would at least have a login screen as well before you identify yourself as someone that can move funds arbitrarily around the world.

A login page can also be regarded as one-factor authentication, as the password (or token) is the only secret piece of information required to prove authenticity.

Multi-factor authentication makes sure that the loss of one piece of authentication information is not sufficient to get access to the system. You might need a password (something you know), and a secret PIN (another thing you know), and a number generated by your mobile phone (something you have), and a fingerprint, and the name of your first pet. That would be 5-factor authentication.
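The ‘number generated by your mobile phone’ is commonly a time-based one-time password. As a hedged sketch, here is a Python version of the HOTP/TOTP calculation from RFC 4226 and RFC 6238, combined with a password check. The login function and its parameters are hypothetical, and a real system would also handle clock drift, rate limiting and secure storage of the shared secret.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: float, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current time step."""
    return hotp(secret, int(now) // step, digits)


def login(password_ok: bool, submitted_code: str, secret: bytes, now: float) -> bool:
    """Two factors: the password check AND the code from the phone must pass."""
    return password_ok and hmac.compare_digest(submitted_code, totp(secret, now))
```

Losing the password alone is not enough to log in, because the attacker would also need the secret held on the phone.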

Of course, all this is undermined if the recovery process sends a link to an authentication reset to an email address that isn’t so well secured. All it takes then is for an attacker to compromise your email, and then tell the system that you’ve lost your login credentials. If your email is zero- or one-factor authentication then the system is only as secure as that and all the work to make it multi-factor has been wasted.

This is why you get those ‘recovery questions’ that supposedly only you know (name of your first pet). Then, when people forget those, you get other recovery processes, like sending a letter to your home with a one-time password on it (which of course means trusting the postal service end-to-end), or an SMS (which means trusting the network carrier’s security). Once again, it’s ‘things you can trust’ all the way down.

So it goes.

Acceptable Risk and Isolation

We’ve touched on this already above when discussing the ‘prohibitive cost of compromising a system’ and the ‘consequences of a breach’, but it’s worth making explicit the concept of ‘acceptable risk’. An acceptable risk is a risk that is known about, but whose consequences of compromise are less than the effort of mitigating it.

A sensible organisation concerned about security in the real world will have provisions for these situations in their security standards, as it could potentially save a lot of effectively pointless effort at the company level.

For example, a username/password combination may be sufficient to secure an internal hotel booking system. Even if that system were compromised, then (it might be argued) you would still need to compromise the credit card system to exploit it for material gain.

The security consultant may raise another factor at this point, specifically: whether the system is appropriately isolated. If your hotel booking system sits on the same server as your core transaction system, then an exploit of the booking system could result in the compromise of your core transaction system.

Sometimes, asking a security consultant “is that an acceptable risk?” can yield surprising results, since they may be so locked into saying ‘no’ that they may have overlooked the possibility that the security standards they’re working to do indeed allow for a more ‘risk-based’ approach.

Conclusion

That was a pretty quick tour through a lot of security concepts that will hopefully help you if you are bewildered by security conversations.

If I missed anything out, please let me know: @ianmiell on twitter.

The Lazy Person’s Guide to the Info Command

Most people who use Linux pretty quickly learn about man pages, and how to navigate them with their preferred pager (usually less these days).

Less well known are the info pages. If you’ve never come across them, these look like man pages, and contain similar information, but are invoked like this:

info grep

Over the past couple of decades I often found myself looking at an info page and wondering how to navigate it, hitting various keys and getting lost and frustrated.

What Do I Do Now?

I tried man info, but that didn’t tell me how to navigate the pages. More rarely I would try info info, but didn’t have the time or patience to follow the tutorial there and then as I was busy trying to get some information, stat.

The other day I finally had enough and decided to take the time to sit down and learn it properly. It didn’t take that long, but I figured there was a case for writing down a helpful guide for new users that just want to get going.

The Bare Minimum

Here’s the bare minimum you need to read through an info page without ever getting lost:

  • ] – next page
  • [ – previous page
  • space – page down within page
  • b – page up within page
  • q – quit

If you want to get commands into your muscle memory as fast as possible, focus on these. It won’t get you round pages efficiently, but you won’t wonder how to get back to where you were, or how you got where you are. If you’re a very casual user, stop here and come back later when you get fed up of spinning forwards and backwards through pages to find something.

Try it with something like info sed.

Levelling Up

If you want to get to the next level with info, then these commands will help:

  • n – next page in this level
  • p – previous page in this level
  • return – jump to page ‘lower down’
  • l – go back to the last node seen
  • u – go ‘up’ a level

info has a hierarchical structure. There is a top-level page, and then ‘child’ pages that can have other pages at the same ‘level’. To go to the next page at the same level you can hit the n key. To go back to the previous page at the same level you hit p.

Occasionally you will get an item that allows you to ‘jump down’ a level by hitting the return key. For example, by placing the cursor on the ‘Definitions’ line below and hitting return you will be taken to the ‘Definitions’ page.

* Introduction::                An introduction to the shell.
* Definitions::                 Some definitions used.

To return to the page you were last on at any point, you can hit l (for ‘last page’) and you will be returned to the top of that page. Or if you want to go ‘up’ a level, type u.

Still Interested?

If you’re still interested then you might want to read through info info carefully, but before you do here’s a couple of final tips to help avoid getting lost in that set of pages (which I have done more than once).

First, when you get stuck or want to dig in further, you can get help:

  • ? – show the info commands window
  • h – open the general help window

Confusingly, these options open up a half-window that, in the case of h at least, gives no indication of how to close it down again. Here’s how:

  • C-x 0 – close the window

Hitting CTRL and x together, followed by 0 gets you out.

Why Bother?

You might wonder what the point of learning to read info pages is.

For me, the main reasons are:

  • They are often far more detailed (and more structured) than man pages
  • They are more definitive and complete. The grep info page, for example, contains a great set of examples, a discussion on performance, and an introduction to regular expressions. In fact, they’re intended to be mini books that can be printed off when converted to the appropriate format
  • You can irritate and/or intimidate colleagues by dismissing man page usage as ‘inferior’ and asserting that real engineers use info (joke)

Aside from anything else, I find getting fluent with these pieces of relative arcana satisfying. Maybe it’s just me.



A Hot Take on GitHub Actions

A couple of days ago I got access to GitHub Actions in Beta. I felt vaguely interested in it when I briefly read up on it, but now I’m like Holt geeking out on Moneyball:

This is not a considered post, so may contain errors, both egregious and small. I’ll edit them if I’m corrected.

What is it?

GitHub Actions can be described in many ways, but for most people that use GitHub its immediate power will lie in it enabling you to remove the need for any separate CI tooling.

You create a YAML file in .github/workflows/ within your repo that might look like this:

 name: Application
 on: push
 jobs:
   build:
     name: Shares run
     runs-on: ubuntu-latest
     steps:
     - uses: actions/checkout@master
     - uses: ./
       env:
         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 

It’s a pipeline definition file similar to GoCD’s, or other definition formats for Jenkins et al. You can trigger workflows based on (for example) a crontab schedule, or repository push, or repository pull-request, or when a URL is hit. I’m sure more triggers are to come, assuming they don’t exist already.

The format isn’t 100% intuitive, but is as easy to pick up as anything else, and I’m sure the docs will improve. Right now there seem to be two sets of docs: one more formal and in the old (deprecated) HCL format, and the other less formal and in the new YAML format. I’m not entirely sure of the status of the ‘older’ documentation, but it hasn’t failed me yet.

GitHub Actions doesn’t just consist of this functionality in your repo. GitHub is providing a curated set of canned actions here that you can reference in your workflows. You needn’t use theirs, either, you can use any you can find on GitHub (or maybe anywhere else; I haven’t tried).

So What?

For me, the big deal is that this co-locates the actions with your code. So you can trigger a rebuild on a push, or on a schedule, or from an external URL. Just like CI tools do, but with less hassle and zero setup.

But it doesn’t just co-locate code and CI.

It is also threatening to take over CD, secrets management (there’s a ‘Secrets’ tab in the repo’s settings now), artifact store (there’s a supported ‘upload-artifact’ action that pushes arbitrary files to your repo), and user identity. Add in the vulnerability detection functionality and the whole package is as compelling as hell.

An Azure Gateway Drug? An AWS Killer?

When the possibilities of this start to dawn on you, it’s truly dizzying.

GitHub effectively gives you, for free, a CI/CD platform to run more or less whatever you like (but see limits, below). You can extend it to manage your code workflow in however sophisticated a way you like, as you have access to the repository’s GitHub token.

The tradeoff is that it’s all so easy that your business is soon going to depend on GitHub so much that Microsoft will have a grip on you as tight as Windows used to.

I think the real trojan horse here is user identity. By re-using the identity management your business might already trust in GitHub, and extending its scope to help solve the challenges of secrets management and artifact stores, whole swathes of existing work could be cut away from your operational costs.

Some Detail

The default ‘hello-github-action’ setup demonstrates a Docker container that runs on an Ubuntu VM base. I found this quite confusing. Is access to the VM possible? If it’s not, why do I care whether it’s running on Ubuntu 18 or Ubuntu 16? I did some wrangling with this but ran into apparently undocumented requirements for an action.yml file, and haven’t had time to bottom them out.

(As an aside, the auto-created lab that GitHub makes for new users is one of the best UX’s I’ve ever seen for onboarding to a new product.)

What you do get is root within the container. Nice. And you can use an arbitrary container, from DockerHub or wherever.

You also get direct access back to GitHub without any faff. By default you get access to a github secret.

As with all these remote build environments, debugging can be a PITA. You can rig up a local Docker container to behave as it would on the server, but it’s a little fiddly to get the conventions right, as not everything about the setup is documented.

Limits and Restrictions

Limits are listed here, and include a stern warning not to use this for ‘serverless computing’, or “Any other activity unrelated to the production, testing, deployment, or publication of the software project associated with the repository where GitHub Actions are used. In other words, be cool, don’t use GitHub Actions in ways you know you shouldn’t.”

Which makes me wonder: are they missing an opportunity here? I have serverless applications I could run on here, and (depending on the cost) might be willing to pay GitHub to host them for me. I suspect that they are not going to sit on that opportunity for long.

Each virtual machine has the same hardware resources available, which I assume are freely available to the running container:

  • 2-core CPU
  • 7 GB of RAM
  • 14 GB of SSD disk space

which seems generous to me.

The free tier gives you 2000 minutes (about a day and a half) of runtime, which also seems generous.

Conclusion

GitHub Actions is a set of features with enormous potential for using your codebase as a lever into your entire compute infrastructure. It flips on its head the traditional view that code is just something to store while compute is where the interesting stuff happens: the code is now the centre of gravity for your compute, and it’s only a matter of time before everything else follows.

I’m starting to think Microsoft got a bargain.

Links

GitHub Actions help

Curated actions

Developer Docs


Learn Bash the Hard Way

Learn Git the Hard Way

Learn Terraform the Hard Way


Get 39% off Docker in Practice with the code: 39miell


Seven God-Like Bash History Shortcuts You Will Actually Use

Intro

Most guides to bash history shortcuts exhaustively list all of the shortcuts available to you.

The problem I always had with that was that I would use them once, and then glaze over as I tried out all the possibilities. Then I’d move onto my working day and completely forget them, retaining only the well-known !! trick I learned when I first started using bash.

So most never got committed to memory.

Here I outline the shortcuts I actually use every day. When people see me use them they often ask me “what the hell did you do there!?”, conferring God-like status on me with minimal effort or intelligence required.

I recommend using one a day for a week, then moving onto the next one. It’s worth taking your time to get them under your fingers, as the time you save will be significant in the long run.

1) !$ – The ‘Last Argument’ One

If you only take one shortcut from this article, make it this one.

It substitutes the last argument of the last command into your line.

Consider this scenario:

$ mv /path/to/wrongfile /some/other/place
mv: cannot stat '/path/to/wrongfile': No such file or directory

Ach, I put the wrongfile filename in my command. I should have put rightfile instead.

You might decide to fully re-type the last command, and replace wrongfile with rightfile.

Instead, you can type:

$ mv /path/to/rightfile !$
mv /path/to/rightfile /some/other/place

and the command will work.

There are other ways to achieve the above in bash with shortcuts, but this trick of re-using the last argument of the last command is one I use the most.

https://www.educative.io/courses/master-the-bash-shell

2) !:2 – The ‘nth Argument’ One

Ever done anything like this?

$ tar -cvf afolder afolder.tar
tar: failed to open

Like others, I get the arguments to tar (and ln) wrong more often than I would like to admit.

When you mix up arguments like that, you can run:

$ !:0 !:1 !:3 !:2
tar -cvf afolder.tar afolder

and your reputation will be saved.

The last command’s items are zero-indexed, and can be substituted in with the number after the !:.

Obviously, you can also use this to re-use specific arguments from the last command rather than all of them.
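To make the zero-indexing concrete, here is a small sketch that models the word designators outside of history expansion. The nth_word helper is hypothetical and only for illustration; it splits on whitespace, so unlike bash it does not treat a quoted string containing spaces as one word.

```shell
# nth_word LINE N: print the Nth whitespace-separated word of LINE, counting
# from 0 as the !:N designator does. Simplification: unlike bash, this does
# not treat a quoted string containing spaces as a single word.
nth_word() {
  printf '%s\n' "$1" | awk -v n="$2" '{ print $(n + 1) }'
}

last='tar -cvf afolder afolder.tar'
# Equivalent of typing: !:0 !:1 !:3 !:2
echo "$(nth_word "$last" 0) $(nth_word "$last" 1) $(nth_word "$last" 3) $(nth_word "$last" 2)"
```

Word 0 is the command itself, so the first argument is word 1.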

3) !:1-$ – The ‘All The Arguments’ One

Imagine you run a command, and realise that the arguments were correct, but the command itself was wrong:

$ grep '(ping|pong)' afile

I wanted to match ping or pong in a file, but I used grep rather than egrep.

I start typing egrep, but I don’t want to re-type the other arguments, so I can use the !:1-$ shortcut to ask for everything in the previous command from the second word (remember they’re zero-indexed) to the last one (represented by the $ sign):

$ egrep !:1-$
egrep '(ping|pong)' afile
ping

You don’t need to pick 1-$, you can pick a subset like 1-2, or 3-9 if you had that many arguments in the previous command.
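The range form can be modelled in a script with cut, which may help when checking what a range will pick up. This is only a sketch: word_range is a hypothetical helper, and it splits on single spaces, so quoted arguments containing spaces are not handled the way bash handles them.

```shell
# word_range LINE FROM TO: print words FROM..TO of LINE, counting from 0,
# mimicking !:FROM-TO. Pass '$' as TO to mean "through the last word".
# Simplification: words are split on single spaces only, so quoted
# arguments containing spaces are not handled.
word_range() {
  from=$(( $2 + 1 ))          # cut counts fields from 1, bash from 0
  if [ "$3" = '$' ]; then
    fields="${from}-"
  else
    fields="${from}-$(( $3 + 1 ))"
  fi
  printf '%s\n' "$1" | cut -d' ' -f"$fields"
}

last="grep '(ping|pong)' afile"
# Equivalent of typing: egrep !:1-$
echo "egrep $(word_range "$last" 1 '$')"
```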


This is based on some of the contents of my book Learn Bash the Hard Way

Preview available here.


4) !-2:$ – The ‘Last But n‘ One

The above shortcuts are great when I know immediately how to correct my last command, but often I run commands after the original one which mean that the last command is no longer the one I want to reference.

For example, using the mv example from before, if I follow up my mistake with an ls check of the folder’s contents:

$ mv /path/to/wrongfile /some/other/place
mv: cannot stat '/path/to/wrongfile': No such file or directory
$ ls /path/to/
rightfile

…I can no longer use the !$ shortcut.

In these cases, you can insert a -n: (where n is the number of commands to go back in the history) after the ! to grab the last argument from an older command:

$ mv /path/to/rightfile !-2:$
mv /path/to/rightfile /some/other/place

Again, once learned, you may be surprised at how often you need it.

5) !$:h – The ‘Get Me The Folder’ One

This one looks less promising on the face of it, but is something I use dozens of times daily.

Imagine I run a command like this:

$ tar -cvf system.tar /etc/system
tar: /etc/system: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors.

The first thing I might want to do is go to the /etc folder to see what’s in there and work out what I’ve got wrong.

I can do this at a stroke with:

$ cd !$:h
cd /etc

What this one does is say: get the last argument to the last command (/etc/system), and take off its last filename component, leaving only the /etc.
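History expansion is normally switched off in scripts, so it can help to know the scriptable equivalents. A minimal sketch of the same "take off the last filename component" operation, using the path from the example above:

```shell
path=/etc/system

# dirname strips the last path component, like the :h ("head") modifier
dirname "$path"

# The same result using only shell parameter expansion:
# remove the shortest suffix matching /*
echo "${path%/*}"
```

Both lines print /etc. The counterpart modifier :t keeps only the last component, much like basename.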

6) !#:1 – The ‘The Current Line’ One

I spent years occasionally wondering if I could reference an argument on the current line before finally looking it up and learning it. I wish I’d done so well before.

I most commonly use it to make backup files:

$ cp /path/to/some/file !#:1.bak
cp /path/to/some/file /path/to/some/file.bak

but once under the fingers it can be a very quick alternative to

7) !!:gs – The ‘Search and Replace’ One

This one searches across the referenced command, and replaces what’s between the first two / characters with what’s between the last two.

Say I want to tell the world that my s key does not work, and outputs f instead.

$ echo my f key doef not work
my f key doef not work

Then I realise that I was just hitting the f key by accident.

To replace all the fs with ses, I can type:

$ !!:gs/f /s /
echo my s key does not work
my s key does not work

It doesn’t just work on single characters. I can replace words or sentences too:

$ !!:gs/does/did/
echo my s key did not work
my s key did not work
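The substitution itself behaves much like a sed s command applied to the text of the previous command. The sketch below reproduces the example with sed, and also shows what happens without the g: only the first match is replaced, which is what the plain :s modifier does.

```shell
last='echo my f key doef not work'

# Like !!:gs/f /s / : replace every "f " with "s "
printf '%s\n' "$last" | sed 's/f /s /g'

# Like !!:s/f /s / (no g): replace only the first match
printf '%s\n' "$last" | sed 's/f /s /'
```

The first command prints "echo my s key does not work"; the second leaves "doef" untouched.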

Test

Just to show you how these shortcuts can be combined, can you work out what these toenail clippings will output?

$ ping !#:0:gs/i/o
$ vi /tmp/!:0.txt
$ ls !$:h
$ cd !-2:h
$ touch !$!-3:$ !! !$.txt
$ cat !:1-$

Learn bash interactively in the browser here.


How Long Will It Take For The Leavers To Leave?

This piece seeks to answer a simple question: how long would it take for enough people to die that the Brexit decision would be reversed?

This has been informally speculated on before, but I haven’t seen any analysis done on the numbers, so I decided to do it myself.

The tl;dr is that the turning point is around July/August 2020:

Assumptions

To arrive at this number I had to make some assumptions:

  • Everyone that voted in June 2016 would vote exactly the same way again (or not vote again)
  • Everyone that comes of age to vote would vote in the same proportions (by age group) as in June 2016

Obviously, these assumptions don’t make a realistic prediction of the result of any second referendum, not least because the question itself would likely be different.

The Numbers

To arrive at the number, I first took the raw votes from June 2016:

  • Leave: 17,410,742
  • Remain: 16,141,241

Then I got the breakdown of votes by age group, based on the figures from Lord Ashcroft’s site here:

Age group  Leave  Remain
18-24      0.27   0.73
25-34      0.38   0.62
35-44      0.52   0.48
45-54      0.56   0.44
55-64      0.57   0.43
65+        0.60   0.40

From here, what we need to work out is:

  • How many people will come ‘of age’ to vote per month
  • How many people will die per month, by age group

Fortunately the ONS collects data on births and deaths by age group, so we can estimate these values.

How Many New Remainers Will There Be?

These are the population figures broken down by age group at the time of the 2016 vote, taken from ‘ukmidyearestimates.xls 2012-2016, UK population counts for mid 2016’.

Age group    Population
0-4          4014300
5-9          4037500
10-14        3625100
15-19        3778900
20-24        4253800
25-29        4510600
30-34        4408200
35-39        4179500
40-44        4174100
45-49        4619100
50-54        4632000
55-59        4066700
60-64        3534200
65-69        3636500
70-74        2852100
75-79        2154500
80-84        1606700
85-89        993000
90 and over  571200
Total        65648000

Unfortunately the age groups do not align with Lord Ashcroft’s figures in the first table, but we can estimate the number of people who get the vote every month by taking the number of people in the 15-19 age group (3778900), and multiplying it by 3/5 to estimate the number of people who could not vote in 2016 but can three years later.

This gives us a number of 2,267,340. Spread over the three years (36 months), this is 62,982 new people per month that can vote.
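The arithmetic for this step can be checked with a few lines of shell. This is just a sketch of the calculation described above; the function name new_voters_per_month is made up for illustration.

```shell
# new_voters_per_month POPULATION_15_19: three of the five birth-year cohorts
# in the 15-19 group reach voting age over three years (36 months).
new_voters_per_month() {
  awk -v aged_15_19="$1" 'BEGIN { printf "%.0f\n", aged_15_19 * 3 / 5 / 36 }'
}

# Mid-2016 population aged 15-19, from the table above
new_voters_per_month 3778900
```

This prints 62982, matching the monthly figure in the text.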

If we assume that the proportions voting for either side remain the same as for the 18-24 age group, then the remain share of these votes exceeds the leave share by 46 percentage points (73% – 27%).

This gives us a final figure of 18,754 extra remain votes per month.

How Many Leavers Die Per Month?

Deaths by age group vary little over the years, so I took the numbers recorded in 2016, 2017 and 2018:

Age group  2016     2017     2018
15-44      15,128   14,514   15,140
45-64      62,679   62,517   63,913
65+        442,767  452,329  456,731

Looking at these numbers gives roughly 450,000 people in the 65+ age bracket dying per year. Deaths between 15-64 are relatively speaking negligible, and the voting proportions by age group mean that votes lost and gained roughly cancel one another out (the exact numbers give a few dozen more to remain per month, but this can be ignored).

Dividing 450,000 by 12 gives a figure of 37,500 deaths per month in the 65+ age group.

Taking the net leave vote in that age group (20%) and multiplying out gives a figure of roughly 7,500 leave votes lost per month.

Adding the two numbers gives a net gain for remain of about 26,000 votes per month, resulting in this graph:

which gives a rough crossover point of mid-2020.
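For anyone who wants to check the crossover without the graph, the calculation can be sketched in shell using only the figures quoted in this piece. The function name months_to_crossover is made up for illustration, and the monthly swing is assumed to stay constant.

```shell
# months_to_crossover MARGIN MONTHLY_SWING: months until the 2016 margin is
# used up at a constant monthly swing.
months_to_crossover() {
  awk -v margin="$1" -v swing="$2" 'BEGIN { printf "%.1f\n", margin / swing }'
}

margin=$(( 17410742 - 16141241 ))          # leave minus remain, June 2016
deaths_swing=$(( 450000 / 12 * 20 / 100 )) # 37,500 deaths x 20% net leave
swing=$(( 18754 + deaths_swing ))          # new remain votes + leave votes lost

months_to_crossover "$margin" "$swing"
```

This prints 48.4, i.e. roughly four years after June 2016, which is where the mid-2020 estimate comes from.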

Conclusion

I’ve made many crude assumptions here, and one could argue on both sides for tweaks to the numbers here and there. For example, you could argue that those in the 15-18 age bracket in 2016 would be even more likely to vote remain than the 18-24 cohort.

And of course, this analysis makes assumptions that won’t hold true in reality, such as that everyone would vote the same way as in 2016, and the age group analysis of voting patterns was accurate and uniform within the groups.

Broadly, though, the demographics point to a majority for remain happening around mid-2020 if nothing else changed from 2016.


Sources

Analysis: https://docs.google.com/spreadsheets/d/1n5r6W951DDBvGhD00Ou2Q19aBbAxtOx-n3Chs6RDeek/edit#gid=13089380

ONS numbers: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatesforukenglandandwalesscotlandandnorthernireland

https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths

Ashcroft Polls:

https://www.jrf.org.uk/report/brexit-vote-explained-poverty-low-skills-and-lack-opportunities

Goodbye Docker: Purging is Such Sweet Sorrow

After 6 years, I removed Docker from all my home servers.

apt purge -y docker-ce

Why?

This was triggered by a recurring incident in which the Docker daemon used 100% CPU on multiple cores, making the host effectively unusable.

This had happened a few times before, and was likely due to a script that had got out of hand starting up too many containers. I’d never really got to the bottom of it, as I had to run a command to kill off all the containers and restart the daemon. This time, the daemon wouldn’t restart without a kill -9, so I figured enough was enough.

Anyway, I didn’t necessarily blame Docker for it, but it did add force to an argument I’d heard before:

Why does Docker need a daemon at all?

Podman, Skopeo, and Buildah

These three tools are an effort, mostly pushed by Red Hat, that does everything I need Docker to do. They don’t require a daemon or access to a group with root privileges.

Podman

Podman replaces the Docker command for most of its sub-commands (run, push, pull etc). Because it doesn’t need a daemon, and uses user namespacing to simulate root in the container, there’s no need to attach to a socket with root privileges, which was a long-standing concern with Docker.

Buildah

Buildah builds OCI images. Confusingly, podman build can also be used to build Docker images, but for me it was incredibly slow and used up a lot of disk space because it used the vfs storage driver by default. buildah bud (‘build using Dockerfile’) was much faster for me, and uses the overlay storage driver.

The user namespacing allowing rootless builds was the other killer feature that made me want to move. I wrote a piece about trying to get rootless builds going last year, and now it comes out of the box with /etc/subuid and /etc/subgid set up for you, on Ubuntu at least.

Skopeo

Skopeo is a tool that allows you to work with Docker and OCI images by pushing, pulling, and copying images.

The code for all three is open source and available here:

Podman

Buildah

Skopeo

Steps to Move

Installing these tools on Ubuntu was a lot easier than it was 6 months ago.

I did seem to have to install runc independently of those instructions. Not sure why it wasn’t a pre-existing dependency.

First, I replaced all instances of docker in my cron and CI jobs with podman. That was relatively easy as it’s all in my Ansible scripts, and anything else was a quick search through my GitHub repos.
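A search-and-replace of this kind could be sketched as below. This is not the exact method used here, just one way to do it: the helper name swap_docker_for_podman is hypothetical, and it assumes GNU grep and GNU sed.

```shell
# Hypothetical helper: rewrite the whole word "docker" to "podman" in every
# file under a directory. Assumes GNU grep and GNU sed (for -r, -w, -i, \b).
# Note that \b also matches before a hyphen, so "docker-ce" would change too.
swap_docker_for_podman() {
  grep -rlw docker "$1" | while IFS= read -r file; do
    sed -i 's/\bdocker\b/podman/g' "$file"
  done
}
```

Running it against a git checkout (for example swap_docker_for_podman ./roles, where ./roles is a made-up path) means the result can be reviewed with git diff before anything is committed.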

Once that was bedded in, I could see if anything else was calling docker by using sysdig to catch any references to it:

sysdig | grep -w docker

This may slow down your system considerably if you’re performance-sensitive.

Once happy that nothing was trying to run docker, I could run:

apt remove -y docker-ce

I didn’t actually purge in case there was some config I needed.

Once everything was deemed stable, the final cleanup could take place:

  • Remove any left-over sources in /etc/apt/* that point to Docker apt repos
  • Remove the docker group from the system with delgroup docker
  • Remove any left-over files in /etc/docker/*, /etc/default/docker and /var/lib/docker
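The cleanup steps above can be sketched as a script. Because they are destructive, this version defaults to a dry run that only prints each command; the run wrapper and the DRY_RUN variable are illustrative names, and the search is narrowed to sources.list and sources.list.d as an assumption about where the Docker apt repos were configured.

```shell
# Sketch of the cleanup steps as a script. With DRY_RUN=1 (the default here)
# each command is only printed; set DRY_RUN=0 and run as root to execute.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

cleanup_docker() {
  # 1. List apt source files that still mention Docker, for manual review
  run grep -rli docker /etc/apt/sources.list /etc/apt/sources.list.d
  # 2. Remove the docker group
  run delgroup docker
  # 3. Remove left-over config and data
  run rm -rf /etc/docker /etc/default/docker /var/lib/docker
}

cleanup_docker
```

The first step only lists files rather than deleting them, so any apt sources can be checked by hand before removal.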

A few people asked what I did about Docker Compose, but I don’t use it, so that wasn’t an issue for me.

Edit: there exists a podman-compose project, but it’s not considered mature.

Differences?

So far, and aside from the ‘no daemon’ and ‘no sudo access required’, I haven’t noticed many differences.

Builds are local to my user (in ~/.local/share/containers) rather than global (in /var/lib/docker), in keeping with the general philosophy of these tools as user-oriented rather than daemon-oriented. But since my home servers have only one user using Docker, that wasn’t much of an issue.

The other big difference I noticed was that podman pull downloads all layers in parallel, in contrast to Docker, which limits the number of simultaneous layer downloads. I don’t know if this causes problems if too many images are being pulled at once, but that wasn’t a concern for me.

