Practical Strategies for Implementing DevSecOps in Large Enterprises

At Container Solutions, we often work with large enterprises that are at various stages of adopting cloud technologies. These companies are typically keen to adopt modern Cloud Native software working practices and technologies as itemised in our Maturity Matrix, so they come to us for help, knowing that we’ve been through many of these transformation processes before.

Financial services companies are especially keen to adopt DevSecOps, as the benefits to them are obvious given their regulatory constraints and security requirements. This article will focus on a common successful pattern of adoption for getting DevSecOps into large-scale enterprises that have these kinds of constraints on change.

DevSecOps and institutional inertia

The first common misconception about implementing DevSecOps is that it is primarily a technical challenge but, as we’ve explored on WTF before, it is at least as much about enabling effective communication. Whilst we have engineering skills in cutting-edge tooling and cloud services, there is little value in delivering a nifty technical solution if the business it’s delivered for is unable or unwilling to use it. If you read technical blog posts on the implementation of DevSecOps, you might be forgiven for thinking that the only things that matter are the tooling you choose, and how well you write and manage the software that is built on this tooling.

For organisations that were ‘born in the cloud’, where everyone is an engineer and has little legacy organisational scar tissue to consider, this could indeed be true. In such places, where the general approach to DevSecOps is well-grasped and agreed on by all parties, the only things to be fought over are indeed questions of tooling. This might be one reason why such debates take on an almost religious fervour.

The reality for larger enterprises that aren’t born in the cloud is that there are typically significant areas of institutional inertia to overcome. These include (but are not limited to):

  • The ‘frozen middle’
  • Siloed teams that have limited capability in new technologies and processes
  • Internal policies and processes designed for the existing ways of working

Prerequisites for success

Before outlining the pattern for success, it’s worth pointing out two critical prerequisites for enterprise change management success in moving to DevSecOps. As an aside, these prerequisites are not just applicable to DevSecOps but apply to most change initiatives.

The first is that the vision to move to a Cloud Native way of working must be clearly articulated to those tasked with delivering on it. The second is that the management who articulate the vision must have ‘bought into’ the change. This doesn’t mean they just give orders and timelines and then retreat to their offices. It means that they must back up the effort when needed with carrots, sticks, and direction when those under them are unsure how to proceed. If those at the top are not committed in this way, then those under them will certainly not push through and support the changes needed to make DevSecOps a success.

A three-phase approach

At Container Solutions we have found success in implementing DevSecOps in these contexts by taking a three-phase approach:

  1. Introduce tooling
  2. Baseline adoption
  3. Evolve towards an ideal DevSecOps practice

The alternative to this approach is the ‘build it right first time’ approach, where everything is conceived and delivered in one ‘big bang’ implementation.

  1. Introduce tooling

In this phase you correlate the security team’s (probably manual) process with the automation tooling you have chosen, and determine their level of capability for automation. At this point you are not concerned with how closely the work being done now matches the end state you would like to reach. Indeed, you may need to compromise against your ideal state. For example, you might skip writing a full suite of tests for your policies.

The point of this initial phase is to create alignment on the technical direction between the different parties involved as quickly and effectively as possible. To repeat: this is a deliberate choice over technical purity, or speed of delivery of the whole endeavour.

The security team is often siloed from both the development and cloud transformation teams. This means that they will need to be persuaded, won over, trained, and coached to self-sufficiency.

Providing training to the staff at this point can greatly assist the process of adoption by emphasising the business’s commitment to the endeavour and setting a minimum baseline of knowledge for the security team. If the training takes place alongside practical implementation of the new skills learned, it makes it far more likely that the right value will be extracted from the training for the business.

The output of this phase should be that:

  • Security staff are comfortable with (at least some of) the new tooling
  • Staff are enthused about the possibilities offered by DevSecOps, and see its value
  • Staff want to continue and extend the efforts towards DevSecOps adoption

  2. Get to baseline adoption

Once you have gathered the information about the existing process, the next step is to automate it as far as possible without disrupting the existing process too much. For example, if security policy adherence is checked manually in a spreadsheet by the security team (not an uncommon occurrence), those steps can be replaced by automation. Tools that might be used for this include some combination of pipelines, Terraform, InSpec, and so on. The key point is to start to deliver benefits for the security team quickly and help them see that this will make them more productive and (most importantly of all) increase the level of confidence they have in their security process.
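As a rough illustration of what replacing a spreadsheet check with automation can look like, the sketch below runs two checks and prints one pass/fail row per policy. The policy IDs (SEC-001, SEC-002), the rules themselves, and the file layout are all hypothetical and are not taken from any real engagement; in practice checks like these would more likely be expressed in a tool such as InSpec and run from a pipeline.

```shell
#!/bin/sh
# Hypothetical policy checks: IDs, paths and rules are illustrative only.

# Emit one CSV row per check: policy id, status, detail.
report() {
    printf '%s,%s,%s\n' "$1" "$2" "$3"
}

# SEC-001 (made up): no world-writable files under the given directory.
check_no_world_writable() {
    dir=$1
    offenders=$(find "$dir" -type f -perm -0002 | wc -l | tr -d ' ')
    if [ "$offenders" -eq 0 ]; then
        report SEC-001 PASS "no world-writable files found"
    else
        report SEC-001 FAIL "$offenders world-writable file(s) found"
    fi
}

# SEC-002 (made up): SSH root login must be explicitly disabled.
check_root_login_disabled() {
    file=$1
    if grep -Eq '^[[:space:]]*PermitRootLogin[[:space:]]+no[[:space:]]*$' "$file"; then
        report SEC-002 PASS "PermitRootLogin no is set"
    else
        report SEC-002 FAIL "PermitRootLogin no is missing"
    fi
}

# Run against a throwaway directory rather than a real system.
demo_dir=$(mktemp -d)
printf 'PermitRootLogin no\n' > "$demo_dir/sshd_config"
chmod 644 "$demo_dir/sshd_config"

policy_report=$(
    check_no_world_writable "$demo_dir"
    check_root_login_disabled "$demo_dir/sshd_config"
)
echo "$policy_report"

rm -r "$demo_dir"
```

The output is deliberately spreadsheet-shaped (one row per policy), so the security team can compare it line by line with the manual process it replaces before trusting it.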

Again, the goal for this stage is to level up the capabilities of the security team so that the move towards DevSecOps is more self-sustaining rather than imposed from outside. This is the priority over speed of delivery. In practical terms, this means that it is vital to offer both pairing (to elevate knowledge) and support (when things go wrong) from the start to maintain goodwill towards the effort. The aim is to spread and elevate the knowledge as far across the department as possible. 

Keep in mind, though, that knowledge transfer will likely slow down implementation. This means that it is key to ensure you regularly report to stakeholders on progress regarding both policy deployment and policy outputs, as this will help sustain the momentum for the effort.

Key points:

  • Report on progress as you go
  • Provide (and focus on) help and support for the people who will maintain this in future
  • Where you can, prioritise spreading knowledge far and wide over delivering quickly

Once you have reached baseline adoption, you should be at a ‘point of no return’ which allows you to push on to move to your ideal target state.

  3. Evolve to pure DevSecOps

Now that you have brought the parties on-side and demonstrated progress, you can start to move towards your ideal state. This raises the question of what that ideal state is, but we’re not going to exhaustively cover that here as that’s not the focus. Suffice it to say that security needs to be baked into every step of the overall development life cycle and owned by the development and operations teams as much as it is by the security team.

Some of the areas you would want to work on from here include:

  • Introducing/cementing separation of duties
  • Setting up tests on the various compliance tools used in the SDLC
  • Approval automation
  • Automation of policy tests’ efficacy and correctness
  • Compliance as code

These areas, if tackled too early, can bloat your effort to the point where the business sees it as too difficult or expensive to achieve. This is why it’s important to tackle the areas that maximise the likelihood of adoption of tooling and principles in the early stages.

Once all these things are coming together, you will naturally start to turn to the organisational changes necessary to get you to a ‘pure DevSecOps’ position, where development teams and security teams are working together seamlessly.


Like all formulas for business and technological change, this three-phase approach to introducing DevSecOps can’t be applied in exactly the same way in every situation. However, we’ve found in practice that the basic shape of the approach is very likely to be a successful one, assuming the necessary prerequisites are in place.

Building DevSecOps adoption in your business is not just about speed of delivery; it’s about making steady progress whilst setting your organisation up for success. To do this you need to make sure you are building capabilities and not just code.

This article was originally published on Container Solutions’ blog and is reproduced here by permission.

If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Buy in a bundle here

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

A Little Shell Rabbit Hole

Occasionally I run dumb stuff in the terminal. Sometimes something unexpected happens and it leads me to wonder ‘how the hell did that work?’

This article is about one of those times and how looking into something like that taught me a few new things about shells. After decades using shells, they still force me to think!

The tl;dr is at the end if you don’t want to join me down this rabbit hole…

The Dumb Thing I Ran

The dumb thing I occasionally ran was:

grep .* *

If you’re experienced in the shell you’ll immediately know why this is dumb. For everyone else, here are some reasons:

  • The first argument to grep should always be a quoted string – without the quotes, the shell treats the .* as a glob, not a regexp
  • grep .* just matches every line, so…
  • you could just get almost the same output by running cat *
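To see what the properly quoted form does, the short sketch below counts matching lines in a throwaway file (the file name and contents are made up for the demonstration). It also shows the "almost" in the last point: a quoted '.*' matches empty lines too, whereas a bare dot does not.

```shell
#!/bin/sh
# Throwaway sample file: two lines of text with an empty line between them.
demo_dir=$(mktemp -d)
printf 'first line\n\nthird line\n' > "$demo_dir/sample.txt"

# Quoted, so the shell passes the regexp to grep untouched.
all_lines=$(grep -c '.*' "$demo_dir/sample.txt")
nonempty_lines=$(grep -c '.' "$demo_dir/sample.txt")

# prints: '.*' matched 3 lines; '.' matched 2 lines
echo "'.*' matched $all_lines lines; '.' matched $nonempty_lines lines"

rm -r "$demo_dir"
```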

Not Quite So Dumb

Actually, it’s not quite as dumb as I’ve made out. Let me explain.

In the bash shell, ‘.*’ (unquoted) is a glob matching all the files beginning with the dot character. So take the ‘grep .* *’ command above in this (example) context:

$ ls -a1
.
..
.adotfile
file1
file2

It would be interpreted as the command shown on the second line below:

$ echo grep .* *
grep . .. .adotfile file1 file2

The .* gets expanded by the shell as a glob to all files or folders beginning with the literal dot character.

Now, remember, every folder contains at least two folders:

  • The dot folder (.), which represents itself.
  • The double-dot folder (..), which represents the parent folder

So these get added to the command:

grep . ..

Followed by any other file or folder beginning with a dot. In the example above, that’s .adotfile.

grep . .. .adotfile

And finally, the ‘*‘ at the end of the line expands to all of the files in the folder that don’t begin with a dot, resulting in:

grep . .. .adotfile file1 file2

So, the regular expression that grep takes becomes simply the dot character (which matches any line with at least one character in it), and the files it searches are the remaining items in the file list:

.. .adotfile file1 file2

Since one of those is a folder (..), grep complains that:

grep: ..: Is a directory

before going on to match any lines with any characters in. The end result is that empty lines are ignored, but every other line is printed on the terminal.

Another reason why the command isn’t so dumb (and another way it differs from ‘cat *‘) is that since multiple files are passed into grep, it reports on the filename, meaning the output automatically adds which file the line comes from.

bash-5.1$ grep .* *
grep: ..: Is a directory
.adotfile:content in a dotfile
file1:a line in file1
file2:a line in file2
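The expansion described in this section can be reproduced in a scratch folder with the sketch below. One caveat that postdates the bash 5.1 sessions shown here: bash 5.2 added a globskipdots option, enabled by default, which stops globs from matching . and .. at all. The script switches it off where it exists so that the older behaviour is visible; the scratch folder and variable names are just for illustration.

```shell
#!/usr/bin/env bash
# Recreate the example folder in a throwaway location.
demo_dir=$(mktemp -d)
touch "$demo_dir/.adotfile" "$demo_dir/file1" "$demo_dir/file2"

# bash 5.2 and later skip . and .. when expanding globs (globskipdots).
# Turn that off where the option exists so the bash 5.1 behaviour shows;
# older versions do not have the option, so the error is ignored.
shopt -u globskipdots 2>/dev/null || true

# echo shows the command line grep would receive after expansion.
expanded=$(cd "$demo_dir" && echo grep .* *)
echo "$expanded"

rm -r "$demo_dir"
```

With globskipdots left on (the default in bash 5.2 and later), the same echo should print only the dotfile and the two regular files, which is close to what zsh does when a dotfile is present.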

Strangely, for two decades I hadn’t noticed that this is a very roundabout and wrong-headed (ie dumb) way to go about things, nor had I thought about its output being different from what I might have expected; it just never came up. Running ‘grep .* *‘ was probably a bad habit I picked up when I was a shell newbie last century, and since then I never needed to think about why I did it, or even what it did until…

Why It Made Me Think

The reason I had to think about it was that I started to use zsh as my default shell on my Mac. Let’s look at the difference with some commands you can try:

bash-5.1$ mkdir rh && cd rh
bash-5.1$ cat > afile << EOF   # any non-empty content will do
> some text
> EOF
bash-5.1$ bash
bash-5.1$ grep .* afile
grep: ..: Is a directory
bash-5.1$ zsh 
zsh$ grep .* afile
zsh:1: no matches found: .*

For years I’d been happily using grep .* but suddenly it was telling me there were no matches. After scratching my head for a short while, I realised that of course I should have quotes around the regexp, as described above.

But I was still left with a question: why did it work in bash, and not zsh?

Google It?

I wasn’t sure where to start, so I googled it. But what to search for? I tried various combinations of ‘grep in bash vs zsh‘, ‘grep without quotes bash zsh‘, and so on. While there was some discussion of the differences between bash and zsh, there was nothing which addressed the challenge directly.


Since google wasn’t helping me, I looked for shell options that might be relevant. Maybe bash or zsh had a default option that made them behave differently from one another?

In bash, a quick look at the options did not reveal many promising candidates, except for maybe noglob:

bash-5.1$ set -o | grep glob
noglob off
bash-5.1$ set -o noglob
bash-5.1$ set -o | grep glob
noglob on
bash-5.1$ grep .* *
grep: *: No such file or directory

But this is different from zsh’s output. What noglob does is completely prevent the shell from expanding globs. This means that grep receives ‘.*’ as its regexp and a literal ‘*’ as the only filename to search, and it complains because there is no file named ‘*’ in this folder.
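A minimal way to watch noglob at work, without grep in the way, is to echo the same arguments with the option on and off. This is only a sketch in a scratch folder; the file names are placeholders.

```shell
#!/bin/sh
# Scratch folder containing two ordinary files.
demo_dir=$(mktemp -d)
touch "$demo_dir/file1" "$demo_dir/file2"

# With noglob set, the patterns are passed through literally.
with_noglob=$(cd "$demo_dir" && set -o noglob && echo .* *)

# Without it, * expands to the non-dot files in the folder.
without_noglob=$(cd "$demo_dir" && echo *)

echo "noglob on:  $with_noglob"      # prints: noglob on:  .* *
echo "noglob off: $without_noglob"   # prints: noglob off: file1 file2

rm -r "$demo_dir"
```

Each command substitution runs in its own subshell, so setting noglob in the first one does not leak into the second.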

And for zsh? Well, it turns out there are a lot of options in zsh…

zsh% set -o | wc -l

Even just limiting to those options with glob in them doesn’t immediately hit a jackpot:

zsh% set -o | grep glob
nobareglobqual        off
nocaseglob            off
cshnullglob           off
extendedglob          off
noglob                off
noglobalexport        off
noglobalrcs           off
globassign            off
globcomplete          off
globdots              off
globstarshort         off
globsubst             off
kshglob               off
nullglob              off
numericglobsort       off
shglob                off
warncreateglobal      off

While noglob does the same as in bash, after some research I found that the remainder are not relevant to this question.

(Trying to find this out, though, is tricky. First, zsh’s documentation is not a single complete man page like bash’s; it’s divided into multiple man pages. Second, concatenating all the zsh man pages with man zshall and searching for noglob gets no matches. It turns out that options are documented in caps with underscores separating words. So, in noglob’s case, you have to search for NO_GLOB. Annoying.)

zsh with xtrace?

Next I wondered whether this was due to some kind of startup problem with my zsh setup, so I tried starting up zsh with the xtrace option to see what’s run on startup. But the output was overwhelming, with over 13,000 lines pushed to the terminal:

bash-5.1$ zsh -x 2> out
zsh$ exit
bash-5.1$ wc -l out

I did look anyway, but nothing looked suspicious.

zsh with NO_RCS?

Back to the documentation, and I found a way to start zsh without any startup files by starting with the NO_RCS option.

bash-5.1$ zsh -o NO_RCS
zsh$ grep .* afile
zsh:1: no matches found: .*

There was no change in behaviour, so it wasn’t anything funky I was doing in the startup.

At this point I tried using the xtrace option, but then re-ran it in a different folder by accident:

zsh$ set -o xtrace
zsh$ grep .* *
zsh: no matches found: .*
zsh$ cd ~/somewhere/else
zsh$ grep .* *
+zsh:3> grep .created_date notes.asciidoc

Interesting! The original folder I created to test the grep just threw an error (no matches found), but when there is a dotfile in the folder, it actually runs something… and what it runs does not include the dot folder (.) or parent folder (..).

Instead, the ‘grep .* *‘ command expands the ‘.*‘ into all the files that begin with a dot character. For this folder, that is one file (.created_date), in contrast to bash, where it is three (. .. .created_date). So… back to the man pages…


After another delve into the man page, I found the relevant section in man zshall that gave me my answer:



In filename generation, the character '/' must be matched explicitly; also, a '.' must be matched explicitly at the beginning of a pattern or after a '/', unless the GLOB_DOTS option is set. No filename generation pattern matches the files '.' or '..'. In other instances of pattern matching, the '/' and '.' are not treated specially.

So, it was as simple as: zsh ignores the ‘.‘ and ‘..‘ files.
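If you want to compare the shells side by side on your own machine, a sketch along these lines runs the same expansion through whichever of bash and zsh are installed. The folder contents mirror the example above; the function name is made up for this illustration, and the bash line will differ by version, since bash 5.2 and later also skip . and .. by default.

```shell
#!/bin/sh
# Scratch folder mirroring the example: one dotfile, one ordinary file.
demo_dir=$(mktemp -d)
touch "$demo_dir/.created_date" "$demo_dir/notes.asciidoc"

# Print how each available shell expands '.* *' in the scratch folder.
# Note: 'zsh -c' still reads zshenv files, which could change options.
compare_dot_glob() {
    for shell_name in bash zsh; do
        if command -v "$shell_name" >/dev/null 2>&1; then
            expansion=$(cd "$demo_dir" && "$shell_name" -c 'echo .* *')
            printf '%-5s %s\n' "$shell_name:" "$expansion"
        else
            printf '%-5s not installed, skipped\n' "$shell_name:"
        fi
    done
}

comparison=$(compare_dot_glob)
echo "$comparison"

rm -r "$demo_dir"
```

The folder deliberately contains a dotfile: in an empty folder zsh would stop with ‘no matches found’ instead of printing an expansion, which is the behaviour that started this whole investigation.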

But Why?

But I still don’t know why it does that. I assume it’s because the zsh designers felt that that wrinkle was annoying, and wanted to ignore those two folders completely. It’s interesting that there does not seem to be an option to change this behaviour in zsh.

Does anyone know?

“Who Should Write the Terraform?”

The Problem

Working in Cloud Native consulting, I’m often asked about who should do various bits of ‘the platform work‘.

I’m asked this in various forms, and at various levels, but the title’s question (‘Who should write the Terraform?’) is a fairly typical one. Consultants are often asked simple questions that invite simple answers, but it’s our job to frustrate our clients, so I invariably say “it depends”.

The reason it depends is that the answers to these seemingly simple questions are very context-dependent. Even if there is an ‘ideal’ answer, the world is not ideal, and the best thing for a client at that time might not be the best thing for the industry in general.

So here I attempt to lay out the factors that help me answer that question as honestly as possible. But before that, we need to lay out some background.

Here’s an overview of the flow of the piece:

  • What is a platform?
  • How we got here
    • Coders and Sysadmins became…
    • Dev and Ops, but silos and slow time to market, so…
    • DevOps, but not practical, so…
    • SRE and Platforms
  • The factors that matter
    • Non-negotiable standards
    • Developer capability
    • Management capability
    • Platform capability
    • Time to market

What is a Platform?

Those old enough to remember when the word ‘middleware’ was everywhere will know that many industry terms are so vague or generic as to be meaningless. However, for ‘platform’ work we have a handy definition, courtesy of Team Topologies:

The purpose of a platform team is to enable stream-aligned teams to deliver
work with substantial autonomy. The stream-aligned team maintains full
ownership of building, running, and fixing their application in production.
The platform team provides internal services to reduce the cognitive load
that would be required from stream-aligned teams to develop these
underlying services.

Team Topologies, Matthew Skelton and Manuel Pais

A platform team, therefore, (and putting it crudely) builds the stuff that lets others build and run their stuff.

So… is the Terraform written centrally, or by the stream-aligned teams?

To explain how I would answer that, I’m going to have to do a little history.

How We Got Here

Coders and Sysadmins

In simpler times – after the Unix epoch and before the dotcom boom – there were coders and there were sysadmins. These two groups speciated from the generic ‘computer person’ that companies found they had to have on the payroll (whether they liked it or not) in the 1970s and 80s.

As a rule, the coders liked to code and make computers do new stuff, and the sysadmins liked to make sure said computers worked smoothly. Coders would eagerly explain that with some easily acquired new kit, they could revolutionise things for the business, while sysadmins would roll their eyes and ask how this would affect user management, or interoperability, or stability, or account management, or some other boring subject no-one wanted to hear about anymore.

I mention this because this pattern has not changed. Not one bit. Let’s move on.

Dev and Ops

Time passed, and the Internet took over the world. Now we had businesses running websites as well as their internal machines and internal networks. Those websites were initially given to the sysadmins to run. Over time, these websites became more and more important for the bottom line, so eventually, the sysadmins either remained sysadmins and looked after ‘IT’, or became ‘operations’ (Ops) staff and looked after the public-facing software systems.

Capable sysadmins had always liked writing scripts to automate manual tasks (hence the t-shirt), and this tendency continued (sometimes) in Ops, with automation becoming the defining characteristic of modern Ops.

Eventually a rich infrastructure emerged around the work. ‘Pipelines’ started to replace ‘release scripts’, and concepts like ‘continuous integration’, and ‘package management’ arose. But we’re jumping ahead a bit; this came in the DevOps era.

Coders, meanwhile, spent less and less time doing clever things with chip registers and more and more time wrangling different software systems and APIs to do their business’s bidding. They stopped being called ‘coders’ and started being called ‘developers’.

So ‘Devs’ dev’d, and ‘Ops’ ops’ed.

These groups grew in size and proportion of the payroll as software started to ‘eat the world’.

In reality, of course, there was a lot of overlap between the two groups, and people would often move from one side of the fence to the other. But the distinction remained, and became organisational orthodoxy.

Dev and Ops Inefficiencies

As this Dev and Ops pattern became bedded into organisations, people noted some inefficiencies with this state of affairs:

  • Release overhead
  • Misplaced expertise
  • Cost

First, there was a release overhead as Dev teams passed changes to Ops. Ops teams typically required instructions for how to do releases, and in a pre-automation age these were often prone to error without app- or even release-specific knowledge. About 15 years ago I was present at a very fractious argument between a software supplier and its client’s Ops team after an outage. The Ops team had attempted to follow the instructions for a release, and the outage resulted because the instructions were not followed correctly. There was much swearing as the Ops team remonstrated that the instructions were not clear enough, while the Devs argued that if the instructions had been followed properly then it would have worked. Fun.

Second, Ops teams didn’t know in detail what they were releasing, so couldn’t fix things if they went wrong. The best they could do was restart things and hope they worked.

Third, Ops teams looked expensive to management. They didn’t deliver ‘new value’, just farmed existing value, and appeared slow to respond and risk-averse.

I mention this because this pattern has not changed. Not one bit. Let’s move on.

These and other inefficiencies were characterised as ‘silos’ – unhelpful and wasteful separations of teams for (apparently) no good purpose. Frictions increased as these mismatches were exacerbated by embedded organisational separation.

The solution was clearly to get rid of the separation: no more silos!

Enter DevOps

The ‘no more silos’ battle cry got a catchy name – DevOps. The phrase was usefully vague and argued over for years, just as Agile was and is (see here). DevOps is defined by Wikipedia as ‘a set of practices that combines software development (Dev) and IT operations (Ops)’.

At the purest extreme, DevOps is the movement of all infrastructure and operational work and responsibilities (ie ‘delivery dependencies’) into the development team.

This sounded great in theory. It would:

  • Place the operational knowledge within the development team, where its members could more efficiently collaborate in tighter iterations
  • Deliver faster – no more waiting weeks for the Ops team to schedule a release, or waiting for Ops to provide some key functionality to the development team
  • Bring the costs of operations closer to the value (more exactly: the development team bore the cost of infrastructure and operations as part of the value stream), making P&L decisions closer to the ‘truth’

DevOps Didn’t

But despite a lot of effort, the vast majority of organisations couldn’t make this ideal work in practice, even if they tried. The reasons for this were systemic, and some of the reasons are listed below:

  • Absent an existential threat, the necessary organisational changes were more difficult to make. This constraint limited the willingness or capability to make any of the other necessary changes
  • The organisational roots of the Ops team were too deep. You couldn’t uproot the metaphorical tree of Ops without disrupting the business in all sorts of ways
  • There were regulatory reasons to centralise Ops work which made distribution very costly
  • The development team didn’t want to – or couldn’t – do the Ops work
  • It was more expensive. Since some work would necessarily be duplicated, you couldn’t simply distribute the existing Ops team members across the development teams, you’d have to hire more staff in, increasing cost

I said ‘the vast majority’ of organisations couldn’t move to DevOps, but there are exceptions. The exceptions I’ve seen in the wild implemented a purer form of DevOps when there existed:

  • Strong engineering cultures where teams full of T-shaped engineers want to take control of all aspects of delivery AND
  • No requirement for centralised control (eg regulatory/security constraints)


  • A gradual (perhaps guided) evolution over time towards the breaking up of services and distribution of responsibility


  • Strong management support and drive to enable

The most famous example of the ‘strong management support’ is Amazon, where so-called two-pizza teams must deliver and support their products independently (I’ve never worked for Amazon, so I have no direct experience of the reality of this). This, notably, was the product of a management edict to ensure teams operated independently.

When I think of this DevOps ideal, I think of a company with multiple teams each independently maintaining their own discrete marketing websites in the cloud. Not many businesses have that kind of context and topology.

Enter SRE and Platforms

One of the reasons listed above for the failure of DevOps was the critical one: expense.

Centralisation, for all its bureaucratic and slow-moving faults, can result in vastly cheaper and more scalable delivery across the business. Any dollar spent at the centre can save n dollars across your teams, where n is the number of teams consuming the platform.

The most notable example of this approach is Google, who have a few workloads to run, and built their own platform to run them on. Kubernetes is a descendant of that internal platform.

It’s no coincidence that Google came up with DevOps’s fraternal concept: SRE. SRE emphasised the importance of getting Dev skills into Ops rather than making Dev and Ops a single organisational unit. This worked well at Google primarily because there was an engineering culture at the centre of the business, and an ability to understand the value of investing in the centre rather than chasing features. Banks (who might well benefit from a similar way of thinking) are dreadful at managing and investing in centralised platforms, because they are not fundamentally tech companies (they are defenders of banking monopoly licences, but that’s a post for another day, also see here).

So across the industry, those that might have been branded sysadmins first rebranded themselves as Ops, then as DevOps, and finally SREs. Meanwhile they’re mostly the same people doing similar work.

Why the History Lesson?

What’s the point of this long historical digression?

Well, it’s to explain that, with a few exceptions, the division between Dev and Ops, and between centralisation and distribution of responsibility has never been resolved. And the reasons why the industry seems to see-saw are the same reasons why the answer to the original question is never simple.

Right now, thanks to the SRE movement (and Kubernetes, which is a trojan horse leading you away from cloud lock-in), there is a fashion-swing back to centralisation. But that might change again in a few years.

And it’s in this historical milieu that I get asked questions about who should be responsible for what with respect to work that could be centralised.

The Factors

Here are the factors that play into the advice that I might give to these questions, in rough order of importance.

Factor One: Non-Negotiable Standards

If you have standards or practices that must be enforced on teams for legal, regulatory, or business reasons, then at least some work needs to be done at the centre.

Examples of this include:

  • Demonstrable separation of duties between Dev and Ops
  • User management and role-based access controls

Performing an audit on one team is obviously significantly cheaper than auditing a hundred teams. Further, with an audit, the majority of expense is not in the audit but the follow-on rework. The cost of that can be reduced significantly if a team is experienced at knowing from the start what’s required to get through an audit. For these reasons, the cost of an audit across your 100 dev teams can be more than 100x the cost of a single audit at the centre.

Factor Two: Engineer Capability

Development teams vary significantly in their willingness to take on work and responsibilities outside their existing domain of expertise. This can have a significant effect on who does what.

Anecdote: I once worked for a business that had a centralised DBA team, who managed databases for thousands of teams. There were endless complaints about the time taken to get ‘central IT’ to do their bidding, and frequent demands for more autonomy and freedom.

A cloud project was initiated by the centralised DBA team to enable that autonomy. It was explained that since the teams could now provision their own database instances in response to their demands, they would no longer have a central DBA team to call on.

Cue howls of despair from the development teams that they needed a centralised DBA service, as they didn’t want to take this responsibility on and didn’t have the skills.

Another example is embedded in the title question about Terraform. Development teams often don’t want to learn the skills needed for a change of delivery approach. They just want to carry on writing in whatever language they were hired to write in.

This is where organisational structures like ‘cloud native centres of excellence’ (who just ‘advise’ on how to use new technologies), or ‘federated devops teams’ (where engineers are seconded to teams to spread knowledge and experience) come from. The idea with these ‘enabling teams’ is that once their job is done they are disbanded. Anyone who knows anything about political or organisational history knows that these plans to self-destruct often don’t pan out that way, and you’re either stuck with them forever, or some put-upon central team gets given responsibility for the code in perpetuity.

Factor Three: Management Capability

While the economic benefits of having a centralised team doing shared work may seem intuitively obvious, senior management in various businesses are often not able to understand its value, and manage it as a pure cost centre.

This is arguably due to internal accounting assumptions. Put simply, the value gained from centralised work is not traced back to profit calculations, so is seen as pure cost. (I wrote a little about non-obvious business value here.)

In companies with competent technical management, centralised work is (implicitly, due to an understanding of the actual work involved) seen as valuable. This is why tech firms such as Google can successfully manage a large-scale platform, and why Google gave birth to SRE and Kubernetes, two icons of tech org centralisation. It’s interesting that Amazon – with its roots in retail, distribution, and logistics – takes a radically different distributed approach.

If your organisation is not capable of managing centralised platform work, then it may well be more effective to distribute the work across the feature teams, so that cost and value can be more easily measured and compared.

Factor Four: Platform Team Capability

Here we are back to the old fashioned silo problem. One of the most common complaints about centralised teams is that they fail to deliver what teams actually need, or do so in a way that they can’t easily consume.

Often this is because of the ‘non-negotiable standards’ factor above resulting in security controls that stifle innovation. But it can also be because the platform team is not interested, incentivised, or capable enough to deliver what the teams need. In these latter cases, it can be very inefficient or even harmful to get them to do some of the platform work.

This factor can be mitigated with good management. I’ve seen great benefits from moving people around the business so they can see the constraints other people work under (a common principle in the DevOps movement) rather than just complain about their work. However, as we’ve seen, poor management is often already a problem, so this can be a non-starter.

Factor Five: Time to Market

Another significant factor is whether it’s important to keep the time to delivery low. Retail banks don’t care about time to delivery. They may say they do, but the reality is that they care far more about not risking their banking licence and not causing outages that attract the interest of regulators. Elsewhere in the financial sector, hedge funds, by contrast, might care very much about time to market, as they are much more lightly regulated and wish to take advantage of any edge they might have as quickly as possible. Retail banks tend towards centralised organisational architectures, while hedge funds devolve responsibility as close to the feature teams as possible.

So, Who Should Write the Terraform?

The original question of ‘who should write the Terraform?’ can now be more easily answered, or at least approached. Depending on the factors discussed above, it might make sense for that work to be either centralised or distributed.

More importantly, by not simply assuming that there is a ‘right’ answer, you can make decisions about where the work goes with your eyes open about what the risks, trade-offs, and systemic preferences of your business are.

Whichever way you go, make sure that you establish which entity will be responsible for maintaining the resulting code as well as producing it. Code, it is important to remember, is an asset that needs maintenance to remain useful and if this is ignored there could be great confusion in the future.

If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Buy in a bundle here

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Business Value, Soccer Canteens, Engineer Retention, and the Bricklayer Fallacy

Having the privilege of working in software in the 2020s, I hear variations on the following ideas expressed frequently:

  • ‘There must be some direct relationship between your work and customer value!’
  • ‘The results of your actions must be measurable!’

These ideas manifest in statements like this, which sound very sensible and plausible:

  • ‘This does not benefit the customer. This is not a feature to the customer. So we should not do it.’
  • ‘We are not in the business of doing X, so should not focus on it. We are in the business of serving the customer’
  • ‘This does not improve any of the key metrics we identified’

I want to challenge these ideas. In fact, I want to turn them on their head:

  • Many people’s work generates value by focussing on things that appear to have no measurable or apparently justifiable customer benefit.
  • Moreover, judgements on these matters are what people are (and should be) paid to exercise.

Alex Ferguson and Canteen Design

To encapsulate these ideas I want to use an anecdote from the sporting world, that unforgiving laboratory of success and failure. In that field, the record of Alex Ferguson, manager of Manchester United (an English football, or soccer, team) in one of their ‘golden eras’ from 1986 to 2013, is unimpeachable. During those 27 years, he took them from second-from-bottom of the English top division in 1986 to treble trophy winners, including the Champions League, in 1998-1999.

Fortunately, he’s recorded his recollections and lessons in various books, and these books provide a great insight into how such a leader thinks, and what they’re paid to do.

Alex Ferguson demonstrating how elite-level sports teams can be coached to success

Now, to outsiders, the ‘business value’ he should be working towards is obvious. Some kind of variation of ‘make a profit’, or ‘win trophies’, or ‘score more goals than you concede in every match’ is the formulation most of us would come up with. Obviously, these goals break down to sub-goals like:

  • Buy players cheaply and extract more value from them than you paid for
  • Optimise your tactics for your opponents
  • Make players work hard to maintain fitness and skills

Again, we mortals could guess these. What’s really fascinating about Ferguson’s memoirs is the other things he focusses on, which are less obvious to those of us that are not experts in elite-level soccer.

Sometimes if I saw a young player, a lad in the academy, eating by himself, I would go and sit beside him. You have to make everyone feel at home. That doesn’t mean you’re going to be soft on them–but you want them to feel that they belong. I’d been influenced by what I had learned from Marks & Spencer, which, decades ago in harder times, had given their staff free lunches because so many of them were skipping lunch so they could save every penny to help their families. It probably seems a strange thing for a manager to be getting involved in–the layout of a canteen at a new training ground–but when I think about the tone it set within the club and the way it encouraged the staff and players to interact, I can’t overstate the importance of this tiny change.

Alex Ferguson, Leading

Now, I invite you to imagine a product owner, or scrum master for Manchester United going over this ‘update’ with him:

  • How does spending your time with junior players help us score more goals on Saturday?
  • Are we in the business of canteen architecture or soccer matches?
  • How do you measure the benefit of these peripheral activities?
  • Why are you micromanaging building design when we have paid professionals hired in to do that?
  • How many story points per sprint should we allocate to your junior 1-1s and architectural oversight?

It is easy to imagine how that conversation would have gone, especially given Ferguson’s reputation for robust plain speaking. (The linked article is also a mini-goldmine of elite talent management subtleties hiding behind a seemingly brutish exterior.)

Software and Decision Horizons

It might seem like managing a soccer team and working in software engineering are worlds apart, but there’s significant commonality.

Firstly, let’s look at the difference of horizon between our imagined sporting scrum master and Alex Ferguson.

The scrum master is thinking in:

  • Very short time periods (weeks or months)
  • Specific and measurable goals (score more goals!)

Alex Ferguson, by contrast, is thinking in decades-long horizons, and (practically) unmeasurable goals:

  • If I talk to this player briefly now, they may be motivated to work for us for the rest of their career
  • I may encourage others to help their peers by being seen to inculcate a culture of mutual support

I can think of a specific example of such a clash of horizons that resulted in a questionable decision in a software business.

Twenty years ago I worked for a company that had an ‘internal wiki’ – a new thing then. Many readers of this piece will know of the phenomenon of ‘wiki-entropy’ (I just made that word up, but I’m going to use it all the time now), whereby an internal documentation system gradually degrades into uselessness, whatever the value of some of its content, because it gets overwhelmed by unmaintained information.

Well, twenty years ago we didn’t have that problem. We decided to hire a young graduate with academic tendencies to maintain our wiki. He assiduously ranged across the company, asking owners of pages whether the contents were still up to date, whether information was duplicated, complete, no longer needed, and so on.

The result of this was a wiki that was extremely useful, up to date, where content was easily found and minimal time was wasted getting information. The engineers loved it, and went out of their way to praise his efforts to save them from their own bad habits.

Of course, the wiki curator was first to be let go when the next opportunity arose. While everyone on the ground knew how much time and energy his work saved hundreds of engineers who would otherwise have been chasing around bad information, the impact was difficult to measure (or simply never measured), and in any case, shouldn’t the engineers be doing that themselves?

For years afterwards, whenever we engineers were frustrated with the wiki, we always cursed whoever it was that made the short-sighted decision to let his position go.

So-called ‘business people’, such as shareholders, executives, project managers, and product owners, are strongly incentivised to deliver in the short term, which most often means prioritising short-term goals (‘mission accomplished’) over longer-term value. Those that don’t think short-term often have a strong background in engineering and have succeeded in retaining their position despite this handicap.

What To Do? Plan A – The Scrum Courtroom

So your superiors don’t often think long term about the work you are assigned, but you take pride in what you do, and want the value of your work to be felt over a longer time than just a sprint or a project increment. And you don’t want people cursing your name as they suffer from your short-term self-serving engineering choices.

Fortunately, a solution has arisen that should handle this difference of horizon: scrum. This methodology (in theory, but that’s a whole other story) strictly defines project work to be done within a regular cadence (eg two weeks). At the start of this cadence (the sprint), the team decides together what items should go in it.

At the beginning of each cadence, therefore, you get a chance to argue that the improvement or investment you want to make in the system you are working on should be included in that cadence’s work.

The problem with this is that these arguments mostly fail because the cards are still stacked against you, in the following ways:

  • The cadence limit
  • Uncertainty of benefit
  • Uncertainty of completion
  • Uncertainty of value

Plan A Mitigators – The Cadence Limit

First, the short-term nature of the scrum cadence has an in-built prejudice against larger-scale and more speculative/innovative ideas. If you can’t get your work done within the cadence, then it’s more easily seen as impractical or of little value.

The usual counter to this is that the work should be ‘broken down’ in advance into smaller chunks that can be completed within the sprint. This often has the effect of making the work seem profoundly insignificant (‘talk to a young player in the canteen’), and of losing sight of the overall picture of the work being proposed (‘change/maintain the culture of the organisation’).

Plan A Mitigators – Uncertainty of Benefit

The scrum approach tries to increment ‘business value’ in each sprint. Since larger-scale and speculative/innovative work is generally riskier, it’s much harder to ‘prove’ the benefit for the work you do in advance, especially within the sprint cadence.

The result is that such riskier work is less likely to be sanctioned by the scrum ‘court’.

Plan A Mitigators – Uncertainty of Completion

Similarly, there is uncertainty as to whether the work will get completed within the sprint cadence. Again, this reduces your chances of successfully arguing your case.

Plan A Mitigators – Uncertainty of Value

‘Business Value’ is a very slippery concept the closer you look at it. Mark Schwartz wrote a book I tell everyone to read deconstructing the very term, and showing how no-one really knows what it means. Or, at the very least, it means very different things to different people.

The fact is that almost anything can be justified in terms of business value:

  • Spending a week on an AWS course
    • As an architect, I need to ensure I don’t make bad decisions that will reduce the flow of features for the product
  • Spending a week optimising my dotfiles
    • As a developer, I need to ensure I spend as much time coding efficiently as possible so I can produce more features for the product
  • Tidying up the office
    • As a developer, I want the office to be tidier so I can focus more effectively on writing features for the product
  • Hiring a Michelin starred chef to make lunch
    • As a developer, I need my attention and nutrition to be optimised so I can write more features for the product without being distracted by having to get lunch

The problem with all these things is that they are effectively impossible to measure.

There’s generally no objective way to prove customer value (even if we can be sure what it is). Some arguments just sound rhetorically better to some ears than others.

If you try and justify them within some kind of business framework (such as improving a defined and pre-approved metric), you get bogged down in discussions that you effectively can’t win.

  • How long will this take you?
    • “I don’t know, I’ve never done this before”
  • What is the metric?
    • “Um, culture points? Can we measure how long we spend scouring the wiki and then chasing up information gleaned from it? [No, it’s too expensive to do that]”

‘Plan A’ Mitigators Do Have Value

All this is not to say that these mitigators should be removed, or have no purpose. Engineers, as any engineer knows, can have a tendency to underestimate how hard something will be to build, overestimate how much value it will bring, and even do ‘CV-driven development’ rather than serve the needs of the business.

The same could be said of soccer managers. But we still let soccer managers decide how to spend their time, and more so the more experienced they are, and the more success they have demonstrated.


In any case, I have been involved in discussions like this at numerous organisations that end up taking longer than actually doing the work, or at least doing the work necessary to prove the value in proof of concept form.

So I mostly move to Plan B…

What To Do? Plan B – Skunkworks It

Plan B is to skip court and, without telling anyone, just do the work necessary to convince others that yours is the way to go. This is broadly known as ‘skunkworks’.

The first obvious objection to this approach is something like this:

‘How can this be done? Surely the time taken for work in your sprint has been tightly defined and estimated, and you therefore have no spare time?’

Fortunately this is easily solved. The thing about leaders who don’t have strong domain knowledge is that their ignorance is easily manipulated by those they lead. So the workers simply bloat their estimates, telling them that the easy official tasks they have will take longer than they actually will take, leaving time left over for them to work on the things they think are really important or valuable to the business.

Yes, that’s right: engineers actually spend time they could be spending doing nothing trying to improve things for their business in secret, simply because they want to do the right thing in the best way. Just like Alex Ferguson spent time chatting to juniors, and micromanaging the design of a canteen when he could have enjoyed a longer lunch alone, or with a friend.

It’s Not A Secret

Good leaders know this happens, even encourage it explicitly. A C-level leader (himself a former engineer) once said to me “I love that you hide things from me. I’m not forced to justify to my peers why you’re spending time on improvements if I don’t know about them and just get presented with a solution for free.”

The Argument

When you get paid to make decisions, you are being paid to exercise your judgement exactly in the ways that can’t be justified within easily measurable and well-defined metrics of value.

If your judgement could be quantified and systematised, then there would be no need for you to be there to make those judgements. You’d automate it.

This is true whether you are managing a soccer team, or doing software engineering.

Making software is all about making classes of decision that are the same in shape as Alex Ferguson’s. Should I:

  • Fix or delete the test?
  • Restructure the pipeline because its foundations are wobbly, or just patch it for now?
  • Mentor a junior to complete this work over a few days, or finish the job myself in a couple of hours?
  • Rewrite this bash script in Python, or just add more to it?
  • Spend the time to containerise the application and persuade everyone else to start using Docker, or just muddle along with hand-curated environments as we’ve always done?
  • Spend time getting to know the new joiner on the team, or focus on getting more tickets in the sprint done?

Each of these decisions has many consequences which are unclear and unpredictable. In the end, someone needs to make a decision where to spend the time based on experience as the standard metrics can’t tell you whether they’re a good idea.


At the heart of this problem in software is what I call the ‘bricklayer fallacy’. Many view software engineering as a series of tasks analogous to laying bricks: once you are set up, you can say roughly how long it will take to do something, because laying one brick takes a predictable amount of time.

This fallacy results in the treatment of software engineering as readily convertible to what business leaders want: a predictable graph of delivery over time. Attempts to maintain this illusion for business leaders result in the fairy stories of story points, velocity, and burn-down charts. All of these can drive the real value work underground.

If you want evidence of this not working, look here. Scrum is conspicuously absent as a software methodology at the biggest tech companies. They don’t think of their engineers as bricklayers.

Soccer managers don’t suffer as much from this fallacy because we intuitively understand that building a great soccer team is not like building a brick wall.

But software engineering is also a mysterious and varied art. It’s so full of craft and subtle choices that the satisfaction of doing the job well exceeds the raw compensation for attendance and following the rules. Frequently, I’ve observed that ‘working to rule’ gets the same pay and rewards as ‘pushing to do the right thing for the long term’, but results in real human misery. At a base level, your efforts and their consequences are often not even noticed by anyone.

If you remove this judgement from people, you remove their agency.

This is a strange novelty of knowledge work that didn’t exist in the ‘bricklayer’ age of piece-work and Taylorism. In the knowledge work era, the engineers who like to actually deliver work of true long-term value get dissatisfied and quit. And paying them more to put up with it doesn’t necessarily help, as the ones that stay are the ones that have learned to optimise for getting more money rather than better work. These are exactly the people you don’t want doing the work.

If you want to keep the best and most innovative staff – the ones that will come up with 10x improvements to your workflows that result in significant efficiencies, improvements, and savings – you need to figure out who the Alex Fergusons are, and give them the right level of autonomy to deliver for you. That’s your management challenge.

Five Reasons To Master Git On The Command Line

If you spend most of your days in a browser watching pipelines and managing pull requests, you may wonder why anyone would prefer the command line to manage their Git workflow.

I’m here to persuade you that using the command line is better for you. You will find that it’s not easier in every case, and it can be harder to learn. But investing the time in building those muscles will reap serious dividends for as long as you use Git.

Here are five (far from exhaustive) reasons why you should become one with Git on the command line.

1. git log is awesome

There are so many ways that git log is awesome I can’t list them all here. I’ve also written about it before.

If you’ve only looked at Git histories through GitHub or BitBucket then it’s unlikely you’ve seen the powerful views of what’s going on with your Git repository.

This is the capsule command that covers most of the flags I use on the regular:

git log --oneline --all --decorate --graph

--oneline – shows a summary per commit in one line, which is essential for seeing what’s going on

--graph – arranges the output into a graph, showing branches and merges. The format can look daunting at first, especially for complex repositories, but it doesn’t take long to get used to

--all – shows commits reachable from every reference stored locally (branches, tags, and remote-tracking branches), not just the branch you are on

--decorate – shows any reference names

This is what kubernetes looks like when I check it out and run that command:

Most versions of git these days implicitly use --decorate so you won’t necessarily need to remember that flag.

Other arguments that I regularly use with git log include:

--patch – show the changes associated with a commit

--stat – summarise the changes made by file

--simplify-by-decoration – only shows commits that have a reference associated with them. This is particularly useful if you don’t want to see all commits, just ‘significant’ ones associated with branches or tags.
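If you want to try these flags somewhere safe, here is a minimal sketch that builds a throwaway repository and runs them against it. The file name, commit messages, tag, and branch name are all made up for illustration, and GIT_PAGER is set to cat only so that the script runs straight through without stopping in a pager.

```shell
#!/bin/sh
set -eu

# Print log output directly instead of opening a pager.
export GIT_PAGER=cat

# Throwaway repository; identity is set locally so commits work without global config.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

# First commit on the default branch, marked with a tag.
echo one > file.txt
git add file.txt
git commit --quiet -m "first commit"
git tag v0.1

# Two more commits on a hypothetical feature branch.
git checkout --quiet -b feature
echo two >> file.txt
git commit --quiet -am "second commit"
echo three >> file.txt
git commit --quiet -am "third commit"

# The capsule command: one line per commit, every ref, drawn as a graph.
git log --oneline --all --decorate --graph

# A per-file summary of what each commit changed.
git log --oneline --stat

# Only commits that a branch or tag points at; the middle commit drops out.
git log --oneline --simplify-by-decoration
```

The last command is the one worth comparing with the first: the undecorated ‘second commit’ disappears, leaving just the tagged commit and the branch tip.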

In addition, you have a level of control that the GitBucketLab tools lack when viewing histories. By setting the pager in your ~/.gitconfig file, you can control how the --patch output looks. I like the diff-so-fancy tool. Here’s my config:

        [core]
        pager = diff-so-fancy | less -RF

The -R argument to less above passes colour escape sequences through to the terminal so that the coloured output renders properly, and -F quits if the output fits in one screen.

2. The git add flags

If you’re like me you may have spent years treating additions as a monolith, running git commit -am 'message' to add and commit changes to files already tracked by Git. Sometimes this results in commits that prompt you to write a message that amounts to ‘here is a bunch of stuff I did’.

If so, you may be missing out on the power that git add can give you over what you commit.


git add -i

(or --interactive) gives you an interactive menu that allows you to choose what is to be added to Git’s staging area ready to commit.

Again, this menu takes some getting used to. If you choose a command but don’t want to do anything about it, you can hit return with no data to go back. But sometimes hitting return with no input means you choose the currently selected item (indicated with a ‘*‘). It’s not very intuitive.

Most of the time you will be adding patches. To go direct to that, you can run:

git add -p

Which takes you directly to the patches.

But the real killer command I use regularly is:

git add --edit

which allows you to use your configured editor to decide which changes get added. This is a lot easier than using the interactive menu’s ‘splitting’ and ‘staging hunks’ method.

3. git difftool is handy

If you go full command line, you will be looking at plenty of diffs. Your diff workflow will become a strong subset of your git workflow.

You can use git difftool to control how you see diffs, eg:

git difftool --tool=vimdiff

To get a list of all the available tools, run:

git difftool --tool-help
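If you settle on one tool, you can save yourself typing --tool every time by recording the choice in your Git config. The sketch below sets it for a single throwaway repository; I am assuming vimdiff as the tool because it is the one used above, and you would add --global to each git config call to apply the settings everywhere.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the settings below do not touch your real config.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Use vimdiff whenever git difftool is run without --tool.
git config diff.tool vimdiff

# Open each file straight away rather than asking for confirmation first.
git config difftool.prompt false

# Show what was recorded.
git config --get diff.tool
git config --get difftool.prompt
```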

4. You can use it anywhere

If you rely on a particular GUI, then there is always the danger that that GUI will be unavailable to you at a later point. You might be working on a very locked-down server, or be forced to change OS as part of your job. Or it may fall out of fashion and you want to try a new one.

Before I saw the light and relied on the command line, I went through many different GUIs for development, including phases of Kate, IntelliJ, Eclipse, even a brief flirtation with Visual Studio. These all have gone in and out of fashion. Git on the command line will be there for as long as Git is used. (So will vi, so will shells, and so will make, by the way).

Similarly, you might get used to a source code site that allows you to rebase with a click. But how do you know what’s really going on? Which brings me to…

It’s closer to the truth

All this leads us to the realisation that the Git command is closer to the truth than a GUI (*), and gives you more flexibility and control.

* The ‘truth’ is obviously in the source code, but there’s also the plumbing/porcelain distinction between Git’s ‘internal’ commands and its ‘user-friendly’ commands. But let’s not get into that here: its standard interface can be considered the ‘truth’ for most purposes.

When you’re using git on the command line, you can quickly find out what’s going on now, what happened in the past, the difference between the remote and the local, and you won’t be gnashing your teeth in frustration because the GUI doesn’t give you exactly the information you need, or gives you a limited and opinionated view of the world.

5. You can use ‘git-extras’?

Finally, using Git on the command line means you can make use of git-extras. This commonly-available package contains a whole bunch of useful shortcuts that may help your git workflow. There are too many to list, so I’ve just chosen the ones I use most commonly.

When using many of these, it’s important to understand how they interact with the remote repositories (if any), whether you need to configure anything to make them work, and whether they affect the history of your repository, making pushing or pulling potentially problematic. If you want to get a good practical understanding of these things, check out my Learn Git The Hard Way book.

git fork

Allows you to fork a repository on GitHub. Note that for this to work you’ll need to add a personal access token to your git config under git-extras.github-personal-access-token.

git rename-tag

Renames a tag, locally and remotely. Much easier than doing all this.

git cp

Copies (rather than git mv, which renames) the file, keeping the original’s history.

git undo

Removes the latest commit. You can optionally give a number, which undoes that number of commits.

git obliterate

Uses git filter-branch to destroy all evidence of a file from your Git repo’s history.

git pr / git mr

These allow you to manage pull requests locally. git pr is for GitHub, while git mr is for GitLab.

git sync

Synchronises the history of your local branch with the remote’s version of that branch.

How 3D Printing Kindled A Love For Baroque Sculpture

I originally got a 3D printer in order to indulge a love of architecture. A year ago I wrote about my baby steps with my printer, and after that got busy printing off whatever buildings I could find.

After a few underwhelming prints, I stumbled on a site called ‘Scan the World’, which contains scans of all sorts of famous artefacts.

Pretty soon I was hooked.

Laocoön and His Sons

The first one I printed was Laocoön and His Sons, and I was even more blown away than I was when I saw it in the Vatican nearly 30 years ago.

While it’s not perfect (my print washing game isn’t quite there yet), what was striking to me was the level of detail the print managed to capture. The curls, the ripples of flesh and muscle, the fingers, they’re all there.

More than this, nothing quite beats being able to look at it close up in your own home. Previously I’d read art history books and not been terribly impressed by or interested in the sculpture I’d looked at pictures of and read about.

This encouraged me to look into the history of this piece, and it turned out to be far more interesting than I’d ever expected.

The sculptor is unknown. The statue was unearthed in a vineyard in 1506 in several pieces, and the Pope got wind of it, getting an architect and his mate to give it a once-over. His mate was called Michelangelo, and it soon ended up in the Vatican.

It was later nicked by the French in 1798, and returned to the Vatican 19 years later after Napoleon’s Waterloo.

It blows my mind that this sculpture was talked about by Pliny the Elder, carved by Greeks (probably), then left to the worms for a millennium, then dug up and reconstructed. I’m not even sure that historians are sure that the one in the Vatican now was the ‘original’, or whether it was itself a copy of another statue, maybe even in bronze.

Farnese Bull

The next one I printed has a similar history to Laocoön and His Sons. It was dug up during excavations from a Roman bath. Henry Peacham in ‘The Complete Gentleman’ (1634) said that it “outstrippeth all other statues in the world for greatness and workmanship”. Hard to argue with that.

La Pieta

Before Michelangelo had a look at the Laocoön, he carved this statue of the dead Jesus with his mother from a single block of marble. It’s not my favourite, but I couldn’t not print it. It’s worth it for Mary’s expression alone.

Aside from being one of the most famous sculptures ever carved, this sculpture has the distinction of being one of the few works Michelangelo ever signed, apparently because he was pissed off that people thought a rival had done it.

Apollo and Daphne

Then I moved onto the sculptor I have grown to love the most: Bernini. The master.

When I printed this one, I couldn’t stop looking at it for days.

The story behind it makes sense of the foliage: consumed by love (or lust) Apollo is chasing after Daphne, who, not wanting to be pursued because of some magical Greek-mythical reason, prays to be made ugly or for her body to change. So she becomes a tree while she runs. Apollo continues to love the tree.

Amazed by that one, I did another Bernini.

Bernini’s David

I have to call this one “Bernini’s David” to distinguish it from Michelangelo’s which seems to have the monopoly on the name in sculpture terms.

I don’t know why, though: Bernini’s David is almost as breathtaking as Apollo and Daphne. Although – like Michelangelo’s David – the figure is somewhat idealised, this David feels different: a living, breathing fighter about to unleash his weapon rather than a beautiful boy in repose. Look at how his body is twisted, and how his feet are even partially off the base.

My Own Gallery

What excites me about this 3D printing lark is the democratisation of art collection. Twenty years ago the closest I could get to these works was either by going on holiday, traipsing to central London to see copies, or looking at them in (not cheap) books at home.

Now, I can have my own art collection wherever I want, and if they get damaged, I can just print some more off.

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

grep Flags – The Good Stuff

While writing a post on practical shell patterns I had a couple of patterns that used grep commands.

I had to drop those patterns from the post, though, because as soon as I thought about them I got lost in all the grep flags I wanted to talk about. I realised grep deserved its own post.

grep is one of the most universal and commonly-used commands on the command line. I count about 50 flags you can use on it in my man page. So which ones are the ones you should know about for everyday use?

I tried to find out.

I started by asking on Twitter whether the five flags I have ‘under my fingers’ and use 99% of the time are the ones others also use.

It turns out experience varies widely on what the 5 most-used are.

Here are the results, in no particular order, of my researches.

I’ve tried to categorize them to make it easier to digest. The categories are:

  • ABC – The Context Ones
  • What To Match?
  • What To Report?
  • What To Read?
  • Honourable Mention

If you think any are missing, let me know in the comments below.

ABC – The Context Ones

These arguments give you more context around your match.

grep -A
grep -B
grep -C

I hadn’t included these in my top five, but as soon as I was reminded of them, -C got right back under my fingertips.

I call them the ‘ABC flags’ to help me remember them.

Each of these gives a specified context around your grep’d line. -A gives you lines after the match, -B gives you lines before the match, and -C (for ‘context’) gives you both the before and after lines.

$ mkdir grepflags && cd grepflags
$ cat > afile <<EOF                                                                   
$ grep -A2 c afile
$ grep -B2 c afile
$ grep -C1 c afile

This is especially handy for going through configuration files, where the ‘before’ context can give you useful information about where the thing you’re matching sits within a wider context.
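
If you want to try the ABC flags on something small, here is a minimal sketch. The sample file (the letters a to e, one per line) and the temporary folder are my own assumptions for illustration.

```shell
# Hypothetical sample data: the letters a to e, one per line.
tmpdir=$(mktemp -d)
printf '%s\n' a b c d e > "$tmpdir/afile"

after=$(grep -A2 c "$tmpdir/afile")     # the match plus the two lines after it
before=$(grep -B2 c "$tmpdir/afile")    # the two lines before the match, plus the match
context=$(grep -C1 c "$tmpdir/afile")   # one line either side of the match

printf 'after (A2):\n%s\n' "$after"
printf 'before (B2):\n%s\n' "$before"
printf 'context (C1):\n%s\n' "$context"

rm -rf "$tmpdir"
```

With this data, the -A2 run should give c, d, e; the -B2 run a, b, c; and the -C1 run b, c, d.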

What To Match?

These flags relate to altering what you match and don’t match with your grep.

grep -i

This flag ignores the case of the match. Very handy and routine for me to use to avoid missing matches I might want to see (I often grep through large amounts of plain text prose).

$ cat > afile <<EOF
let it all out
$ grep shout afile
$ grep -i shout afile

grep -v

This matches any lines that don’t match the regular expression (inverts), for example:

$ touch
$ IFS=$'\n'   # avoid problems with filenames with spaces in it
$ for f in $(ls | grep -v README)
> do echo "top of: $f"
> head $f
> done
let it all out

which outputs the heads of all files in the local folder except any files with README in their names.
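
To see -i and -v side by side on a tiny input, something like the following should do. The three sample lines are made up for this sketch.

```shell
# Hypothetical sample data, made up for this sketch.
tmpdir=$(mktemp -d)
printf '%s\n' 'Shout' 'shout' 'let it all out' > "$tmpdir/afile"

exact=$(grep shout "$tmpdir/afile")          # case-sensitive: only the lowercase line
any_case=$(grep -i shout "$tmpdir/afile")    # both 'Shout' and 'shout'
inverted=$(grep -vi shout "$tmpdir/afile")   # every line that does not match, in any case

printf 'exact:\n%s\n' "$exact"
printf 'any case:\n%s\n' "$any_case"
printf 'inverted:\n%s\n' "$inverted"

rm -rf "$tmpdir"
```

Combining the two flags (-vi) is a quick way to filter out noise regardless of how it is capitalised.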

grep -w

The -w flag only matches ‘whole-word’ matches, ignoring cases where submitted words are part of longer words.

This is a useful flag to narrow down your matches, and also especially useful when searching through prose:

$ cat > afile <<EOF
hey jude
$ grep na afile
$ grep -w na afile
$ grep -vwi na afile
hey jude

You might be wondering what characters are considered part of a word. The manual tells us that ‘Word-constituent characters are letters, digits, and the underscore.’ This is useful to know if you’re searching code where word separators in identifiers might switch between dashes and underscore. You can see this above with the na-na-na vs na_na_na differences.
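
The word-boundary rule is easy to check for yourself. This is a small sketch with sample lines I have invented for the purpose; it counts matching lines with -c rather than printing them.

```shell
# Hypothetical sample lines: spaces, dashes, underscores, and a longer word.
tmpdir=$(mktemp -d)
printf '%s\n' 'na na na' 'na-na-na' 'na_na_na' 'banana' > "$tmpdir/afile"

plain_count=$(grep -c na "$tmpdir/afile")    # every line contains 'na' somewhere
word_count=$(grep -cw na "$tmpdir/afile")    # only lines where 'na' stands alone as a word
word_lines=$(grep -w na "$tmpdir/afile")

printf 'lines matching na: %s\n' "$plain_count"
printf 'lines matching na as a word: %s\n' "$word_count"
printf '%s\n' "$word_lines"

rm -rf "$tmpdir"
```

The dash-separated line counts as a whole-word match because a dash is not a word-constituent character, while the underscore-separated line does not, because the underscore is.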

What To Report?

These grep flags offer choices about how the output you see is rendered.

grep -h

grep -h suppresses the prefixing of filenames on output. An example is demonstrated below:

$ rm -f afile
$ cat > afile1 << EOF
$ cp afile1 afile2 
$ grep a *
$ grep -h a *

This is particularly useful if you want to process the matching lines without the filename spoiling the input. Compare these to the output without the -h.

$ grep -h a * | uniq
$ grep -h a * | uniq -c

grep -o

This outputs only the text specified by your regular expression. One match is output per line, but multiple matches may be made per line.

This can result in more matches than lines, as in the example below, where you first look for words ending in ‘ay’ at the end of a line, and then for any words with the letter ‘e’ in them (but not at the start or the end of the word).

$ rm -f afile1 afile2
$ cat > afile << EOF
All my troubles seemed so far away
Now it looks as though they're here to stay
Oh I believe
In yesterday
EOF
$ grep -o ' [^ ]*ay$' afile
$ grep -o ' [^ ]*e[^ ]*' afile

grep -l

If you’re fighting through a blizzard of output and want to focus only on which files your matches are in rather than the matches themselves, then using this flag will show you where you might want to look:

$ cat > afile << EOF
$ cp afile afile2
$ cp afile afile3
$ grep a *
$ grep -l a *
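
Here is a short sketch that puts -h and -l next to a plain grep, so you can compare what each reports. The two file names and their contents are assumptions of mine, created in a temporary folder.

```shell
# Hypothetical files: both contain a line 'a', only afile1 contains 'b'.
tmpdir=$(mktemp -d)
printf '%s\n' a b > "$tmpdir/afile1"
printf '%s\n' a c > "$tmpdir/afile2"

with_names=$(cd "$tmpdir" && grep a afile1 afile2)        # default: filename prefix on each line
without_names=$(cd "$tmpdir" && grep -h a afile1 afile2)  # -h: matching lines only
files_with_b=$(cd "$tmpdir" && grep -l b afile1 afile2)   # -l: names of files that match

printf 'default:\n%s\n' "$with_names"
printf 'with -h:\n%s\n' "$without_names"
printf 'with -l:\n%s\n' "$files_with_b"

# Without the filename prefix, the lines can go straight into uniq -c.
(cd "$tmpdir" && grep -h a afile1 afile2 | uniq -c)

rm -rf "$tmpdir"
```

The last pipeline only works as a count because -h stops the differing filenames from making otherwise identical lines look unique.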

What To Read?

These flags change which files grep will look at.

grep -r

A very popular flag, this flag recurses through the filesystem looking for matches.

$ grep -r securityagent /etc

grep -I

This one is my favourite, as it’s incredibly useful, and not so well known, despite being widely applicable.

If you’ve ever been desperate to find where a string is referenced in your filesystem (usually as root) and run something like this:

$ grep -rnwi specialconfig /

then you won’t have failed to notice that it can take a good while. This is partly because it’s looking at every file from the root, whether it’s a binary or not.

The -I flag only considers text files. This radically speeds up recursive greps. Here we run the same command twice (to ensure it’s not only slow the first time due to OS file caching), then run the command with the extra flag, and see the run time drop by over 40%.

$ time sudo grep -rnwi specialconfig / 2>/dev/null
sudo grep -rnwi specialconfig /  418.01s user 382.19s system 70% cpu 19:03.09 total
$ time sudo grep -rnwi specialconfig /
sudo grep -rnwi specialconfig /  434.19s user 411.62s system 70% cpu 19:56.25 total
$ time sudo grep -rnwiI specialconfig /
sudo grep -rnwiI specialconfig /  33.54s user 322.64s system 52% cpu 11:19.03 total
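
You don’t need a 19-minute grep to see what -I does. This sketch builds a temporary folder with one text file and one file containing NUL bytes (both file names are hypothetical), then lists which files match with and without the flag.

```shell
# Hypothetical files: a text file and a 'binary' file that contains NUL bytes.
tmpdir=$(mktemp -d)
printf 'specialconfig=1\n' > "$tmpdir/settings.txt"
printf 'specialconfig\000\001\002\n' > "$tmpdir/blob.bin"

all_matches=$(cd "$tmpdir" && grep -rl specialconfig . | sort)    # binary file included
text_matches=$(cd "$tmpdir" && grep -rlI specialconfig . | sort)  # binary file skipped

printf 'without -I:\n%s\n' "$all_matches"
printf 'with -I:\n%s\n' "$text_matches"

rm -rf "$tmpdir"
```

The time saving on a real filesystem comes from the same behaviour: files detected as binary are treated as if they don’t match.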

Honourable Mention

There are many other grep flags, but I’ll just add one honourable mention at the end here.

grep -E

I spent an embarrassingly long time trying to get regular expressions with + signs in them to work in grep before I realised that default grep didn’t support so-called ‘extended’ regular expressions.

By using -E you can use those regular expressions just as the regexp gods intended.

$ cat > afile <<< aaaaaa+bc
$ grep -o 'a+b' afile            # + is treated literally
$ grep -o 'aa*b' afile           # a workaround
$ grep -oE 'a+b' afile           # extended regexp
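
To make the difference between basic and extended regular expressions concrete, here is a small sketch using two sample lines of my own choosing: one with a run of the letter a, and one with a literal plus sign. I’m assuming GNU grep’s behaviour for the basic case.

```shell
# Hypothetical sample data: a run of a's before b, and a literal 'a+b'.
tmpdir=$(mktemp -d)
printf '%s\n' 'aaab' 'a+b' > "$tmpdir/afile"

basic=$(grep -o 'a+b' "$tmpdir/afile")        # basic regexp: + is a literal plus sign
workaround=$(grep -o 'aa*b' "$tmpdir/afile")  # 'one or more a' spelled out without +
extended=$(grep -oE 'a+b' "$tmpdir/afile")    # extended regexp: + means 'one or more'

printf 'basic: %s\n' "$basic"
printf 'workaround: %s\n' "$workaround"
printf 'extended: %s\n' "$extended"

rm -rf "$tmpdir"
```

With this data the basic pattern should only find the literal a+b line, while the workaround and the -E version both find aaab.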

If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Buy in a bundle here

Why It’s Great To Be A Consultant

I spent 20 years slaving away at companies doing development, maintenance, troubleshooting, architecture, management, and whatever else needed doing. For all those years I was a permanent hire, working on technology from within.

I still work at companies doing similar things, but now I’m not a permie. I’m a consultant working for a ‘Cloud Native’ consultancy that focusses on making meaningful changes across organisations rather than just focussing on tech.

I wish I’d made this move years ago. So here are the reasons why it’s great to be a consultant.

1) You are outside the company hierarchy

Because you come in as an outsider, and are not staying forever, all sorts of baggage that exists for permanent employees does not exist for you.

When you’re an internal employee, you’re often encouraged to ‘stay in your box’ and not disturb the existing hierarchy. By contrast, as a consultant you don’t need to worry as much about the internal politics or history when suggesting changes. Indeed, you’re often encouraged to step outside your boundaries and make suggestions.

This relative freedom can create tensions with permanent employees, as you often have more asking power than the teams you consult for. Hearing feedback of “I have been saying what you’re saying for years, and no-one has listened to me” is not uncommon. It’s not fair but it’s a fact: you both get to tell the truth, and get listened to more when you’re a consultant.

2) You have to keep learning

If you like to learn, consulting is for you. You are continually thrown into new environments, forced to consider new angles on industry verticals, organisational structure, technologies and questions that you may previously have not come across, thought about, or answered. You have to learn fast, and you do.

Frequently, when you are brought in to help a business, you meet the people in the business that are the most knowledgeable and forward-thinking. Meeting with these talented people can be challenging and intimidating, as they may have unrealistically high expectations of your ‘expertise’, and an unrealistically low opinion of their own capabilities.

It’s not uncommon to wonder why you’ve been brought in when the people you’re meeting already seem to have the capability to do what you are there to help with. When this happens, after some time you go deeper into the business and realise that they need you to help spread the word from outside the hierarchy (see 1, above).

These top permie performers can be used to being ‘top dog’ in their domain, and unwilling to cede ground to people who they’ve not been in the trenches with. You may hear phrases like ‘consulting bullshit’ if you are lucky enough for them to be honest and open with you. However, this group will be your most energetic advocates once you turn them around.

If you can get over the (very human) urge to compete and ‘prove your expertise’, and focus on just helping the client, you can both learn and achieve a lot.

3) You get more exposure

Working within one organisation for a long time can narrow your perspective in various dimensions, such as technology, organisational structure, and culture.

To take one vivid example, we recently worked with a ‘household name’ business that brought us in because they felt that their teams were not consistent enough and that they needed some centralisation to make their work more consistent. After many hours of interviews we determined that they were ideally organised to deliver their product in a microservices paradigm. We ended up asking them how they did it!

I was surprised, as I’d never seen a company with this kind of history move from a ‘traditional’ IT org structure to a microservices one, and was skeptical it could be done. This kind of ‘experience-broadening’ work allows you to develop a deeper perspective on any work you end up doing, what’s possible, and how it happens.

And it’s not only at the organisational level that you get to broaden your experience. You get to interact with, and even work for multiple execution teams at different levels of different businesses. You might work with the dreaded ‘devops team’, the more fashionable ‘platform team’, traditional ‘IT teams’, ‘SRE teams’, traditional ‘dev teams’, even ‘exec teams’.

All these experiences give you more confidence and authority when debating decisions or directions: red flags become redder. You’ve got real-world examples to draw on, each of which is worth a thousand theoretical and abstract powerpoint decks, especially when dealing with the middle rank of any organisation you need to get onside. Though you still have to write some of those too (and sometimes code as well).

4) You meet lots of people

For me, this is the big one.

I’ve probably meaningfully worked with more people in the last two years than the prior ten. Not only are the sheer numbers I’ve worked with greater, they are from more diverse backgrounds, jobs, teams, and specialties.

I’ve got to know talented people working in lots of places and benefitted from their perspectives. And I hope they’ve benefitted from mine too.

Remember, there is no wealth but life.

Join Us

If you want to experience some of the above, get in touch:
Twitter: @ianmiell
email: ian.miell \at\

Practical Shell Patterns I Actually Use

Over the decades I’ve been using the shell, there are thousands of reusable patterns I’ve picked up from looking over others’ shoulders and googling.

Unfortunately, I’ve forgotten about 95% of them.

So here, I list many of the patterns I actually use often enough to be able to remember. If you want to get them under your fingers, your mileage may vary depending on your tastes and what you most commonly use the shell for.

I’m acutely aware that for most of these tips there are better/faster/more elegant ways to achieve the same thing, but that’s not the point here. The point is to reflect on what actually stuck, so that others may save time by spending their time learning what is more likely to stick. I will mention alternative methods and why they didn’t take as we go, as well as theoretical limitations or flaws in the methods I actually use.

I’m going to cover:

  • Get The Last Field From The Output
  • Use sed To Extract
  • ‘Do For All’ with xargs
  • Kill All Processes
  • Find Files Ending With…
  • Process Files With sed | sh
  • Give Me All The Output With 2>&1
  • Separate Lines With tr
  • Quick Infinite Loop
  • Inline Files

Get The Last Field From The Output

$ [commands] | awk '{print $NF}' 

This is what I most commonly use awk for on the command line. I also use it where I might more elegantly use cut, by selecting a specific field with (for example, for the second field) awk '{print $2}' (see below ‘Kill All Processes’).

In the top example, NF stands for ‘number of fields’, so $NF refers to the last field (awk numbers fields from one, not zero). The last field in the command pipeline is commonly a filename, so I often chain this command with xargs to process each file in turn with a new command (see below ‘Do For All’ With xargs).

You can also use cut for this kind of thing, but I have found that a mixture of awk and sed has sufficed for me to achieve what I want. I do use cut every now and then, though.
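
As a quick illustration of $NF and $2, here is a sketch run against two lines of made-up, whitespace-separated input (the names and values are purely hypothetical).

```shell
# Hypothetical input: name, a number, and a path on each line.
sample='alice 4242 /bin/bash
bob 17 /usr/bin/zsh'

last_fields=$(printf '%s\n' "$sample" | awk '{print $NF}')   # last field of each line
second_fields=$(printf '%s\n' "$sample" | awk '{print $2}')  # second field of each line

printf 'last fields:\n%s\n' "$last_fields"
printf 'second fields:\n%s\n' "$second_fields"
```

Because awk splits on runs of whitespace by default, this keeps working when the columns are padded with a varying number of spaces.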

Use sed To Extract

When using pipelines, you frequently want to extract a specific part of each line that is output.

My go-to command for this is sed, which is well worth investing time in. Before you do that, you have to have a reasonably good understanding of regular expressions, which is even more worth investing time in.

The sed pattern I use most often is the search and replace one (s/FIND/REPLACE/), an example of which is below. This example takes the contents of the /etc/passwd database and outputs the username and default shell for each account on the system:

$ cat /etc/passwd | sed 's/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/'

sed (which is short for ‘stream editor’) can take a filename as an argument, but if none is supplied it assumes it’s receiving lines through standard input.

The first character of the sed script (which is ‘s‘ in the example) indicates the command sed is being given (in bold below), followed by the default separator (which is a forward slash).

s/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/

Then, what follows (up to the next forward slash) is the regular expression pattern to match in each line (in bold below):

 s/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/

Within those toenail clippings, you see two sets of opening and closing parentheses. Each of these is escaped by a backslash (to distinguish them from just matching the parentheses characters as characters):

  • \([^:]*\)
  • \(.*\)

The first one ‘captures’ the username, while the second one ‘captures’ their shell. These are then referenced in the ‘replace’ part of the sed command by their number order:

 s/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/

which produces the output (on my system)…

user: nobody shell: /usr/bin/false
user: root shell: /bin/sh
user: daemon shell: /usr/bin/false

sed definitely requires some effort to learn, but it will quickly repay you if you ever do any text processing.
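
If you’d rather not point this at your real /etc/passwd while experimenting, the same sed script can be fed a couple of sample lines. The two entries below are invented for the sketch and only follow the usual colon-separated layout.

```shell
# Hypothetical passwd-style lines, made up for illustration.
passwd_sample='root:x:0:0:root:/root:/bin/sh
nobody:x:65534:65534:nobody:/nonexistent:/usr/bin/false'

# First capture: everything up to the first colon (the username).
# Second capture: everything after the last colon (the shell).
result=$(printf '%s\n' "$passwd_sample" |
  sed 's/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/')

printf '%s\n' "$result"
```

The middle `.*:` is greedy, which is why the second capture ends up holding only the final field.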

‘Do For All’ With xargs

xargs is one of the most powerful and time-saving commands to use on the terminal. But it remains impenetrable to some (just ask Jim below), which is a shame, as with a little work it’s not that difficult to get to grips with.

Before giving a real-world example, let’s go through it with a simple example. Create and move into a folder, creating three files:

$ mkdir xargs_example && cd xargs_example && touch 1 2 3 && ls
1 2 3

Now, by default, xargs takes all the items passed in, and passes them as arguments to the given command:

$ ls | xargs -t ls -l
ls -l 1 2 3
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 1
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 2
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 3

(We are using the -t flag here for explanatory purposes, to show the commands that actually get run; generally, you don’t need it.)

The -n flag allows you to process a number of arguments at once. Try this to see what I mean:

$ ls | xargs -n2 -t ls -l
ls -l 1 2
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 1
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 2
ls -l 3
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 3

Most often, I use -n1, to run the command on each argument separately:

$ ls | xargs -n1 -t ls -l
ls -l 1
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 1
ls -l 2
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 2
ls -l 3
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 3

Here’s a real-world example I used recently:

find . | \
  grep azurerm | \
  grep tf$ | \
  xargs -n1 dirname | \
  sed 's/^.\///'


This pipeline:

  • outputs all files in or under the current working folder
  • of those files, selects only those files that have azurerm in their name
  • of those files, selects only those that end with tf (eg ./azurerm/
  • for each of those files, strips the filename from the path, leaving the folder the file sits in, preceded by a dot and forward slash (eg ./
  • for each of those folders, removes the leading dot and forward slash, leaving the final bare folder path (eg

But what if the argument doesn’t go at the end of the command given to xargs? In that case I use the -I flag, which allows you to replace the arguments that would be applied with a string of your choice. In this example I moved all files with ‘Aug‘ in them to a specific folder:

$ ls | grep Aug | xargs -IXXX mv XXX aug_folder

Be aware that naive use of xargs can lead to problems in scripts. What if your files have spaces in them, or even newlines? What if there are more arguments than can be handled by the command? I’m not going to cover these nuances here, but they’re well covered in this excellent resource for more advanced bash usage.
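
For what it’s worth, one common way round the spaces problem (not the pattern I actually use above, and assuming your find and xargs support -print0 and -0, as the GNU and BSD versions do) is to separate filenames with NUL characters. The folder and file names here are hypothetical.

```shell
# Hypothetical files with spaces in their names.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/aug_folder"
touch "$tmpdir/1 Aug notes.txt" "$tmpdir/2 Sep notes.txt"

# -print0 and -0 pass NUL-separated names, so spaces survive intact.
find "$tmpdir" -maxdepth 1 -type f -name '*Aug*' -print0 |
  xargs -0 -IXXX mv XXX "$tmpdir/aug_folder"

moved=$(ls "$tmpdir/aug_folder")
printf 'moved: %s\n' "$moved"

rm -rf "$tmpdir"
```

It is more typing than ls piped to grep, which is probably why it never quite got under my fingers.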

I also regularly tidy up dodgy filenames with detox on my servers.

Kill All Processes

Now you’ve seen awk and xargs, you can use these to quickly kill all processes that match. I used this quite often to kill off some pesky Virtual Machine processes that sometimes get left over in a corner case and prevent me from running up more:

$ ps -ef | grep VBoxHeadless | awk '{print $2}' | xargs kill -9

Again, you have to be careful with your grep here to ensure that you don’t accidentally kill

Also be careful with the -9 argument to kill. You should only use it when the process doesn’t respond to the default kill signal (TERM rather than -9’s KILL), which allows the process to tidy up after itself if it chooses to.
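
To see the default signal in action without risking a real process, this sketch starts a throwaway sleep in the background, sends it a plain kill, and looks at the exit status. The sleep stands in for whatever process you want rid of.

```shell
# A throwaway background process to practise on.
sleep 300 &
pid=$!

kill "$pid"                 # no signal named, so TERM is sent

status=0
wait "$pid" || status=$?    # collect the exit status of the terminated process

# A process ended by a signal reports 128 plus the signal number (TERM is 15).
echo "exit status: $status"
```

If a process ignores TERM, that is the point at which reaching for -9 makes sense.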

Find Files Ending With…

I often find myself looking for where files are on my system. The mlocate database is easily installable if you don’t have it, and invaluable for speeding up file lookups compared with using the find command. For example, I often need to find files across the filesystem that end with a specific suffix:

$ sudo updatedb
$ sudo locate cfg | grep '\.cfg$'

Process Files With sed | sh

Often you want to run a command on files extracted (or transformed) by a sed command, and with a little tweaking this is easily done by creating a shell script using sed, and then piping it to a shell. This example looks for https links at the start of lines in the file, and opens them up in a browser using the open command available on Macs:

$ grep '^.https' | sed 's/^.\(h[^])]*\).*/open \1/' | sh

There are alternate ways to do this with xargs, but I use this when I want to see what the resulting script will actually look like before running it (by leaving off the ‘| sh’ at the end).
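
Here is a safe way to rehearse the pattern: a sketch that generates echo commands instead of open, so nothing launches a browser. The links file, its contents, and the example.com URLs are all made up for illustration.

```shell
# Hypothetical input file: links wrapped in brackets at the start of some lines.
tmpdir=$(mktemp -d)
cat > "$tmpdir/links.txt" << 'EOF'
(https://example.com/one) first link
plain text line
[https://example.com/two] second link
EOF

# Step 1: build the script and look at it before running anything.
script=$(grep '^.https' "$tmpdir/links.txt" |
  sed 's/^.\(h[^])]*\).*/echo would open \1/')
printf 'generated script:\n%s\n' "$script"

# Step 2: once it looks right, pipe it to a shell.
output=$(printf '%s\n' "$script" | sh)
printf 'result:\n%s\n' "$output"

rm -rf "$tmpdir"
```

Swapping echo for the real command is then a one-word change, made only after you have read the generated script.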

Give Me All The Output With 2>&1

Some commands separate their output into ‘standard’ output, and ‘error’ output. By default, a pipe only passes the ‘standard’ output on to grep, and the ‘error’ output is not searched (because it goes to a separate ‘file handle’, but you don’t need to understand that right now).

For example, I was searching for a particular flag in the openssl command recently, and realised that openssl’s help flag outputs to standard error by default. So adding 2>&1 (which redirects ‘error’ output to wherever the ‘standard’ output is pointed) ensures that the output is grep-able.

$ openssl x509 -h 2>&1 | grep -i common 

If you want to redirect the output to a file, you need to get the ordering right:

$ openssl x509 -h > openssl_help.txt 2>&1   # RIGHT!

If the file redirect comes after the 2>&1, then the standard error output still goes to the terminal.

$ openssl x509 -h 2>&1 > openssl_help.txt   # WRONG!

It’s best to think of this by considering that the command is read from left to right, so in the ‘right’ one, the interpreter ‘sees’:

  • Redirect standard output to the file openssl_help.txt, then
  • Redirect standard error to wherever standard output is pointing

and both outputs are pointed at the file. In the ‘wrong’ one:

  • Redirect standard error to wherever standard output is pointing (which at the moment is the terminal), then
  • Redirect standard output to the file openssl_help.txt

and standard error is still pointed at the terminal, while standard output is redirected to the file openssl_help.txt.
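
You can check the ordering rule without openssl. In this sketch, emit is a hypothetical stand-in function that writes one line to each stream, and the file names are made up.

```shell
# Hypothetical stand-in for a command that writes to both streams.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

tmpdir=$(mktemp -d)

# Right order: stdout goes to the file first, then stderr follows it there.
emit > "$tmpdir/right.txt" 2>&1

# Wrong order: stderr is pointed at the current stdout (captured here by the
# command substitution), and only then is stdout sent to the file.
leaked=$(emit 2>&1 > "$tmpdir/wrong.txt")

right=$(cat "$tmpdir/right.txt")
wrong=$(cat "$tmpdir/wrong.txt")
rm -rf "$tmpdir"

printf 'right.txt:\n%s\n' "$right"
printf 'wrong.txt:\n%s\n' "$wrong"
printf 'escaped the file: %s\n' "$leaked"
```

In the wrong ordering the error line never reaches the file; it ends up wherever standard output was pointing at the moment 2>&1 was read.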

Separate Lines With tr

tr is a handy command used in a variety of contexts. Its job is to replace individual characters in a stream of output. While sed can be used for this purpose, tr has a couple of advantages over it in certain contexts:

  • It’s easier to use than sed
  • It’s not line-oriented, so it ‘dumbly’ just replaces characters without concern for line separation

Here’s an example I used it for recently, to get each item in my PATH variable shown, one per line.

$ env | grep -w PATH | tr ':' '\n'

Also, tr is often used to remove problematic characters from a stream of output. For example, to turn a set of lines into a single line, you can run:

$ tr -d '\n'

which removes all the ‘newlines’ from a stream.
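
Both uses of tr can be tried on fixed input, so the result doesn’t depend on your own environment. The sample path and letters below are assumptions for the sketch.

```shell
# Hypothetical PATH-like value.
sample_path='/usr/local/bin:/usr/bin:/bin'

# Replace each colon with a newline: one entry per line.
one_per_line=$(printf '%s\n' "$sample_path" | tr ':' '\n')

# Delete every newline: three lines become one.
joined=$(printf 'a\nb\nc\n' | tr -d '\n')

printf '%s\n' "$one_per_line"
printf 'joined: %s\n' "$joined"
```

Note that tr only reads standard input, so it always sits in a pipeline or takes a redirect rather than a filename argument.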

Quick Infinite Loop

A pattern you very often need is an infinite loop. This is the way I usually get one on the command line.

$ while true; do … ; done

You can use the break keyword to escape this infinite loop.
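
A minimal sketch of the loop with a break in it might look like this. The counter and the limit of three are arbitrary choices for the example.

```shell
attempts=0

while true; do
  attempts=$((attempts + 1))
  # Leave the otherwise infinite loop once some condition is met.
  if [ "$attempts" -ge 3 ]; then
    break
  fi
done

echo "stopped after $attempts attempts"
```

On the command line I usually put a sleep inside the loop body as well, so it doesn’t spin flat out.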

Inline Files

Creating files can be a faff, so it’s really useful to be able to just create a file inline on the command line.

You can do this with a ‘heredoc’ like this:

$ cat > afile << SOMESTRING
The contents of the file are written here.
Just keep typing until you are done, then
end with the string you specified at the
end of the first line.
SOMESTRING

That creates a file called afile with the contents between the first and last line.

You can even go one stage further, and substitute where you would formerly have used the filename using the <() construct (see point 6 here).

$ kubectl apply -f <(cat << EOF

Why I Keep Coming Back to Cynefin

Working as a consultant in helping clients to change the way they work, I often struggle to explain to them how the way they usually attack problems is not always appropriate to the situation they’ve brought us in to help with. They might be a start-up that’s always taken an ad-hoc, JFDI approach that is now struggling with scale up or maturity challenges, or a large corporation used to planning every detail up front before acting who find themselves having to experiment with new capabilities in fast-changing fields.

In these situations, Cynefin is a useful conceptual tool for bringing leaders to the point where they understand that they may need to change their usual approach for this new context.

Cynefin is a meta-framework designed to help managers identify how they perceive situations and make sense of their own and other people’s behaviour.

But first, here’s how it helped me understand where I’d been going wrong.

How Cynefin helped me

I wrote about writing runbooks and doing SRE in a real-world context in a previous post. The post was well-received, and I figured I had this nailed. However, I learned something deeper about knowledge management and decision making a couple of years later, when I failed to make the same techniques work with a client in a different context.

The context of the original post was a company with a 15-year-old software stack, where prior examples of incidents and their histories of resolution were available to put together in a more coherent and regularised form. Even though the situation was crying out for runbooks, no-one had put them together, and getting to the point where they became part of the fabric of our work was still a long, costly and arduous process.

I found myself in a different situation some time later. I was in a team that was trying to deliver a new software delivery platform against a dynamic technical background. While there, I tried to foster a culture of writing runbooks to cover situations we’d seen in development, but I completely failed to make it stick. The reasons for this failure were numerous, but probably the most significant was that while I’d learned my lessons in a stable, ‘best practice’ context suitable for runbooks, the situation I was in was one of ‘emergent practice’ where things were changing so fast that the documents were either out of date or redundant almost as soon as they were written.

While talking to a friend about this learning experience, they mentioned Cynefin (pronounced /kəˈnɛvɪn/, or kuh-NEV-in) to me as a framework for thinking about this kind of situation.

What is Cynefin?

A great overview of the Cynefin framework, from

A great way to introduce Cynefin is to consider this often-mocked quote from Donald Rumsfeld.

“As we know, there are known knowns, there are things we know we know. We also know there are known unknowns, that is to say, we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know.”

Donald Rumsfeld

I never really understood why it was mocked so much – it clearly states an important point about the nature of knowledge and ignorance in different contexts. The fact that it requires some concentration to follow should not have made it a cause of mockery. But hey.

Cynefin does something similar for business decision-making. It categorises the various types of context we might find ourselves in, and helps us orient ourselves within them. This categorisation helps us to adapt our behaviour appropriately to the context.

What are these categorised contexts? There are five:

  • Simple
  • Complicated
  • Complex
  • Chaotic

The fifth state (Disordered) just means we haven’t categorised the context yet.

1) The ‘simple’ context

Let’s take the two extremes first. The ‘simple’ context might also be called the ‘known known’, or ‘best practice’ context. It’s the context where any reasonable person can work out what to do if they know the domain. For example, if you’re an airline pilot and a light in your cabin flashes, there is a documented checklist to follow that describes what you need to do. If you’re a trained pilot, what to do is well-understood and consistently applicable. It’s a ‘simple’ context.

2) The ‘chaotic’ context

By contrast, the ‘chaotic’ context is one where no-one knows what to do. For this situation, I like to think of an improv comic. They are placed in situations where there is no ‘right way’ forward; by design, they are thrown into an unfamiliar context where they are forced to be creative and experiment. If there were a ‘best practice’ here, it just wouldn’t be funny. What is called for is ‘novel practice’: it’s vital to do something, see what happens and respond to what happens next.

So, an approach seeking to find ‘best practice’ is not always best practice…


Cynefin proposes that there are different patterns of decision-making in these different contexts. All of them end in ‘respond’ (which seems to just mean ‘take action’), but are preceded by different approaches.

For ‘simple’, the steps are: sense, categorise, respond. In other words, figure out what the state is, which category of situation this is, and act accordingly.

For ‘chaotic’, the steps are: act, sense, respond. In other words, when you don’t know what the right thing to do is (or even where to start) just do something, and see what happens.

You can see how this maps to situations we’ve seen in different contexts. If you’re running an Accident and Emergency department, you seek to apply known best practice with every patient that comes in. So you sense (detect patient has come in), categorise (triage them), and respond (schedule them for appropriate and timely treatment). That’s a ‘simple’ (though not ‘easy’) situation.

If you’re fleeing from persecution in a war-torn country, then it’s by no means clear what the right thing to do is. You don’t have time to sit and think, and you don’t have enough information to evaluate a best path. Even if you did have information, the situation is changing rapidly and in unpredictable ways. In such a situation, just picking something to do (eg run to the airport), and re-evaluating your situation at the next appropriate point is the best thing to do. So you act (do something), sense (re-evaluate the situation), and respond (decide what to do next).

The other two categories

Between the two extremes of ‘simple’ and ‘chaotic’ are two more states: ‘complicated’ and ‘complex’.

3) The ‘complicated’ context

The ‘complicated’ situation is one where there is ‘good practice’ (ie there is a ‘good’ – but maybe not ‘best’ – ‘answer’ to the question posed by the situation of what to do), but it requires an expert to analyse the situation first. One might think of an architect called in to design a building. There are site- and client-specific things to consider in a broader context requiring expertise to know how to proceed in a ‘good practice’ way.

In these complicated situations, you ‘sense’ (gather relevant information), ‘analyse’ (work out, using your expertise, what a good solution looks like), then ‘respond’ (proceed with the build).

4) The ‘complex’ context

‘Complex’ sits between ‘complicated’ and ‘chaotic’. It’s not complete chaos, as there is some prior knowledge or experience that can be brought to bear, but even knowing how to get to a good answer is unknown. In these situations, you need to figure out the best way by experimentation. This is called ’emergent practice’, and is appropriately handled by a ‘probe’, ‘sense’, ‘respond‘ decision-making process.

Working in Cloud Native technology transformation, our consultancy often works with businesses used to operating in the complicated or simple areas, and we find that our job involves helping them understand that their previously working approach does not apply to their current context, which calls for ‘emergent practice’.

By contrast, when we work with those already operating in the complex space, we generally augment their teams with our technologists, who have more experience of getting towards good practice in Cloud Native than they do. If they are in a chaotic state, then we can use our experience to help them guide their decision making towards the complex or complicated space.

Why is Cynefin such a powerful consulting tool?

So far this all might sound quite trivial. It’s obvious to any reasonable person that your decision-making process needs to be different if you’re fleeing from a war zone than if you’re flying a plane.

What makes Cynefin such a powerful tool for business consulting is that it gives a framework for the common management problem: ‘What got your business here won’t get you there’.

If you’ve read the book of the same title, you’ll know that its key message is that as you age in your career, the more self-centred and driven approaches to success that worked for you when younger become less effective as you seek to lead others to collective success. You need to change your approach and whole attitude to work if you want to succeed in your new, more elevated context.

In an analogous way, the approach that brought your business success in its earlier states may be completely inappropriate in a new state.

Imagine a startup used to operating in a chaotic or complex context. They get used to ‘just doing it’ and giving their staff freedom to solve problems in new and creative ways. This works for them really well as they grow fast, but after some years they find that their offerings to consumers have matured, and novel and creative solutions result more and more in waste, disruption (of the bad kind) and inefficiency.

What they need at this point might be staff who take a different approach more aligned to ‘good’ or ‘best’ practice. However, when people advocating such approaches arrive, challenging group dynamics can come into play. The tight monoculture that’s worked in the past becomes difficult to question, and those advocating a different style get alienated and leave, or just conform to the prevailing approach.

Similarly, a company used to using ‘best practice’ to solve their challenges may be at a loss as to why their approach does not work. As a consultant, I’m often asked directly “what’s the right answer to this question?” to which the answer is all-too-often a disappointing “it depends”. What’s happened there is usually that they are seeking a ‘best’ or ‘good practice’ answer to a question where an ‘emergent’ or ‘novel practice’ one is called for.

Many of these patterns are documented on our transformation patterns website.

And for you?

It’s not just consulting or work that Cynefin can help with. It’s also worth considering whether you have a preference for a particular way of approaching problems, and whether this preference stops you from acting in an appropriate way.

My wife and I often clash over whether to plan or not: she likes to plan holidays in advance, for example, and I find that process like pulling teeth, preferring to improvise. There are situations where planning is absolutely essential (got kids? you’d better plan activities!), and other more dynamic situations where time spent planning in detail is wasted as circumstances change (don’t book that restaurant weeks in advance, we may change our minds about where we’re going if the weather is bad).

Of course, in these situations, my approach is always best, and I don’t need no decision-making meta-framework to help me.
