Practical Shell Patterns I Actually Use

Over the decades I’ve been using the shell, there are thousands of reusable patterns I’ve picked up from looking over others’ shoulders and googling.

Unfortunately, I’ve forgotten about 95% of them.

So here, I list many of the patterns I actually use often enough to be able to remember. If you want to get them under your fingers, your mileage may vary depending on your tastes and what you most commonly use the shell for.

I’m acutely aware that for most of these tips there are better/faster/more elegant ways to achieve the same thing, but that’s not the point here. The point is to reflect on what actually stuck, so that others may save time by spending their time learning what is more likely to stick. I will mention alternative methods and why they didn’t take as we go, as well as theoretical limitations or flaws in the methods I actually use.

I’m going to cover:

  • Get The Last Field From The Output
  • Use sed To Extract
  • ‘Do For All’ with xargs
  • Kill All Processes
  • Find Files Ending With…
  • Process Files With sed | sh
  • Give Me All The Output With 2>&1
  • Separate Lines With tr
  • Quick Infinite Loop
  • Inline Files

Get The Last Field From The Output

$ [commands] | awk '{print $NF}' 

This is what I most commonly use awk for on the command line. I also use it where cut might be the more elegant choice, selecting a specific field with, for example, awk '{print $2}' for the second field (see ‘Kill All Processes’ below).

In the top example, NF stands for ‘number of fields’; since awk numbers fields from 1, $NF refers to the last field on each line. The last field in a pipeline’s output is commonly a filename, so I often chain this command with xargs to process each file in turn with a new command (see “‘Do For All’ With xargs“ below).

You can also use cut for this kind of thing, but I have found that a mixture of awk and sed has sufficed for me to achieve what I want. I do use cut every now and then, though.
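As a minimal concrete sketch, this prints just the last column of ls -l output (the filename on each line that lists a file), ready to be piped into something else:

$ ls -l | awk '{print $NF}'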

Use sed To Extract

When using pipelines, you frequently want to extract a specific part of each line that is output.

My goto command for this is sed, which is well worth investing time in. Before you do that, you have to have a reasonably good understanding of regular expressions, which is even more worth investing time in.

The sed pattern I use most often is the search and replace one (s/FIND/REPLACE/), an example of which is below. This example takes the contents of the /etc/passwd database and outputs the username and default shell for each account on the system:

$ cat /etc/passwd | sed 's/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/'

sed (which is short for ‘stream editor’) can take a filename as an argument, but if none is supplied it assumes it’s receiving lines through standard input.
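So the example above could equally be written without the cat, passing the file straight to sed:

$ sed 's/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/' /etc/passwd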

The first character of the sed script (the ‘s‘ in the example) indicates the command sed is being given, and it is followed by the default separator (which is a forward slash):

s/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/

Then, what follows (everything up to the next forward slash) is the regular expression pattern to match in each line:

 s/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/

Within those toenail clippings, you see two sets of opening and closing parentheses. Each of these is escaped by a backslash (to distinguish them from just matching the parentheses characters as characters):

  • \([^:]*\)
  • \(.*\)

The first one ‘captures’ the username, while the second one ‘captures’ their shell. These are then referenced by their number order in the ‘replace’ part of the sed command (everything after the second forward slash):

 s/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/

which produces the output (on my system)…

user: nobody shell: /usr/bin/false
user: root shell: /bin/sh
user: daemon shell: /usr/bin/false
[...]

sed definitely requires some effort to learn, but it will quickly repay you if you ever do any text processing.


If you like this post, you may be interested in my book Learn Bash the Hard Way


Preview available here.


‘Do For All’ With xargs

xargs is one of the most powerful and time-saving commands to use on the terminal. But it remains impenetrable to some, which is a shame, as with a little work it’s not that difficult to get to grips with.

Before giving a real-world example, let’s go through it with a simple example. Create and move into a folder, creating three files:

$ mkdir xargs_example && cd xargs_example && touch 1 2 3 && ls
1 2 3

Now, by default, xargs takes all the items passed in, and passes them as arguments to the given command:

$ ls | xargs -t ls -l
ls -l 1 2 3
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 1
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 2
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 3

(We are using the -t flag here for explanatory purposes, to show the commands that actually get run; generally, you don’t need it.)

The -n flag allows you to process a number of arguments at once. Try this to see what I mean:

$ ls | xargs -n2 -t ls -l
ls -l 1 2
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 1
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 2
ls -l 3
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 3

Most often, I use -n1, to run the command on each argument separately:

$ ls | xargs -n1 -t ls -l
ls -l 1
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 1
ls -l 2
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 2
ls -l 3
-rw-r--r--  1 imiell  staff  0  3 Jan 11:07 3

Here’s a real-world example I used recently:

find . | \
  grep azurerm | \
  grep tf$ | \
  xargs -n1 dirname | \
  sed 's/^.\///'

It:

  • outputs all files and folders in or under the current working folder
  • of those, selects only the paths that contain azurerm
  • of those, selects only the ones that end with tf (eg ./azurerm/somefile.tf)
  • for each of those files, extracts the folder part of the path with dirname (eg ./azurerm)
  • for each of those folders, removes the leading dot and forward slash, leaving the bare folder name (eg azurerm)

But what if the argument doesn’t go at the end of the command given to xargs? In that case I use the -I flag, which allows you to replace the arguments that would be applied with a string of your choice. In this example I moved all files with ‘Aug‘ in them to a specific folder:

$ ls | grep Aug | xargs -IXXX mv XXX aug_folder

Be aware that naive use of xargs can lead to problems in scripts. What if your files have spaces in them, or even newlines? What if there are more arguments than can be handled by the command? I’m not going to cover these nuances here, but it’s well covered in this excellent resource for more advanced bash usage.
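If you do need to handle such filenames robustly, one common approach (a sketch using the -print0/-0 options available in GNU and BSD find and xargs) is to null-delimit the names so that spaces and newlines can’t split them:

$ find . -name '*.tf' -print0 | xargs -0 -n1 dirname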

I also regularly tidy up dodgy filenames with detox on my servers.

Kill All Processes

Now you’ve seen awk and xargs, you can use these to quickly kill all processes that match. I used this quite often to kill off some pesky Virtual Machine processes that sometimes get left over in a corner case and prevent me from running up more:

$ ps -ef | grep VBoxHeadless | awk '{print $2}' | xargs kill -9

Again, you have to be careful with your grep here to ensure that you don’t accidentally kill other processes that happen to match the pattern.

Also be careful with the -9 argument to kill. You should only use that when the process doesn’t respond to the default kill signal (TERM, rather than -9‘s KILL), which gives the process the chance to tidy up after itself if it chooses to.
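A gentler sketch of the same pattern uses pkill where it’s available, which matches and signals processes in one step and sends TERM by default:

$ pkill -f VBoxHeadless      # send TERM and let the processes clean up
$ pkill -9 -f VBoxHeadless   # last resort, only if they refuse to die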

Find Files Ending With…

I often find myself looking for where files are on my system. The mlocate database is easily installable if you don’t have it, and is invaluable for speeding up file lookups that would otherwise mean a slow trawl with the find command. For example, I often need to find files across the filesystem that end with a specific suffix:

$ sudo updatedb
$ sudo locate cfg | grep '\.cfg$'
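If you don’t have mlocate installed, a (slower) find across the filesystem does the same job; a minimal sketch:

$ sudo find / -name '*.cfg' 2>/dev/null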

Process Files With sed | sh

Often you want to run a command on files extracted (or transformed) by a sed command, and with a little tweaking this is easily done by creating a shell script using sed and then piping it to a shell. This example looks for https links at the start of lines in the doc.md file and opens them up in a browser using the open command available on Macs:

$ grep ^.https doc.md | sed 's/^.\(h[^])]*\).*/open \1/' | sh

There are alternative ways to do this with xargs, but I use this when I want to see what the resulting script will actually look like before running it (by leaving off the ‘| sh‘ at the end).
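So, to preview rather than run, the same pipeline without the final stage just prints the generated open commands:

$ grep ^.https doc.md | sed 's/^.\(h[^])]*\).*/open \1/'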

Give Me All The Output With 2>&1

Some commands separate their output into ‘standard’ output and ‘error’ output. A pipe only passes the ‘standard’ output on to a command like grep, so the ‘error’ output is effectively ignored (it goes to a separate ‘file handle’, but you don’t need to understand that right now).

For example, I was searching for a particular flag in the openssl command recently, and realised that openssl‘s help flag outputs to standard error by default. So adding 2>&1 (which redirects ‘error’ output to wherever the ‘standard’ output is pointed) ensures that the output is grep-able.

$ openssl x509 -h 2>&1 | grep -i common 

If you want to redirect the output to a file, you need to get the ordering right:

$ openssl x509 -h > openssl_help.txt 2>&1   # RIGHT!

If the file redirect comes after the 2>&1, then the standard error output still goes to the terminal.

$ openssl x509 -h 2>&1 > openssl_help.txt   # WRONG!

It’s best to think of this by considering that the command is read from left to right, so in the ‘right’ one, the interpreter ‘sees’:

  • Redirect standard output to the file openssl_help.txt, then
  • Redirect standard error to wherever standard output is pointing

and both outputs are pointed at the file. In the ‘wrong’ one:

  • Redirect standard error to wherever standard output is pointing (which at the moment is the terminal), then
  • Redirect standard output to the file openssl_help.txt

and standard error is still pointed at the terminal, while standard output is redirected to the file openssl_help.txt.

Separate Lines With tr

tr is a handy command used in a variety of contexts. Its job is to replace individual characters in a stream of output. While sed can be used for this purpose, tr has a couple of advantages over it in certain contexts:

  • It’s easier to use than sed
  • It’s not line-oriented, so it ‘dumbly’ just replaces characters without concern for line separation

Here’s an example I used it for recently, to get each item in my PATH variable shown, one per line.

$ env | grep -w PATH | tr ':' '\n'
PATH=/usr/local/sbin
/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/bin
/usr/local/opt/coreutils/bin
/usr/local/opt/grep/libexec/gnubin

Also, tr is often used to remove problematic characters from a stream of output. For example, to turn a set of lines into a single line, you can run:

$ tr -d '\n'

which removes all the ‘newlines’ from a stream.
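Here’s a quick sketch of that in action:

$ printf 'one\ntwo\nthree\n' | tr -d '\n'
onetwothree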

Quick Infinite Loop

A pattern you very often need is an infinite loop. This is the way I usually get one on the command line.

$ while true; do … ; done

You can use the break keyword to escape this infinite loop.
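A typical concrete use is polling something every couple of seconds until you interrupt it with CTRL-C (or break out of it on some condition):

$ while true; do date; sleep 2; done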

Inline Files

Creating files can be a faff, so it’s really useful to be able to just create a file inline on the command line.

You can do this with a ‘heredoc’ like this:

$ cat > afile << SOMESTRING
The contents of the file are written here.
Just keep typing until you are done, then
end with the string you specified at the
end of the first line.
SOMESTRING

That creates a file called afile with the contents between the first and last line.

You can even go one stage further, and substitute where you would formerly have used the filename using the <() construct (see point 6 here).

$ kubectl apply -f <(cat << EOF
…
EOF
)
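The same construct works anywhere a command expects a filename. For example (folder names here are just placeholders), comparing the contents of two folders without creating any temporary files:

$ diff <(ls folder1) <(ls folder2)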

If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Buy in a bundle here

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Why I Keep Coming Back to Cynefin

Working as a consultant helping clients to change the way they work, I often struggle to explain to them how the way they usually attack problems is not always appropriate to the situation they’ve brought us in to help with. They might be a start-up that’s always taken an ad-hoc, JFDI approach and is now struggling with scale-up or maturity challenges, or a large corporation used to planning every detail up front before acting that now finds itself having to experiment with new capabilities in fast-changing fields.

In these situations, Cynefin is a useful conceptual tool for bringing leaders to the point where they understand that they may need to change their usual approach for this new context.

Cynefin is a meta-framework designed to help managers identify how they perceive situations and make sense of their own and other people’s behaviour.

But first, here’s how it helped me understand where I’d been going wrong.

How Cynefin helped me

I wrote about writing runbooks and doing SRE in a real-world context in a previous post. The post was well-received, and I figured I had this nailed. However, I learned something deeper about knowledge management and decision-making with a client a couple of years after that, when I failed to make the same techniques work in a different context.

The context of the original post was a company with a 15-year-old software stack, where prior examples of incidents and their histories of resolution were available to put together in a more coherent and regularised form. Even though the situation was crying out for runbooks, no-one had put them together, and getting to the point where they became part of the fabric of our work was still a long, costly and arduous process.

I found myself in a different situation some time later. I was in a team that was trying to deliver a new software delivery platform against a dynamic technical background. While there, I tried to foster a culture of writing runbooks to cover situations we’d seen in development, but I completely failed to make it stick. The reasons for this failure were numerous, but probably the most significant was that while I’d learned my lessons in a stable, ‘best practice’ context suitable for runbooks, the situation I was in was one of ’emergent practice’ where things were changing so fast that the documents were either out of date or redundant almost as soon as they were written.

While talking to a friend about this learning experience, they mentioned Cynefin (pronounced /kəˈnɛvɪn/, or kuh-NEV-in) to me as a framework for thinking about this kind of situation.

What is Cynefin?

A great overview of the Cynefin framework, from https://sketchingmaniacs.com/decision-making/

A great way to introduce Cynefin is to consider this often-mocked quote from Donald Rumsfeld.

“As we know, there are known knowns, there are things we know we know. We also know there are known unknowns, that is to say, we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know.”

Donald Rumsfeld

I never really understood why it was mocked so much – it clearly states an important point about the nature of knowledge and ignorance in different contexts. The fact that it requires some concentration to follow should not have made it a cause of mockery. But hey.

Cynefin does something similar for business decision-making. It categorises the various types of context we might find ourselves in, and helps us orient ourselves within them. This categorisation helps us to adapt our behaviour appropriately to the context.

What are these categorised contexts? There are five:

  • Simple
  • Complicated
  • Complex
  • Chaotic

The fifth state (Disordered) just means we haven’t categorised the context yet.

1) The ‘simple’ context

Let’s take the two extremes first. The ‘simple’ context might also be called the ‘known known’, or ‘best practice’ context. It’s the context where any reasonable person can work out what to do if they know the domain. For example, if you’re an airline pilot and a light in your cabin flashes, there is a documented checklist to follow that describes what you need to do. If you’re a trained pilot, what to do is well-understood and consistently applicable. It’s a ‘simple’ context.

2) The ‘chaotic’ context

By contrast, the ‘chaotic’ context is one where no-one knows what to do. In this situation, I like to think of an improv comic. They are placed in situations where there is no ‘right way’ forward; by design, they are thrown into an unfamiliar context where they are forced to be creative and experiment. If there were a ‘best practice’ here, it just wouldn’t be funny. What is called for is ‘novel practice’: it’s vital to do something, see what happens and respond to what happens next.

So, an approach seeking to find ‘best practice’ is not always best practice…

Decision-making

Cynefin proposes that there are different patterns of decision-making in these different contexts. All of them end in ‘respond’ (which seems to just mean ‘take action’), but are preceded by different approaches.

For ‘simple’, the steps are: sense, categorise, respond. In other words, figure out what the state is, which category of situation this is, and act accordingly.

For ‘chaotic’, the steps are: act, sense, respond. In other words, when you don’t know what the right thing to do is (or even where to start) just do something, and see what happens.

You can see how this maps to situations we’ve seen in different contexts. If you’re running an Accident and Emergency department, you seek to apply known best practice with every patient that comes in. So you sense (detect that a patient has come in), categorise (triage them), and respond (schedule them for appropriate and timely treatment). That’s a ‘simple’ (though not ‘easy’) situation.

If you’re fleeing from persecution in a war-torn country, then it’s by no means clear what the right thing to do is. You don’t have time to sit and think, and you don’t have enough information to evaluate a best path. Even if you did have information, the situation is changing rapidly and in unpredictable ways. In such a situation, just picking something to do (eg run to the airport), and re-evaluating your situation at the next appropriate point is the best thing to do. So you act (do something), sense (re-evaluate the situation), and respond (decide what to do next).

The other two categories

Between the two extremes of ‘simple’ and ‘chaotic’ are two more states: ‘complicated’ and ‘complex’.

3) The ‘complicated’ context

The ‘complicated’ situation is one where there is ‘good practice’ (ie there is a ‘good’ – but maybe not ‘best’ – ‘answer’ to the question posed by the situation of what to do), but it requires an expert to analyse the situation first. One might think of an architect called in to design a building. There are site- and client-specific things to consider in a broader context requiring expertise to know how to proceed in a ‘good practice’ way.

In these complicated situations, you ‘sense’ (gather relevant information), ‘analyse’ (work out, using your expertise, what a good solution looks like), then ‘respond’ (proceed with the build).

4) The ‘complex’ context

‘Complex’ sits between ‘complicated’ and ‘chaotic’. It’s not complete chaos, as there is some prior knowledge or experience that can be brought to bear, but even knowing how to get to a good answer is unknown. In these situations, you need to figure out the best way by experimentation. This is called ’emergent practice’, and is appropriately handled by a ‘probe’, ‘sense’, ‘respond‘ decision-making process.

Working in Cloud Native technology transformation, our consultancy often works with businesses used to operating in the complicated or simple areas, and we find that our job involves helping them understand that their previously successful approach does not apply to their current context of ’emergent practice’.

By contrast, when we work with those already operating in the complex space, we generally augment their teams with our technologists, who have more experience of getting towards good practice in Cloud Native than they do. If they are in a chaotic state, then we can use our experience to help them guide their decision making towards the complex or complicated space.

Why is Cynefin such a powerful consulting tool?

So far this all might sound quite trivial. It’s obvious to any reasonable person that your decision-making process needs to be different if you’re fleeing from a war zone than if you’re flying a plane.

What makes Cynefin such a powerful tool for business consulting is that it gives a framework to the common management problem: ‘What got your business here won’t get you there’

If you’ve read the book of the same title, you’ll know that its key message is that as you age in your career, the more self-centred and driven approaches to success that worked for you when younger become less effective as you seek to lead others to collective success. You need to change your approach and whole attitude to work if you want to succeed in your new, more elevated context.

In an analogous way, the success your business had in its earlier states may be completely inappropriate in a new state.

Imagine a startup used to operating in a chaotic or complex context. They get used to ‘just doing it’ and giving their staff freedom to solve problems in new and creative ways. This works really well for them as they grow fast, but after some years they find that their offerings to consumers have matured, and novel and creative solutions result more and more in waste, disruption (of the bad kind) and inefficiency.

What they need at this point might be staff who take a different approach more aligned to ‘good’ or ‘best’ practice. However, when people advocating such approaches arrive, challenging group dynamics can come into play. The tight monoculture that’s worked in the past becomes difficult to question, and those advocating a different style get alienated and leave, or just conform to the prevailing approach.

Similarly, a company used to using ‘best practice’ to solve their challenges may be at a loss as to why their approach does not work. As a consultant, I’m often asked directly “what’s the right answer to this question?”, to which the answer is all-too-often a disappointing “it depends”. What’s happened there is usually that they are seeking a ‘best’ or ‘good practice’ answer to a question where an ’emergent’ or ‘novel practice’ one is called for.

Many of these patterns are documented here, on our transformation patterns website.

And for you?

It’s not just consulting or work that Cynefin can help with. It’s also worth considering whether you have a preference for a particular way of approaching problems, and whether this preference stops you from acting in an appropriate way.

My wife and I often clash over whether to plan or not: she likes to plan holidays in advance, for example, and I find that process like pulling teeth, preferring to improvise. There are situations where planning is absolutely essential (got kids? you’d better plan activities!), and other more dynamic situations where time spent planning in detail is wasted as circumstances change (don’t book that restaurant weeks in advance, we may change our minds about where we’re going if the weather is bad).

Of course, in these situations, my approach is always best, and I don’t need no decision-making meta-framework to help me.


If you enjoyed this, then please consider buying me a coffee to encourage me to do more.



Is Agility Related to Commitment? – Money Flows Part II

Previously, I wrote about how a software company’s cultural challenges can be traced back to how money flows through it, using the example of an ‘accidental product’ B2B type of business that tries to go from project to product.

In this article, I want to extend that ‘cultural economics’ thinking to the subject of Agility.

An Agile Anecdote

Around 2005, when teams calling themselves agile were the new fashion (and few really knew what that meant), I worked for a company that delivered software to the sports betting industry. The industry is notoriously unregulated and cut-throat, and its combatants were (and probably still are) in a constant arms race to outdo each other with new features to attract and keep their non-sticky customers.

The pressure to deliver fast is exacerbated by another factor: major events don’t move. If you have a great new soccer bet type then it needs to be ready for the World Cup, where a huge number of bets will be placed. Afterwards, it’s effectively commercially worthless. So if a feature isn’t quite ready, even if it’s slightly flawed, you rush it out and try to deal with the fallout later.

In this context, we had a job to do for a client by a deadline and didn’t have the staff to do it. Skills in this area were rare but we found a company ready and seemingly able to do the work. When we sat down with them, they told us they liked to work ‘in an agile way’. We didn’t really know what they meant by this, and ploughed ahead. We were desperate.

We soon hit a big problem. They ended their first sprint, and said they would have to re-evaluate the timelines as they had, in true agile fashion, reflected on their experience and decided it would take longer to deliver than they thought. We explained there was an immovable deadline, and that this was a non-negotiable constraint. They said they were sorry, but they were agile, so that was the deal.

We fired them pretty quickly after that, and managed to do the work in-house. We made a few glib remarks about the lack of agility in their agile approach and moved on.

Does Being Agile Even Make Sense?

Reflecting on this war story the other week while thinking about a consulting client’s challenges, I was prompted to ask whether aiming for true Agility is the right thing.

In software now it can feel like an unquestionable dogma that being Agile is always good, but it’s worth highlighting the situations in which trying to be agile is like fitting a square peg in a round hole.

Before we do that, I want to outline what I think the ideal conditions for agility are, and then we can talk about the common conditions for not being agile.


What is Agility?

I don’t really want to get into a theological discussion of what agility truly is, so I’m going to refer to the original Manifesto only rather than any of its descendants, and here just emphasise the part that says ‘Responding to change over following a plan’


Conditions For Agility

The Agile Manifesto’s original themes emphasise internal team agency over external demands.

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

https://agilemanifesto.org/

Each of the points ‘valued more’ on the left is one that privileges team autonomy. The team gets to decide when the thing gets built, how it gets built, and even what the thing they’re building consists of. Because the customer can’t be ignored, they are brought into the team rather than faced down over a contract.

The conditions required for team autonomy, and therefore for Agile, can be summed up in one word: trust.

All the values prized in the Agile Manifesto require trust to be nurtured and sustained.

In low-trust environments:

  • Process and tools are centrally mandated
  • Documentation is centrally mandated
  • Collaboration is low – demands are made of teams in a transactional fashion
  • Plans are king. Deviation from the Plan is always a problem rather than a response.

In high-trust environments, all these aspects of building are decided within the team.

Behind this trust must lie one of two relationships between the builders and those who ultimately provide the money (ie the customer). Either:

  • The customer is embedded within the team, or
  • The customer has only an indirect voice on the team’s performance

Implicit in this is a trust model between patron and producer where there are no hard commitments to delivery.

Back to the Iron Triangle

So far we’ve related Agility to trusting the delivery team to deliver, and this means that the customer has to accept that when, how and what will be delivered will change during the project. In other words, the team makes no binding commitment to delivery.

Customers might accept this in two situations: where they are directly involved in this dynamic decision-making, or if the nature of the commitment made to them does not involve what is delivered to them. If Netflix don’t deliver a ‘favourites’ feature on their SmartTV interface by February I can’t sue them – I just stop paying them. It’s up to them what they deliver to me, and up to me whether I pay them.

Either way, the nature of the commitment made to the patron is central to whether Agility is possible. And when we talk about commitments in projects, we talk about the iron triangle.

This old project management saw delineates the areas that one can make commitments to in a project. In other words, these are the things that are usually in contracts: when (time), what (scope), and how much it will cost. All other things being equal, changing the size of any of these areas will affect the quality of the delivery.

Is There A Contract Behind Your Work?

The first question you must ask yourself when thinking about whether it makes sense to try to be Agile is: at the root of your efforts, is there a binding commitment to deliver on those three points to whoever is ultimately funding you?

Flowing from this question will be all the behaviours that militate against the potentially innovative thinking that Agility exists to enable:

  • ‘You can’t take longer to deliver this, so you will have to make choices that affect the quality of the delivery’
  • ‘You can’t reduce the scope delivered in the time, so you will have to make choices that affect the quality of the delivery’
  • ‘We can’t add more resources to deliver the project (even if that would help), because this would make it unprofitable.’
  • ‘We have to co-ordinate everyone to deliver a specific thing at a specific time, so we need to plan ahead in a waterfall way (though we may call it Agile).’

Binding contracts also prevent the slack that all creative or innovative teams require to work for the longer-term interests of the producing business.

Are You Mining, Or Prospecting?

Another way to look at this question of whether it makes sense to be Agile is to consider whether your work is ultimately thought of as ‘mining’ or ‘prospecting’.

When you mine, you’ve found the value in the ground, and you know how to extract it. The task is therefore simple:

  • Get the value out of the ground
  • As cheaply as possible

When you prospect, you don’t know where the value is, so you have to find it as efficiently as possible. So you have to:

  • Decide where to start looking
  • Search in that area
  • Reflect on what you’ve found
  • Decide where to look next, or start mining

This is Agility in a nutshell: we have some expertise, but we don’t know all the answers, so we will have to prospect for value. We may discover new things along the way and change the outcome through innovative reflection on what we learn as we go. We can’t make commitments; we may strike gold, or find nothing at all.

Implicit in a fixed contract is that the work you are doing is ‘mining’. You’ve promised to get the value out. Whether you destroy your tooling doing it, or exhaust your miners, or break laws is not (formally) the concern of the patron. You’ve committed to deliver the value, and how to do it is your problem.

So Legal Contracts Are the Problem?

Bear in mind that whether a formal signed contract exists is not the sole determinant of whether a commitment prevents you from being Agile. Many organisations have contracts written, but the nature of their relationship is such that the contract is only there as insurance if the already-existing trust between the parties completely breaks down. Commitments can be renegotiated because both sides trust each other, and both sides find a way to satisfy each other’s needs and demands.

As the old saying goes: “if you need to get the contract off the shelf, you’ve both already lost”. Once trust goes, everything falls apart.

By the same token, internal patrons within businesses may commission projects or efforts without a formal contract, but with an implicit contract that signals low trust or little flexibility.

Fundamentally, the question should not be ‘is there a contract?’, rather ‘is there a binding and inflexible commitment?’

If a formal contract exists that defines cost, scope, or time, there is always a danger that the patron believes such a commitment has been made. If no formal contract exists but money is changing hands, then it’s simpler: the duty of the producer is to keep the client happy enough that they keep paying, but how they do that is entirely up to them.

Trust, Commitment, and Contracts

When you hear exhortations to be Agile at your work, think about the aspects of trust, commitment, and contracts, and how they relate to your work:

  • Has a commitment been made (or does the patron believe a commitment has been made)?
  • Does the patron trust the delivery team to deliver in their best interests?
  • Is there a contract ‘behind’ the work?

Considering these questions can determine whether Agility is a realistic approach for your working context.

The corollary of this is that if Agility is what you want, you may need to work on these fundamentals first if you really want things to change. Otherwise you will be swimming against the tide.


If you enjoyed this, then please consider buying me a coffee to encourage me to do more.


If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Buy in a bundle here

Five Ansible Techniques I Wish I’d Known Earlier

If you’ve ever spent ages waiting for an Ansible playbook to get through a bunch of tasks so yours can be tested, then this article is for you.

Ansible can be pretty tedious to debug and obscure to develop at times (“What’s the array I need to access the IP address on the en2 interface again?”), so I went looking for various ways to speed up the process, and make it easier to figure out what is going on.

Eventually I found five tools or techniques that can help, so here they are.

These tips go in order from easiest to hardest to implement/use.

1) --step

This is the simplest of the techniques to implement and follow. Just add --step to your ansible-playbook command, and for each task you run you will get a prompt that looks like this:

PLAY [Your play name] ****************************************************************************************
Perform task: TASK: Your task name (N)o/(y)es/(c)ontinue:

For each task, you can choose to run the task (yes), not run the task (no, the default), or run the rest of the play (continue).

Note that continue will run until the end of the play, not the end of the entire run. Quite handy if you know you want to say yes to everything in the current playbook.

The downside is that if there are many tasks to get through, you have to be careful not to keep your finger on the return key and accidentally go too far.

It would be a nice little open source project for someone to make this feature more powerful, adding ‘back’ and ‘skip this playbook’ features.
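For reference, the flag just gets added to whatever ansible-playbook invocation you normally run (the inventory and playbook names here are placeholders):

$ ansible-playbook --step -i hosts.yml site.yml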

2) Inline logging

In addition to runtime control, you can use old-fashioned log lines to help determine what’s going on. The following snippet of code will ‘nicely’ dump out json representations of the variables set across all the hosts. This is really handy if you want to know where Ansible has some information you want to reference in your scripts.

- name: dump all
  hosts: all
  tasks:
    - name: Print some debug information
      vars:
        msg: |
          Module Variables ("vars"):
          --------------------------------
          {{ vars | to_nice_json }}
          ================================

          Environment Variables ("environment"):
          --------------------------------
          {{ environment | to_nice_json }}
          ================================

          Group Variables ("groups"):
          --------------------------------
          {{ groups | to_nice_json }}
          ================================

          Host Variables ("hostvars"):
          --------------------------------
          {{ hostvars | to_nice_json }}
          ================================
      debug:
        msg: "{{ msg.split('\n') }}"
      tags: debug_info
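Assuming you save the play above as, say, dump_all.yml, you can run it directly, or target just this task via its tag if you add it to a larger playbook:

$ ansible-playbook -i hosts.yml dump_all.yml --tags debug_info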

As you’ll see later, you can also interrogate the Python environment interactively…

3) Run ansible-lint

As with most linters, ansible-lint can be a great way to spot problems and anti-patterns in your code.

Its output includes lines like this:

roles/rolename/tasks/main.yml:8: risky-file-permissions File permissions unset or incorrect

You configure it with a .ansible-lint file, where you can suppress classes of error entirely, or downgrade them so that the linter only warns.

The list of rules is available here, and more documentation is available here.
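As a minimal sketch of such a config (the rule names are just illustrative; pick the ones you actually want to change from the rules list), a .ansible-lint file might contain:

skip_list:
  - risky-file-permissions   # don't report this rule at all
warn_list:
  - no-changed-when          # report it, but don't fail the run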



4) Run ansible-console

This can be a huge timesaver when developing your Ansible code, but unfortunately there isn’t much information or guidance out there on how to use it, so I’m going to go into a bit more depth here.

The simplest way to run it is just as you would a playbook, but with console instead of playbook:

$ ansible-console -i hosts.yml
Welcome to the ansible console.
Type help or ? to list commands.
imiell@all (1)[f:5]$

You are greeted with a prompt and some advice. If you type help, you get a list of all the commands and modules available to you to use in the context in which you have run ansible-console:

Documented commands (type help <topic>):
========================================
EOF             dpkg_selections  include_vars   setup
add_host        exit             iptables       shell
apt             expect           known_hosts    slurp
apt_key         fail             lineinfile     stat
apt_repository  fetch            list           subversion
assemble        file             meta           systemd
assert          find             package        sysvinit
async_status    forks            package_facts  tempfile
async_wrapper   gather_facts     pause          template
become          get_url          ping           timeout
become_method   getent           pip            unarchive
become_user     git              raw            uri
blockinfile     group            reboot         user
cd              group_by         remote_user    validate_argument_spec
check           help             replace        verbosity
command         hostname         rpm_key        wait_for
copy            import_playbook  script         wait_for_connection
cron            import_role      serial         yum
debconf         import_tasks     service        yum_repository
debug           include          service_facts
diff            include_role     set_fact
dnf             include_tasks    set_stats

You can ask for help on these. If it’s a built-in command, you get a brief description, eg:

imiell@all (1)[f:5]$ help become_user
Given a username, set the user that plays are run by when using become

or, if it’s a module, you get a very handy overview of the module and its parameters:

imiell@all (1)[f:5]$ help shell
Execute shell commands on targets
Parameters:
  creates A filename, when it already exists, this step will B(not) be run.
  executable Change the shell used to execute the command.
  chdir Change into this directory before running the command.
  cmd The command to run followed by optional arguments.
  removes A filename, when it does not exist, this step will B(not) be run.
  warn Whether to enable task warnings.
  free_form The shell module takes a free form command to run, as a string.
  stdin_add_newline Whether to append a newline to stdin data.
  stdin Set the stdin of the command directly to the specified value.

Where the console comes into its own is when you want to experiment with modules quickly. For example:

imiell@basquiat (1)[f:5]$ shell touch /tmp/asd creates=/tmp/asd
basquiat | CHANGED | rc=0 >>

imiell@basquiat (1)[f:5]$ shell touch /tmp/asd creates=/tmp/asd
basquiat | SUCCESS | rc=0 >>
skipped, since /tmp/asd exists

If you have multiple hosts, it will run across all those hosts. This is a great way to broadcast commands across a wide range of hosts.

If you want to work on specific hosts, then use the cd command, which (misleadingly) changes your host context rather than directory. You can choose a specific host, or a group of hosts. By default, it uses all:

imiell@all (4)[f:5]$ cd basquiat
imiell@basquiat (1)[f:5]$ command hostname
basquiat | CHANGED | rc=0 >>
basquiat

If a command doesn’t match an Ansible command or module, it assumes it’s a normal shell command and runs it through one of the Ansible shell modules:

imiell@basquiat (1)[f:5]$ echo blah
basquiat | CHANGED | rc=0 >>
blah

The console has autocomplete, which can be really handy when you’re playing around:

imiell@basquiat (1)[f:5]$ expect <TAB><TAB>
chdir=      command=    creates=    echo=       removes=    responses=  timeout=
imiell@basquiat (1)[f:5]$ expect

5) The Ansible Debugger

Ansible also contains a debugger that you can use to interrogate a running Ansible process. In this example, create a file called playbook.yml, add this play to an existing one, or modify an existing play:

- hosts: all
  debugger: on_failed
  gather_facts: no
  tasks:
    - fail:

$ ansible-playbook playbook.yml
PLAY [all] ***************************
TASK [fail] **************************
Friday 27 August 2021  12:16:24 +0100 (0:00:00.282)       0:00:00.282 *********
fatal: [Ians-Air.home]: FAILED! => {"changed": false, "msg": "Failed as requested from task"}
[Ians-Air.home] help
EOF  c  continue  h  help  p  pprint  q  quit  r  redo  u  update_task

From there, you can execute Python commands directly to examine the context:

[Ians-Air.home] TASK: wrong variable (debug)> dir()
['host', 'play_context', 'result', 'task', 'task_vars']

Or use the provided commands to help you debug. For example, p maps to a pretty-print command:

[Ians-Air.home] TASK: wrong variable (debug)> p dir(task)
['DEPRECATED_ATTRIBUTES',
 '__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
[...]
 'tags',
 'throttle',
 'untagged',
 'until',
 'validate',
 'vars',
 'when']

If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Buy in a bundle here

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

A ‘Hello World’ GitOps Example Walkthrough

Intro

This post walks through a ‘hello world’ GitOps example I use to demonstrate key GitOps principles.

If you’re not aware, GitOps is a term coined in 2017 to encapsulate certain engineering principles that were becoming more common with the advent of recent tooling in the area of software deployment and maintenance.

If you want to know more about the background and significance of GitOps, I wrote an ebook on the subject, available for download here from my company. One of the more fun bits of writing that book was creating this diagram, which seeks to show the historical antecedents of the latest GitOps tooling, divided along the three principles of declarative code, source control, and distributed control loop systems.

This post is more a detailed breakdown of one implementation of a trivial application. It uses the following technologies and tools:

  • Docker
  • Kubernetes
  • GitHub
  • GitHub Actions
  • Shell
  • Google Kubernetes Engine
  • Terraform

Overview

The example consists of four repositories:

It should be viewed in conjunction with this diagram to get an overview of what’s going on in the example. I’ll be referring to the steps from 0 to 5 in some detail below:

An overview of the flow of this example

There are three ‘actors’ in this example: a developer (Dev), an operations engineer (Ops), and an Infrastructure engineer (Infra). The Dev is responsible for the application code, the Ops is responsible for deployment, and the Infra is responsible for the platform on which the deployment runs.

The repository structure reflects this separation of concerns. In reality, all roles could be fulfilled by the same person, or there could be even more separation of duties.

Also, the code need not be separated in this way. In theory, just one repository could be used for all four purposes. I discuss these kinds of ‘GitOps Decisions’ in my linked post.



The Steps

Here’s an overview of the steps outlined below:

  • A – Pre-Requisites
  • B – Fork The Repositories
  • C – Create The Infrastructure
  • D – Set Up Secrets And Keys
    • D1 – Docker Registry Login Secret Setup
    • D2 – Set Up Repository Access Token
    • D3 – Install And Set Up FluxCD
  • E – Build And Run Your Application

A – Pre-Requisites

You will need:

B – Fork the Repositories

Fork these three repositories to your own GitHub account:

C – Create the Infrastructure

This step uses the infra repository to create a Kubernetes cluster on which your workload will run, with its configuration being stored in code.

This repository contains nothing in the main branch, and offers a choice of branches depending on the cloud provider you want to use.

The best-tested branch is the Google Cloud Provider (gcp) branch, which we cover here.

The code itself consists of four terraform files:

  • connections.tf
    • defines the connection to GCP
  • kubernetes.tf
    • defines the configuration of a Kubernetes cluster
  • output.tf
    • defines the output of the terraform module
  • vars.tf
    • variable definitions for the module

To set this up for your own purposes:

  • Check out the gcp branch of your fork of the code
  • Set up a Google Cloud account and project
  • Log into Google Cloud on the command line:
    • gcloud auth login
    • Update components in case they have updated since gcloud install:
      • gcloud components update
  • Set the project name
    • gcloud config set project <GCP PROJECT NAME>
  • Enable the GCP container APIs
    • gcloud services enable container.googleapis.com
  • Add a terraform.tfvars file that sets the following items:
    • cluster_name
      • Name you give your cluster
    • linux_admin_password
      • Password for the hosts in your cluster
    • gcp_project_name
      • The ID of your Google Cloud project
    • gcp_project_region
      • The region in which the cluster should be located, default is us-west-1
    • node_locations
      • Zones in which nodes should be placed, default is ["us-west1-b","us-west1-c"]
    • cluster_cp_location
      • Zone for control plane, default is us-west1-a
  • Run terraform init
  • Run terraform plan
  • Run terraform apply
  • Get kubectl credentials from Google, eg:
    • gcloud container clusters get-credentials <CLUSTER NAME> --zone <CLUSTER CP LOCATION>
  • Check you have access by running kubectl cluster-info
  • Create the gitops-example namespace
    • kubectl create namespace gitops-example

If all goes to plan, you have set up a kubernetes cluster on which you can run your workload, and you are ready to install FluxCD. But before you do that, you need to set up the secrets required across the repositories to make all the repos and deployments work together.

D – Set Up Secrets And Keys

In order to co-ordinate the various steps in the GitOps workflow, you have to set up three sets of secrets in the GitHub repositories. This is to allow:

  1. The Kubernetes cluster to log into the Docker repository you want to pull your image from
  2. The gitops-example-app repository’s action to update the image identifier in the gitops-example-deploy repository
  3. FluxCD to access the gitops-example-deploy GitHub repository from the Kubernetes cluster

D1. Docker Registry Login Secret Setup

To do this you create two secrets in the gitops-example-app repository at the link:

https://github.com/<YOUR GITHUB USERNAME>/gitops-example-app/settings/secrets/actions

  • DOCKER_USER
    • Contains your Docker registry username
  • DOCKER_PASSWORD
    • Contains your Docker registry password

Next, you set up your Kubernetes cluster so it has these credentials.

  • Run this command, replacing the variables with your values:

kubectl create -n gitops-example secret docker-registry regcred --docker-server=docker.io --docker-username=$DOCKER_USER --docker-password=$DOCKER_PASSWORD --docker-email=$DOCKER_EMAIL

D2. Set up Repository Access Token

To do this you first create a personal access token in GitHub.

  • You can do this by visiting this link. Once there, generate a token called EXAMPLE_GITOPS_DEPLOY_TRIGGER.
  • Give the token all rights on repo, so it can read and write to private repositories.
  • Copy that token value into a secret with the same name (EXAMPLE_GITOPS_DEPLOY_TRIGGER) in your gitops-example-app at:

https://github.com/<YOUR GITHUB USERNAME>/gitops-example-app/settings/secrets/actions

D3 – Install And Set Up FluxCD

Finally, you set up flux in your Kubernetes cluster, so it can read and write back to the gitops-example-deploy repository.

  • The most up-to-date FluxCD deployment instructions can be found here, but this is what I run on GCP to set up FluxCD on my cluster:

kubectl create clusterrolebinding "cluster-admin-$(whoami)" --clusterrole=cluster-admin --user="$(gcloud config get-value core/account)"
kubectl create ns flux
fluxctl install --git-branch main --git-user=<YOUR GITHUB USERNAME> --git-email=<YOUR GITHUB EMAIL> --git-url=git@github.com:<YOUR GITHUB USERNAME>/gitops-example-deploy --git-path=namespaces,workloads --namespace=flux | kubectl apply -f -

  • When the installation is complete, this command will return a key generated by FluxCD on the cluster:

fluxctl identity --k8s-fwd-ns flux

  • You need to take this key, and place it in the gitops-example-deploy repository at this link:

https://github.com/<YOUR GITHUB USERNAME>/gitops-example-deploy/settings/keys/new

  • Call the key flux
  • Tick the ‘write access’ option
  • Click ‘Add Key’

You have now set up all the secrets that need setting up to make the flow work.

You will now make a change and will follow the links in the steps as the application builds and deploys without intervention from you.

E – Build And Run Your Application

To deploy your application, all you need to do is make a change to the application in your gitops-example-app repository.

An overview of the flow of this example
  • Step 1a, 2 and 3

Go to:

https://github.com/<YOUR GITHUB USERNAME>/gitops-example-app/blob/main/Dockerfile

and edit the file, changing the contents of the echo command to whatever you like, and commit the change, pushing to the repository.

This push (step 1a above) triggers the Docker login, build and push via a GitHub action (steps 2 and 3), which are specified in code here:

https://github.com/ianmiell/gitops-example-app/blob/main/.github/workflows/main.yaml#L13-L24

This action uses a couple of Docker actions (docker/login-action and docker/build-push-action) to build and push the new image with a tag of the GitHub SHA value of the commit. The SHA value is given to you as a variable by GitHub Actions (github.sha) within the action’s run. You also use the DOCKER secrets set up earlier. Here’s a snippet:

    - name: Log in to Docker Hub
      uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
      with:
        username: ${{secrets.DOCKER_USER}}
        password: ${{secrets.DOCKER_PASSWORD}}
    - name: Build and push Docker image
      id: docker_build
      uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
      with:
        context: .
        push: true
        tags: ${{secrets.DOCKER_USER}}/gitops-example-app:${{ github.sha }}
  • Step 4

Once the image is pushed to the Docker repository, another action is triggered that updates the gitops-example-deploy Git repository (step 4 above):

    - name: Repository Dispatch
      uses: peter-evans/repository-dispatch@v1
      with:
        token: ${{ secrets.EXAMPLE_GITOPS_DEPLOY_TRIGGER }}
        repository: <YOUR GITHUB USERNAME>/gitops-example-deploy
        event-type: gitops-example-app-trigger
        client-payload: '{"ref": "${{ github.ref }}", "sha": "${{ github.sha }}"}'

It uses the Personal Access Token secret EXAMPLE_GITOPS_DEPLOY_TRIGGER created earlier to give the action the rights to update the repository specified. It also passes in an event-type value (gitops-example-app-trigger) so that the action on the other repository knows what to do. Finally, it passes in a client-payload, which contains two variables: the github.ref and github.sha values made available to us by the GitHub Action.

This configuration passes all the information needed by the action specified in the gitops-example-deploy repository to update its deployment configuration.

The other side of step 4 is the ‘receiving’ GitHub Action code here:

https://github.com/ianmiell/gitops-example-deploy/blob/main/.github/workflows/main.yaml

Among the first lines are these:

on:
  repository_dispatch:
    types: gitops-example-app-trigger

Which tell the action that it should be run only on a repository dispatch, when the event type is called gitops-example-app-trigger. Since this is what we did on the push to the gitops-example-app action above, this should be the action that’s triggered on this gitops-example-deploy repository.

The first thing this action does is check out and update the code:

      - name: Check Out The Repository
        uses: actions/checkout@v2
      - name: Update Version In Checked-Out Code
        if: ${{ github.event.client_payload.sha }}
        run: |
          sed -i "s@\(.*image:\).*@\1 docker.io/${{secrets.DOCKER_USER}}/gitops-example-app:${{ github.event.client_payload.sha }}@" ${GITHUB_WORKSPACE}/workloads/webserver.yaml

If a sha value was passed in with the client payload part of the github event, then a sed is performed, which updates the deployment code. The workloads/webserver.yaml Kubernetes specification code is updated by the sed command to reflect the new tag of the Docker image we built and pushed.

Once the code has been updated within the action, you commit and push using the stefanzweifel/git-auto-commit-action action:

      - name: Commit The New Image Reference
        uses: stefanzweifel/git-auto-commit-action@v4
        if: ${{ github.event.client_payload.sha }}
        with:
          commit_message: Deploy new image ${{ github.event.client_payload.sha }}
          branch: main
          commit_options: '--no-verify --signoff'
          repository: .
          commit_user_name: Example GitOps Bot
          commit_user_email: <AN EMAIL ADDRESS FOR THE COMMIT>
          commit_author: <A NAME FOR THE COMMITTER> << AN EMAIL ADDRESS FOR THE COMMITTER>>
  • Step 5

Now the deployment configuration has been updated, we wait for FluxCD to notice the change in the Kubernetes deployment configuration. After a few minutes, the Flux controller will notice that the main branch of the gitops-example-deploy repository has changed, and will try to apply the YAML configuration in that repository to the Kubernetes cluster. This will update the workload.

If you port-forward to the application’s service, and hit it using curl or your browser, you should see that the application’s output has changed to whatever you committed above.
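For example, something like this (a sketch: the service name and ports are assumptions based on the workloads/webserver.yaml in the deploy repository, so adjust them to match your fork):

$ kubectl port-forward -n gitops-example svc/webserver 8080:80
$ curl localhost:8080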

And then you’re done! You’ve created an end-to-end GitOps continuous delivery pipeline from code to cluster that requires no intervention other than a code change!

Cleanup

Don’t forget to terraform destroy your cluster, to avoid incurring a large bill with your cloud provider!

Lessons Learned

Even though this is as simple an example as I could make it, you can see that it involves quite a bit of configuration and setup. If anything went wrong in the pipeline, you’d need to understand quite a lot to be able to debug and fix it.

In addition, there are numerous design decisions you need to make to get your GitOps workflow working for you. Some of these are covered in my previous GitOps Decisions post.

Anyone who’s coded anything will know that this configuration and maintenance overhead is a tax on the benefits of automation. It should not be underestimated by any team looking to take up this deployment methodology.

On the other hand, once the mental model of the workflow is internalised by a team, significant savings and improvements to delivery are seen.




If You Want To Transform IT, Start With Finance

tl;dr – ‘Money Flows Rule Everything Around Me’

When talking about IT transformation, we often talk about ‘culture’ being the problem in making change, but why stop there?

If we take a ‘5 whys‘ approach, then we should go deeper. So, where can we go from ‘culture’?

Here I suggest we should consider a deeper structural cause of cultural problems in change management: how money flows through the organisation.

If you want to change an organisation, you need to look at how money works within it.

I talked a little about this in a recent podcast.

Picking Up Two Old Threads

In this post I want to pick up two threads that have cropped up in previous posts and bring them together.

  1. ‘Start with Finance’

An aside I made in this somewhat controversial previous post:

Command and control financial governance structures just aren’t changing overnight to suit an agile provisioning model. (As an aside, if you want to transform IT in an enterprise, start with finance. If you can crack that, you’ve a chance to succeed with sec and controls functions. If you don’t know why it’s important to start with finance, you’ll definitely fail).

zwischenzugs.com/

was picked up on by many people. They wanted to know more about why I was so vehement on that point. Unfortunately, it was a throwaway line: there was too much to unpack, and it wasn’t clearly formed in my mind. But like many throwaway lines, it revealed a thread that might be good to pull on.

  2. ‘Culture – Be Specific!’

Previously I was triggered by Charity Majors (@mipsytipsy) to write about my frustration at IT’s apparent inability to probe deeper than ‘culture’ when trying to diagnose problems in technical businesses.

Since then, I’ve spent more time in the course of my work trying to figure out what’s blocking companies from trying to change and increasingly have worked back from people and process to sales and funding.

The Argument

The argument breaks down like this:

  • To achieve anything significant you need funding
  • To get funding you need to persuade the people with the money to part with it
  • To persuade the people with the money, you need to understand what they value
  • To understand what they value, you need to understand how their cash flows work
  • To understand how their cash flows work, you need to understand:
    • your customers/clients and how and why they part with their money
    • the legal and regulatory constraints on your business and how it operates

Or, put more engagingly, as the tl;dr above has it: ‘Money Flows Rule Everything Around Me’.

Any significant decision or change therefore gets made in the context and constraints of how and why money is distributed to, and within, the business.

In addition to this systemic level, there is also a more visceral personal level on which money flows can change or affect behaviour. Compensation, threats of firing, and bonuses can all drive good or bad behaviours. Or, as it’s been put pithily before:

When you’ve got them by their wallets, their hearts and minds will follow.

Fern Naito

This is not to say that all culture is 100% determined by money flows. Individuals can make a difference, and go against the tide. But in the end, the tide is hard to fight.


There is a precedent for this argument in philosophy. Karl Marx argued that societal culture (the ‘superstructure’) was ultimately determined by material relations of production (the ‘base’). From Wikipedia:

The base comprises the forces and relations of production (e.g. employer–employee work conditions, the technical division of labour, and property relations) into which people enter to produce the necessities and amenities of life. The base determines society’s other relationships and ideas to comprise its superstructure, including its culture, institutions, political power structures, roles, rituals, and state. The relation of the two parts is not strictly unidirectional. Marx and Engels warned against such economic determinism, as the superstructure can affect the base. However the influence of the base is predominant.[1]

Wikipedia, Base and Superstructure
You have nothing to lose but your blockchains.

What Does This Mean For IT?

The theory is all very interesting, but what does this mean in practice?

There is a very common pattern in software companies’ histories (especially if they were founded before the Software-as-a-Service age), and understanding their money flows in terms of those histories can explain a lot about how and why they struggle to change. I have seen it multiple times in the course of my work, both as a consultant and as an employee.

The Four Stages

Stage I – Hero Hacking

When a software company starts up, it often builds a product for a few big customers that sustain their cash flow in the early days. These times are a natural fit for ‘hero hackers’ who build features and fix bugs on live systems all night to help get that contract signed and keep their customers happy.

Your few customers are all important and demand features peculiar to them, so you keep delivery times low by having customer-specific code, or even forking the entire product codebase to keep up.

Stage I – customer asks, customer gets
Stage II – Pseudo Product

Now that you have some customers and they are happy with the product, its features, and your staff’s dedication to keeping them happy, more customers come along. So you sign contracts with them, and before you know it you have numerous customers.

Of course, you’re selling your services as a product, but the reality is that it’s a mess. Each installation is more or less unique, and requires individual teams to maintain or develop on them.

Stage II – Customer pays, customer gets… eventually. Things have gotten more complicated.

This is where things start to get more complicated.

  • Features grow and diverge for different customers
  • Features get built in parallel for different customers, sometimes similar, but not the same
  • Database schemas diverge
  • Porting features sounds trivial (it’s a product, right?) but gets messy as code gets passed around different codebases
  • Some attempts are made to centralise or share core functionality, but this can slow down delivery or just get too complicated for teams to maintain

Grumbles from customers and between development teams start to increase in volume.

Stages IIIa and IIIb

The last two stages are closely related. Either or both can happen in the same company. Stage IIIb doesn’t necessarily follow from Stage IIIa; it’s really just the same problem in another form for the SaaS age.

Stage IIIa – We Want To Be A Product Company

As you get more and more customers it makes less and less sense to have these different teams porting features from one codebase to another, or copying and pasting for individual projects. Customers start to complain that their system is expensive to build on and maintain, as feature x requires ‘integration’ or some kind of bespoke delivery cost for their deployment.

At this point someone says: ‘Wouldn’t it make more sense for us to maintain one product, and maintain that centrally for multiple customers? That way, we can sell the same thing over and over again, increase the license cost, reduce delivery cost and make more profit.’

Stage III is where the cracks really start to show, and we go into how and why this happens below.

The product vision – more customers pay less, and the product improves
Stage IIIb – We Need An Internal Platform

As the (pseudo or real) product offering grows, or as you increasingly offer your software as a service on the cloud rather than a package delivered in a data centre, you invest heavily in a ‘platform’ that is designed to enable deliveries to be faster, cheaper, and better.

You might even set up some kind of platform team to build these cross-product features. It’s a similar justification to the product one: ‘Wouldn’t it make more sense for us to maintain one platform, and use it to deliver products for multiple customers? That way we could reduce cost of delivery for all the customers that use the platform, and increase quality at the same time.’

Where Does It All Go Wrong?

So how do things go wrong?

From Stage I to Stage II, things are relatively smooth. Everyone understands their role, and the difficulties you face are tough, but tractable and clear. As you go to Stage IIIa/b, it feels very tough to move towards the envisioned target. Everyone seems to agree what the goal is, but the reality is:

  • Customers still want their new features fast (and faster than their competition), and don’t want their requests to be ‘put on the backlog’
  • The merging of the codebases seems never to happen
  • Attempts to write new, unifying products are difficult to build and sell

All of these difficulties and failures can often be tracked to money flows.

Similarly, with platform teams:

  • The business wants to build a platform, but balks at the cost and struggles to see the value
  • The business has built the platform, but doesn’t accept that it needs a team to sustain it
  • The business has built a platform for reliability, but ‘heroes’ still want to fix things by hand for the glory rather than quietly tinker with a CI/CD workflow

Again, all of these difficulties and failures can often be tracked to money flows.

How This Happens – Money Flow Patterns

These difficulties come down to challenges moving from project to product, and these difficulties in turn come from how money moves into and through the business.

Stage I Money Flows – Hero Hacking

In Stage I, the money flows are simple:

  • Team builds software in cheap offices, often on low salaries with the promise of growth to come or fun, adventure and really wild things
  • The first customers are won because the product does something cheaper or better than the competition
  • The money from the first customers pays for nicer offices and more teams
  • More money comes in as customers demand modifications or maintenance on the delivery

The reality at Stage I is that there is no real ‘product’. There are projects that deliver what the customer needs, and the business is happy to provide these as each individual project is profitable (either on a ‘time and materials’ or a ‘fixed cost’ basis), and that results in a healthy profit at the end of the year.

The price that’s paid is that each customer’s codebase and configuration diverges, making porting those features and maintenance patterns a little more costly as time goes on.

But no matter: the business has a simple model for cash flow: keep the customer happy and the money flows in, and each customer gets the development and maintenance attention they pay for.

Stage I – customer asks, customer gets

Stage II Money Flows – Pseudo Product

In Stage II, the money flows are much the same, but the cracks are starting to show:

  • Customers are still happy with the attention they get, but:
    • Projects seem to take longer and cost more
    • Features that are apparently part of the product can’t be just ‘switched on’, but require ‘integration time’
    • The quality of the software feels low, as fixes are required because of the extra work required to integrate changes
Stage II – Customer pays, customer gets… eventually. Things have gotten more complicated.

At this point, customer executives start to say they yearn for a product that has a more predictable quality to it, and a roadmap, and is cheaper and feels less bespoke. Can’t all us customers just pay a lower fee every year and get a steadily improving product?

At the same time, the owners of the business are asking themselves whether they couldn’t make more money the same way: instead of 15 customers, wouldn’t it be great if we had 150, all taking the same product and paying (say) half the cost they are now? That kind of margin looks very tempting…

The result is that changes in external demand produce a desire to change the development model.

Stage IIIa – We Want To Be A Product Company

In Stage IIIa (and Stage IIIb), if the money flows stay the same as in Stages I and II, the move to becoming a product company will feel extremely difficult. This is felt in a number of ways. Here’s one common story:

  • The company sets up a ‘product team’ that is responsible for productising the various disparate codebases and hacks that made up each customer’s bespoke setup.
  • This product team tries to corral the project teams into sacrificing short-term customer delight for long-term product strength and consistency.
  • The product team spends a lot of money doing all the work required to make a product, but customers prove less willing to buy into it than they said they would be. They find it difficult to shift their priorities from the feature delivery times and speed of support they are used to, to accepting delays in return for a cheaper, productised offering.

Productisation Debt

Time and again, development and product teams tell their management that they have to make a call: sacrifice customer satisfaction for the product, or build up ‘productisation debt’.

  • Do you tell your biggest customer they are going to have to wait another month for a feature because the product has a release cadence and standards stricter than they are willing to accept?
  • Even if they have the money ready to get the service they want?
  • Are you willing to watch the relationship with that customer deteriorate over several very difficult meetings as you explain to them that they can’t have what they want when they want it anymore?
  • Are you willing to risk losing them completely?
  • Do you tell them that they suffered an outage because of a change made for another customer last release?
    • Will it be any comfort to them to know that this feature is now available to them (fully fixed in the next release)?
The product vision – more customers pay less, and the product improves

The result is that it takes a much longer time and more money than thought to get a successful product model going when the older money flows continue to push the organisation towards its old habits and culture. Why trade the feel-good factor of giving the customer what they want now for the slow burn of deferred rising profits in the future?

On the face of it, the arguments look simple: your profit margin will go up if you productise. The reality is that finance (and therefore the executives, shareholders, salespeople, HR, reward systems etc) have gotten used to the project-based money flows and cadences, and find them incredibly hard to give up for some uncertain but better future that may be years away.

What you end up with is a more complicated version of Stage II (simplified here with only two customers for ‘clarity’).

The Product reality – customers and finance want to keep the relationship the same

Rather than your customer teams fighting with the customer to give them what they want, you now have more forces acting in opposition within your org, including:

  • The product team fights with the customer teams for resources
  • The customer team fights with the product team over productisation calls
  • Finance fights with the product development team for resources

The result is likely to end in failure for your product strategy.

Stage IIIb – We Need A Platform

The ‘platform’ stage is really a variation on the product phase (Stage IIIa), except that this time the customers have no visibility of the productisation or automation of what they’re sold. This is effectively a product built for an internal customer: the finance team, who hope for money to be saved per project over time after an initial investment.

Platform team money flows look similar

This can be easier or harder to achieve than Stage IIIa depending on the attitude of the internal customer vs the external customer.

Again, this can be affected by the details of the money flows: if the work to build a platform is on the books as capital expenditure (as opposed to operational expenditure – see below), executives may well ask ‘is the platform built yet?’ This question can baffle the team, as they’re well aware that such a platform is never ‘finished’, as there are always efficiency-generating improvements to make.

In both Stage III variants, if the benefits of the platform are not quantified in financial terms from the start, then getting the funding required becomes difficult. This means that you should:

  • Measure the cost of delivery per project pre-platform, so you can compare it to the post-platform cost
  • Ensure that the cost of the platform is ‘baked in’ to the sales cycle, so that there is a concept of platform profit and loss that can also be measured
  • Set expectations that ‘profit’ may be a long time coming, as is the case with most capital investments. Would you expect to build a house and start turning a profit in 1/20th of its lifetime?

Money Flow Patterns

The above patterns are common to small and medium-sized B2B software businesses, but they are not the only patterns that drive cultures and behaviour inappropriate to their stated aims.

Here we list some significant patterns and their effects on culture, delivery and operations.

Opex vs capex

Opex (operational expenditure) and capex (capital expenditure) are two different ways that business spending can be categorised. Briefly, opex is regular expenditure, and capex is significant, long-term expenditure.

Software projects’ costs have traditionally been categorised under capex, but as cloud computing has arisen, more and more of their costs have been moved to opex.

The designation of the spending can make a significant difference to how the work is treated by the business.

  • They may have different approval processes
  • There might be more money in the ‘capex pot’ this year than the ‘opex pot’ (or vice versa)
  • Your business may mandate that opex costs are preferred over capex costs because they see the management of assets as a burden
  • Or (as seen above) if the building of a software platform is considered a capex, then it might be considered as ‘done’ rather than as something that needs to be maintained as an opex

There are tax implications to both capex and opex that can further complicate discussions.

The line between a capex and an opex purchase is not always clear, and most projects will have some kind of mixture of the two, which makes working out the effect on the business’s profit and loss account for that year difficult.

Project-based funding

Project-based funding is where money is allocated to a specific project and/or outcomes, as opposed to product-based work, where funding is usually allocated continuously. Project funding may be on a ‘time and materials’ or ‘fixed cost’ basis.

The cultural patterns associated with project-based funding are:

  • Pride in customer service and satisfaction
  • Prioritisation given to delivery over long-term stability
  • Scant attention paid to maintenance costs
  • Mounting technical debt and increasing complexity over time
  • Lack of co-ordination / duplication of effort between project teams
  • A ‘hero’ culture, as fast fixes to problems that arise gain attention over slower improvements
  • Perceived higher value for customer-pleasing project work over central and internal infrastructure work

Yearly funding cycles / business unit funding

Yearly funding cycles are where money is allocated to projects or products at the same time every year. This is normally driven by accounting cycles, which are almost always yearly.

Yearly accounting cycles make a mockery of technical teams’ attempts to be truly ‘Agile’ in response to business demand. If a released MVP product flies off the shelf in February, then you can’t get funding to scale it up until January next year.

Strict yearly funding cycles are also often associated with centralised funding within large business units that sit within larger organisations. This can make working in an agile way even harder, as there are two levels of politics to negotiate before more funding can be gained for your team: your own business unit’s internal politics, and the business unit’s relationship with the central funders.

First mover bears the cost

Individual business unit funding also makes it significantly harder for any kind of project whose benefits cut across business units to get off the ground, eg ‘Platform’ or ‘Infrastructure’ work. Costs for such projects are typically borne by a single centralised business unit that is perceived as delivering little business value, so is starved of funding.

This can also be characterised as a ‘first mover bears the cost’ problem.

No money for hard-to-measure benefits

Some organisations take a strict view of cost/benefit, requiring any given expenditure to show some direct, tangible return on investment.

In general, this is sensible, but can result in difficulty getting funding for projects where there is not a readily measurable return.

For example, what if your investment:

  • Helps retain staff
  • Enables more dynamic and faster business outcomes
  • Reduces the risk of a failed audit

Is there a way to measure these, or even the language to state them as benefits at all?

Many businesses have no leeway for these qualitative benefits to factor into business cases.

What Is To Be Done?

Whenever you want to debug a misbehaving program, you want to get to the root cause. Workarounds are unsatisfying, and ultimately result in further problems as the workaround itself shows its failings.

I feel the same way about ‘cultural problems’ in organisations. It’s not enough to put posters up around an office imploring people to ‘be more agile’, or to instruct people to have a daily stand-up and a two-week work cadence to drive cultural change.

No, you have to go to the root of the structures that drive behaviour in order to make lasting change, whether on a personal or an organisational level. And my argument here is that the root of behaviours can be traced back to money flows.

So, what can you do about it? Here are some suggestions:

  • Involve the CFO/finance team in the change program from the start
  • Explain to finance the reality of what you’re doing
  • Learn to speak the language of finance – talk to them

Most important of all, if you’re going to change the behaviour and goals of an organisation, you are going to have to change the way money moves around it. As Einstein is [wrongly said to have] said, doing the same thing over and over and expecting different results is the definition of insanity.

If you can engage with finance in an open and enquiring way, then together you can make lasting change; if you don’t, then you will be fighting the tide. Just ask Marx.



How To Waste Hundreds of Millions on Your IT Transformation

You’re a few years into your tenure as CEO of Vandelay Industries, a behemoth in the Transpondsting space that’s existed for many decades.

The Real Strategy

You could really use the share price to go up soon, so you can sell your shares at the optimal point before fortune’s wheel turns and the board inevitably get rid of you.

You’re tired of Vandelay, and want to move to a better CEO job or maybe a nice juicy Chairmanship of another behemoth board before the share price drops.

What happens to Vandelay after you’re gone is not your problem. In fact, since you are likely to go and work for a rival corporation, it might even be better for you if things at Vandelay got worse once you’ve sold your shares.

Any means necessary are on the table.

Fortunately the solution to this problem is simple:

Declare a major technology transformation!

Why? Wall Street will love it. They love macho ‘transformations’. By sheer executive fiat Things Will Change, for sure.

Throw in ‘technology’ and it makes Wall Street puff up that little bit more.

The fact that virtually no analyst or serious buyer of stocks has the first idea of what’s involved in such a transformation is irrelevant. They will lap it up.

This is how capitalism works, and it indisputably results in the most efficient allocation of resources possible.

A Dash of Layoffs, a Sprinkling of Talent

These analysts and buyers will assume there will be reductions to employee headcount sooner rather than later, which of course will make the transformation go faster and beat a quick path to profit.

Hires of top ‘industry experts’ who know the magic needed to get all this done, and who will be able to pass on their wisdom without friction to the eager staff that remain, will make this a sure thing.

In the end, of course, you don’t want to come out of this looking too bad, do you?

So how best to minimise any fallout from this endeavour?

Leadership

The first thing you should do is sort out the leadership of this transformation.

Hire in a senior executive specifically for the purpose of making this transformation happen.

Well, taking responsibility for it, at least. This will be useful later when you need a scapegoat for failure.

Ideally it will be someone with a long resume of similar transformational senior roles at different global enterprises.

Don’t be concerned with whether those previous roles actually resulted in any lasting change or business success; that’s not the point. The point is that they have a lot of experience with this kind of role, and will know how to be the patsy. Or you can get someone with a heavy dose of Dunning-Kruger so they can truly inhabit the role.

The kind of leader you want.

Make sure this executive is adept at managing his (also hired-in) subordinates in a divide-and-conquer way, so their aims are never aligned, or multiply-aligned in diverse directions in a 4-dimensional ball of wool.

Incentivise senior leadership to grow their teams rather than fulfil the overall goal of the program (ideally, the overall goal will never be clearly stated by anyone – see Strategy, below).

Change your CIO halfway through the transformation. The resulting confusion and political changes of direction will ensure millions are lost as both teams and leadership chop and change positions.

With a bit of luck, there’ll be so little direction that the core business can be unaffected.

Strategy

This second one is easy enough. Don’t have a strategy. Then you can chop and change plans as you go without any kind of overall direction, ensuring (along with the leadership anarchy above) that nothing will ever get done.

Unfortunately, the world is not sympathetic to this reality, so you will have to pretend to have a strategy, at the very least. Make the core PowerPoint really dense and opaque. Include as many buzzwords as possible – if enough are included people will assume you know what you are doing. It helps if the buzzwords directly contradict the content of the strategy documents.

It’s also essential that the strategy makes no mention of the ‘customer’, or whatever provides Vandelay’s revenue, or why the changes proposed make any difference to the business at all. That will help nicely reduce any sense of urgency to the whole process.

Try to make any stated strategy:

  • hopelessly optimistic (set ridiculous and arbitrary deadlines)
  • internally contradictory (eg tight yearly budget cycles partnered with agile development)
  • inflexible from the start (aka ‘my way, or the highway’)

Whatever strategy you pretend to pursue, be sure to make it ‘Go big, go early’, so you can waste as much money as fast as possible. Don’t waste precious time learning about how change can get done in your context. Remember, this needs to fail once you’re gone.

Technology Architecture

First, set up a completely greenfield ‘Transformation Team’ separate from your existing staff. Then, task them with solving every possible problem in your business at once. Throw in some that don’t exist yet too, if you like! Force them to coordinate tightly with every other team and fulfil all their wishes.

Ensure your security and control functions are separated from (and, ideally, in some kind of war with) a Transformation Team that is siloed as far as possible from the mainstream of the business. This will create the perfect environment for expensive white elephants to be built that no-one will use.

All this taken together will ensure that the Transformation Team’s plans have as little chance of getting to production as possible. Don’t give security and control functions any responsibility or reward for delivery, just reward them for blocking change.

Ignore the ‘decagon of despair’. These things are nothing to do with Transformation; they are just blockers people like to talk about. The official line is that hiring Talent (see below) will take care of those. It’s easy to exploit an organisation’s insecurity about its capabilities to downplay the importance of these.

The decagon of despair.

Talent

Hire hundreds of very expensive engineers and architects who don’t understand the business context. Do this before you’ve even established a clear architecture (which will never be defined) for your overall goals (which are never clearly articulated).

Give these employees no clear leadership, and encourage them to argue with each other (and everyone else, should they happen to come across them) about minor academic details of software development and delivery, thus ensuring that no actual delivery is in danger of happening.

Just let them get on with it.

Endgame

If all goes to plan, the initiative peaks at around 18 months in. The plan is in full swing and analysts are expecting benefits to show in the bottom line in the upcoming reports. Fortunately, you’ve done the groundwork, and internally, everyone can see it’s a mess.

People are starting to ask questions about the lack of results. The promised benefits have not arrived, and costs seem to be spiralling out of control. The faction for change you encouraged is now on the defensive in senior meetings, and the cultural immune system of the old guard is kicking in again, reasserting its control.

It’s now time for you to protest that everything is going to plan, but gracefully accept your fate and your juicy payoff. If you’ve still not got enough cash to be happy, then you can go to Landervay Industries, and use your hard-won experience there to help them turn their business around. Maybe this time it will work, as your main competition (Vandelay) seems to be struggling since you left…

With luck, we all retire, even Dons.

Useful Resources

Generate a strategy statement without any effort: https://strategy-madlibs.herokuapp.com/

Cloud Native Transformation patterns to avoid: http://www.cnpatterns.org/patterns-library

Disclaimer

None of this happened in real life. Any relation to any enterprises or technology transformations existing or cancelled is entirely accidental.




When Should I Interrupt Someone?

How many times have you sat there trying to work through a technical problem, and thought:

Is it OK if I interrupt someone else to get them to help me?

Pretty much every engineer ever

Since I work with companies that are in the process of moving to Cloud Native technologies, there is often a huge gulf in knowledge and experience between the ‘early adopters’/’pioneers’ and the rest of the organisation.

Bridging that gap is a very costly process involving a combination of approaches such as formal training, technical mentoring, gentle cajoling, and internal documentation.

Very commonly, the more junior technical staff are wary of interrupting their more senior colleagues, whose time is perceived as more valuable, and whose knowledge and experience can intimidate them out of seeking help.

The Problem

Most of the time this isn’t a huge problem, as engineers negotiate between themselves when it’s OK to interrupt by observing how often others do it, developing good relationships with their peers, and so on.

It becomes a problem when people are unable to feel safe to interrupt others. This might be because:

  • They feel ‘left out’ of the team
  • They feel like they ‘should’ be able to solve the problem themselves
  • They think asking for help is a failure signal
  • They don’t want to “waste others’ time”

Of course, all of these reasons relate to psychological safety, so often cited as a core characteristic of high-performing teams. This article can’t solve that problem, but it seeks to help with one aspect of it. If you have rules around when and how it’s ‘OK’ to ask for help, it can make you feel safer about seeking it.

If people feel unable to ask for help, they can (at the worst extremes) sit there sweating for days making no progress, while feeling under enormous stress about their work. At the other extreme, you can get employees who ask for help immediately after getting stuck, wasting others’ time as they explain their problem, and very often fixing it themselves as they talk.

The Rule of Thumb

Early in my career, the first consultancy I worked with had a really simple rule for this:

If you’re stuck for over an hour, seek help.

This beautifully simple rule works very well in most contexts. It stops people sitting on blockages for days, and stops them from jumping out of their seat early in a panic.

A further piece of advice which I add to this is:

When you seek advice, first write down everything you’ve tried.

This has at least three benefits:

  1. It acts as a form of rubber duck debugging. Very often, in the process of taking a step back and writing down what you’ve tried, you’ll see what you missed.
  2. When you go to get help, you have evidence that you’ve gone through some kind of structured thought process before raising the alarm, rather than just asking for help as soon as the going got tough.
  3. You will save time explaining the context to someone else you’ve forced to context switch.

An essay is not required. Just enough notes to explain clearly and concisely what problem you’re facing and what your thinking was about how to solve it.

The Formula

The rule of thumb is simple and useful, but there are other factors to consider if you want to get really scientific about when and how it’s OK to interrupt others. If you’re in the business of knowledge work, every time you interrupt someone you reduce their efficiency and cost your business money.

Bosses are notorious for being cavalier with their inferiors’ time, but there’s often a good justification for this: their time is worth more to the business than yours.

So I came up with a formula for this, embodied in this spreadsheet.

The formula takes in a few parameters:

  • ‘Time taken thus far’ (ie how much time you’ve spent stuck on the problem) (“T3F”)
  • Time it will take to explain to someone else (“T3E”)
  • The ‘interruption overhead’ to the interruptee (“IO”)
  • The relative worth of your time and the interruptee’s time (“RTW”)

and tells you whether it’s ok to interrupt, as well as how much time you should still spend looking at it before interrupting. The interesting extra parameter here is the ‘relative cost’ of your time to the interruptee’s. This will be difficult to estimate accurately, but it can be set by the more senior staff as a guide to when they want to get involved in a problem. The last thing a more senior engineer should want is for their juniors to be spending significant amounts of time neither solving the problem nor developing their knowledge and capabilities.

The formula, for those interested, is:

Interrupt if:

T3F > RTW (IO + T3E)
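Here is the same check as a quick shell function – a sketch rather than the spreadsheet itself. All inputs are in minutes, and RTW is given as a whole-number ratio of the interruptee’s time value to yours:

interrupt_ok() {
  local t3f=$1 t3e=$2 io=$3 rtw=$4   # time stuck, time to explain, interruption overhead, relative time worth
  local threshold=$(( rtw * (io + t3e) ))
  if [ "$t3f" -gt "$threshold" ]; then
    echo "Interrupt now"
  else
    echo "Keep at it for another $(( threshold - t3f )) minutes"
  fi
}

$ interrupt_ok 60 10 15 3   # stuck for an hour; their time is worth 3x yours
Keep at it for another 15 minutes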

If you use it, let me know!



An Incompetent Newbie Takes Up 3D Printing

Like many self-confessed geeks, I’ve long been curious about 3d-printing. To me, it sounds like the romantic early days of home computing in the 70s, where expensive machines that easily broke and were used as toys gradually gave way to more reliable and useful devices that became mainstream twenty years later.

The combination of a few factors led me to want to give it a go: needing a hobby in lockdown; teenage kids who might take to it (and were definitely interested); a colleague who had more experience with it; and the continuing drop in prices and relative maturity of the machines.

Going into this, I knew nothing about the details or the initial difficulties, so I wanted to blog about it before I forget about them or think that they are ‘obvious’ to everyone else. Plenty wasn’t obvious to me…

Reading Up And Choosing A Printer

I started by trying to do research on what kind of printer I wanted, and quickly got lost in a sea of technical terms I didn’t understand, and conflicting advice on forums. The choices were utterly bewildering, so I turned to my colleague for advice. The gist of what he told me was: ‘Just pick one you can afford that seems popular and go for it. You will learn as you go. Be prepared for it to break and be a general PITA.’

So I took his advice. I read somewhere that resin printers were far more detailed, and got advice from another former colleague on the reputable brands, held my nose and dove in. I plumped for the Elegoo Mars 2, as it was one of the recommendations, and it arrived a few days later, along with a bottle of resin. Machine + resin was about £230.

Setup

I won’t say setup was a breeze, but I imagine it was a lot slicker than it was in the really early days of home 3d printing. I didn’t have to construct the entire thing, and the build quality looked good to me.

The major difficulties I had during setup were:

  • Not realising I needed IPA (Isopropyl Alcohol, 90%+) to wash the print in, surgical gloves (washing-up gloves won’t cut it), and a mask. The people that inhabit 3d printing forums seemed to think it was trivial to get hold of gallons of IPA from local hardware stores, but all I could find was a surprisingly expensive 250ml bottle for £10 in a local hardware shop (the third I tried). Three pairs of gloves are supplied.
  • Cack-handedly dropping a screw into the resin vat (not recommended) and having to fish it out.
  • Not following the instructions on ‘levelling the plate’ (the print starts by sticking resin to the metal printing plate, so it has to be very accurately positioned) to the absolute letter. The instructions weren’t written by a native speaker and also weren’t clearly laid out (that’s my excuse).

I also wasn’t aware that 3d-printing liquid resin is an unsafe substance (hence the gloves and mask), and that the 3d printing process produces quite a strong smell. My wife wasn’t particularly happy about this news, so I then did a lot of research to work out how to ensure it was safe. This was also bewildering, as you get everything from health horror stories to “it’s fine” reassurance.

In the event it seems like it’s fine, as long as you keep a window open whenever the printing lid is off and for a decent time afterwards (30 mins+). It helps if you don’t print all day every day. The smelliest thing is the IPA, which isn’t as toxic as the resin, so as long as you keep the lid on whenever possible, any danger is significantly reduced. If you do the odd print every other day, it’s pretty safe as far as I can tell. (This is not medical advice: IANAD). A far greater risk, it seems, is getting resin on your hands.

Thankfully also, the smell is not that unpleasant. It’s apparently the same as a ‘new car’ smell (which, by the way, is apparently horrifyingly toxic – I’ll always be opening a window when I’m in a new car in future).

Unlike the early days of computing, we have youtube, and I thoroughly recommend watching videos of setups before embarking on it yourself.

Finally, resin disposal is something you should be careful about. It’s irresponsible to pour resin down the drain, so don’t do it. Resin hardens in UV light (that’s how the curing/hardening process works), so there’s plenty of advice on how to dispose of it safely.

First Print

The first prints (which come on the supplied USB stick) worked first time, which was a huge relief. (Again, online horror stories of failed machines abound.)

The prints themselves were great little pieces, a so-called ‘torture test’ for the printer to put it through its paces: a pair of rooks with intricate staircases inside and minute but legible lettering. The kids claimed them as soon as I’d washed them in alcohol and water, before I had time to properly cure them.

I didn’t know what curing was at the time, and had just read that it was a required part of the process. I was confused because I’d read it was a UV process, but since the machine worked by UV I figured that the capability to cure came with the machine. Wrong! So I’d need a source of UV light, which I figured daylight would provide.

I tried leaving the pieces outside for a few hours, but I had no idea when they would be considered done, or even ‘over-cured’, which is apparently a thing. In the end I caved and bought a curing machine for £60 that gave me peace of mind.

From here I printed something for the kids. The first print proper:

Darth Buddha, First Print for my Kids

I’d decided to ‘hollow out’ this figure, to reduce the cost of the resin. I think it was hollowed to 2mm, and worked out pretty well. One downside was that the base came away slightly at the bottom, suggesting I’d hollowed it out too much. In any case, the final result has pride of place next to the Xbox.

More Prints

Next was for me, an Escher painting I particularly like (supposedly the figure in the reality/gallery world is Wittgenstein):

MC Escher’s ‘Print Gallery’ Etched in 3-D

You can see that there are whiter, chalkier bits. I think this is something to do with some kind of failure in my washing/curing process combined with the delicacy of the print, but I haven’t worked out what yet.

And one for my daughter (she’s into Death Note):

And another for me – a 3D map of the City of London:

A 3D Map of the City of London

The Paraphernalia Spreads…

Another echo of the golden age of home computing is the way the paraphernalia around the machine gradually grows. The ‘lab’ quickly started to look like this:

The Paraphernalia Spreads…

Alongside the machine itself, you can also see the tray, tissue paper, bottles (IPA and resin), curing station, gloves, masks, tools, various tupperware containers, and a USB stick.

It helps if you have a garage, or somewhere to spread out to that other people don’t use during the day.



Disaster One

After a failed print (an elephant phone holder for my mother), which sagged halfway through on the plate, subsequent attempts to print were marked by what sounded like a grinding noise of the plate against the resin vat. It was as though the plate was trying to keep going through the vat to the floor of the machine.

I looked up this problem online, and found all sorts of potential causes, and no easy fix. Some fixes talked about attaching ‘spacers’ (?) to some obscure part of the machine. Others talked about upgrading the firmware, or even a ‘factory reset’. Frustrated with this, I left it alone for a couple of weeks. After re-levelling the plate a couple of times (a PITA, as the vat needed to be carefully removed, gloves and mask on, etc), it occurred to me one morning that maybe some hardened material had fallen into the resin vat, and that that was what the plate was ‘grinding’ on.

I drained the vat, which was a royal PITA the first time I did it, as my ineptitude resulted in spilled resin due to the mismatch between bottle size and resin filter (the supplied little resin jug is also way too small for purpose). But it was successful, as there were bits caught in the filter, and after re-filling the vat I was happily printing again.

Disaster Two

Excited that I hadn’t spent well north of £200 on a white elephant, I went to print another few things. Now the prints were failing to attach to the plate, meaning that nothing was being printed at all. A little research, and another draining of the vat later, I realised the problem: the plate hadn’t attached to the print; instead, the base of the print had attached to the film at the bottom of the vat. This must be a common problem, as a plastic wedge is provided for exactly this purpose. It wasn’t too difficult to prise the flat hardened piece of resin off the floor of the vat and get going again.

Talking to my colleague I was told that ‘two early disasters overcome is pretty good going so far’ for 3d printing.

We’re Back

So I was back in business. And I could get back to my original intention to print architectural wonders (history of architecture is an interest of mine). Here’s a nice one of Notre Dame.

Conclusion

When 3d printing works, it’s a joy. There is something magical about creating something so refined out of a smelly liquid.

When it doesn’t work it’s very frustrating. Like speculating on shares, I would only spend money on it that you can afford to lose. And like any kind of building project, don’t expect the spending to stop at the initial materials.

I think this is the closest I’ll get to the feeling of having one of these in 1975 (the year I was born).

The Altair 8800 Home PC

It’s also fun to speculate on what home 3d printing will look like in 45 years…

GitOps Decisions

GitOps is the latest hotness in the software delivery space, following on from (and extending) older trends such as DevOps, infrastructure as code, and CI/CD.

So you’ve read up on GitOps, you’re bought in to it, and you decide to roll it out.

This is where the fun starts. While the benefits of GitOps are very easy to identify:

  • Fully audited changes for free
  • Continuous integration and delivery
  • Better control over change management
  • The possibility of replacing the joys of ServiceNow with pull requests

the reality is that constructing your GitOps pipelines is far from trivial, and involves many big and small decisions that add up to a lot of implementation work, which you will potentially chop and change as you go. We at Container Solutions call this ‘GitOps Architecture’ and it can result in real challenges in implementation.

The good news is that with a bit of planning and experience you can significantly reduce the pain involved in the transition to a GitOps delivery paradigm.

In this article, I want to illustrate some of these challenges by telling the story of a company that adopts GitOps as a small scrappy startup, and grows to a regulated multinational enterprise. While such accelerated growth is rare, it does reflect the experience of many teams in larger organisations as they move from proof of concept, to minimum viable product, to mature system.

‘Naive’ Startup

If you’re just starting out, the simplest thing to do is create a single Git repository with all your needed code in it. This might include:

  • Application code
  • A Dockerfile, to build the application image
  • Some CI/CD pipeline code (eg GitLab CI/CD, or GitHub Actions)
  • Terraform code to provision resources needed to run the application
  • All changes directly made to master, changes go straight to live
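Laid out as a single repository, that might look something like this sketch (the file and directory names are illustrative only):

super-app/
├── src/                          # application code
├── Dockerfile                    # builds the application image
├── .github/workflows/main.yaml   # CI/CD pipeline code
└── terraform/main.tf             # provisions resources to run the application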

The main benefits of this approach are that you have a single point of reference, and tight integration of all your code. If all your developers are fully trusted, and shipping speed is everything then this might work for a while.

Unfortunately, pretty quickly the downsides of this approach start to show as your business starts to grow.

First, the ballooning size of the repository as more and more code gets added can result in confusion among engineers as they come across more clashes between their changes. If the team grows significantly, then a lot of rebasing and merging can result in confusion and frustration.

Second, you can run into difficulties if you need to separate control or cadence of pipeline runs. Sometimes you just want to quickly test a change to the code, not deploy to live, or do a complete build and run of the end-to-end delivery.

Increasingly the monolithic aspect of this approach creates more and more problems that need to be worked on, potentially impacting others’ work as these changes are worked through.

Third, as you grow you may want more fine-grained responsibility boundaries between engineers and/or teams. While this can be achieved with a single repo (newer features like CODEOWNERS files can make this pretty sophisticated), a repository is often a clearer and cleaner boundary.
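For example, a CODEOWNERS file in a shared repository might carve up review responsibility like this (the paths and team names are invented for illustration):

$ cat .github/CODEOWNERS
# Changes under these paths require a review from the named team
/src/                 @example-org/app-team
/terraform/platform/  @example-org/platform-team
/.github/workflows/   @example-org/platform-team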

Repository Separation

It’s getting heavy. Pipelines are crowded and merges are becoming painful. Your teams are separating and specialising in terms of their responsibility.

So you decide to separate repositories out. This is where you’re first faced with a mountain of decisions to make. What is the right level of separation for repositories? Do you have one repository for application code? Seems sensible, right? And include the Docker build stuff in there with it? Well, there’s not much point separating that.

What about all the team Terraform code? Should that be in one new repository? That sounds sensible. But, oh: the newly-created central ‘platform’ team wants to control access to the core IAM rule definitions in AWS, and the teams’ RDS provisioning code is in there as well, which the development team want to regularly tweak.

So you decide to separate the Terraform out into two repos: a ‘platform’ one and an ‘application-specific’ one. This creates another challenge, as you now need to separate out the Terraform state files. Not an insurmountable problem, but this isn’t the fast feature delivery you’re used to, so your product manager is now going to have to explain why feature requests are taking longer than before because of these shenanigans. Maybe you should have thought about this more in advance…

Unfortunately there’s no established best practice or patterns for these GitOps decisions yet. Even if there were, people love to argue about them anyway, so getting consensus may still be difficult.

The problems of separation don’t end there. Whereas before, co-ordination between components of the build within the pipeline was trivial, as everything was co-located, now you have to orchestrate information flow between repositories. For example, when a new Docker image is built, this may need to trigger a deployment in a centralised platform repository, along with passing over the new image name as part of that trigger.

Again, these are not insurmountable engineering challenges, but they’re easier to implement earlier on in the construction of your GitOps pipeline when you have space to experiment than later on when you don’t.

OK, your business is growing, and you’re building more and more applications and services. It increasingly becomes clear that you need some kind of consistency in structure in terms of how applications are built and deployed. The central platform team tries to start enforcing these standards. Now you get pushback from the development teams who say they were promised more autonomy and control than they had in the ‘bad old days’ of centralised IT before DevOps and GitOps.

If these kinds of challenges ring bells in readers’ heads, it may be because there is an analogy here between GitOps and the monolith vs microservices arguments in the application architecture space. Just as you see in those arguments, the tension between distributed and centralised responsibility rears its head more and more as the system matures and grows in size and scope.

On one level, your GitOps flow is just like any other distributed system where poking one part of it may have effects not clearly understood, if you don’t design it well.




Environments

At about the same time as you decide to separate repositories, you realise that you need a consistent way to manage different deployment environments. Going straight to live no longer cuts it, as a series of outages has helped birth a QA team who want to test changes before they go out.

Now you need to specify a different Docker tag for your application in ‘test’ and ‘QA’ environments. You might also want different instance sizes or replication features enabled in different environments. How do you manage the configuration of these different environments in source? A naive way to do this might be to have a separate Git repository per environment (eg super-app-dev, super-app-qa, super-app-live).

Separating repositories has the ‘clear separation’ benefit that we saw with dividing up the Terraform code above. However, few end up liking this solution, as it can require a level of Git knowledge and discipline most teams don’t have in order to port changes between repositories with potentially differing histories. There will necessarily be a lot of duplicated code between the repositories, and – over time – potentially a lot of drift too.

If you want to keep things to a single repo you have (at least) three options:

  • A directory per environment
  • A branch per environment
  • A tag per environment

Sync Step Choices

If you rely heavily on a YAML generator or templating tool, then you will likely be nudged more towards one or other choice. Kustomize, for example, strongly encourages a directory-based separation of environments. If you’re using raw yaml, then a branch or tagging approach might make you more comfortable. If you have experience with your CI tool in using one or other approach previously in your operations, then you are more likely to prefer that approach. Whichever choice you make, prepare yourself for much angst and discussion about whether you’ve chosen the right path.
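As a sketch of the directory-per-environment approach Kustomize encourages (the names here are invented for illustration):

super-app-deploy/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── test/
    │   └── kustomization.yaml   # eg test image tag, small instance sizes
    ├── qa/
    │   └── kustomization.yaml
    └── live/
        └── kustomization.yaml   # eg live image tag, more replicas

Each overlay patches the shared base, so the differences between environments are visible in one place rather than spread across branches with diverging histories.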

Runtime Environment Granularity

Also on the subject of runtime environments, there are choices to be made on what level of separation you want. On the cluster level, if you’re using Kubernetes, you can choose between:

  • One cluster to rule them all
  • A cluster per environment
  • A cluster per team

At one extreme, you can put all your environments into one cluster. Usually, there is at least a separate cluster for production in most organisations.

Once you’ve figured out your cluster policy, at the namespace level, you can still choose between:

  • A namespace per environment
  • A namespace per application/service
  • A namespace per engineer
  • A namespace per build

Platform teams often start with a ‘dev’, ‘test’, ‘prod’ namespace setup, before realising they want more granular separation of teams’ work.

You can also mix and match these options, for example offering each engineer their own namespace for ‘desk testing’, as well as a namespace per team if you want.
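A sketch of how that mix-and-match might be bootstrapped with kubectl (the team, engineer, and role names are placeholders):

$ kubectl create namespace team-payments
$ kubectl create namespace team-payments-alice
$ kubectl create namespace team-payments-bob
$ kubectl create rolebinding alice-edit \
    --clusterrole=edit --user=alice --namespace=team-payments-alice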

Conclusion

We’ve only scratched the surface here of the areas of decision-making required to get a mature GitOps flow going. You might also need to consider RBAC/IAM and onboarding, for example – an absolute requirement if you grow to become that multinational enterprise.

Often rolling out GitOps can feel like a lot of front-loaded work and investment, until you realise that before you did this, none of it was encoded at all. Before GitOps, chaos and delays ensued as no-one could be sure what state anything was in, or should be in. This had secondary costs, as auditors did spot checks, and outages caused by unexpected and unrecorded changes occupied your most expensive employees’ attention. As you mature your GitOps flow, the benefits multiply, and your process takes care of many of these challenges. But more often than not, you are under pressure to demonstrate success more quickly than you can build a stable framework.

The biggest challenge with GitOps right now is that there are no established patterns to guide you in your choices. As consultants, we’re often acting as sherpas, guiding teams towards finding the best solutions for them and nudging them in certain directions based on our experience.

What I’ve observed, though, is that choices avoided early on because they seem ‘too complicated’ are often regretted later. But I don’t want to say that that means you should jump straight to a namespace per build, and a Kubernetes cluster per team, for two reasons.

1) Every time you add complexity to your GitOps architecture, you will end up adding to the cost and time to deliver a working GitOps solution.

2) You might genuinely never need that setup anyway.

Until we have genuine standards in this space, getting your GitOps architecture right will always be an art rather than a science.