Is it Imperative to be Declarative?

Recently, in Container Solutions’ engineering Slack channel, a heated argument ensued amongst our engineers after a Pulumi-related story was posted. I won’t recount the hundreds of posts in the thread, but the first response was “I still don’t know why we still use Terraform”, followed by a still-unresolved ping-pong debate about whether Pulumi is declarative or imperative, followed by another debate about whether any of this imperative vs declarative stuff really matters at all, and why can’t we just use Pulumi please?

This article is my attempt to calmly lay out some of the issues (and not, of course, to prove that I was right and everyone else was wrong), and to help you understand both what’s going on and how to respond to your advantage when someone says your favoured tool is not declarative and therefore verboten.

What does declarative mean, exactly?

Answering this question is harder than it appears, as the formal use of the term can vary from the informal use within the industry. So we need to unpick first the formal definition, then look at how the term is used in practice.

The formal definition

Let’s start with the Wikipedia definition of declarative:

“In computer science, declarative programming is a programming paradigm—a style of building the structure and elements of computer programs—that expresses the logic of a computation without describing its control flow.”

This can be reduced to:

“Declarative programming expresses the logic of a computation without describing its control flow.”

This immediately raises the question: ‘what is control flow?’ Back to Wikipedia:

“In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. Within an imperative programming language, a control flow statement is a statement that results in a choice being made as to which of two or more paths to follow.”

This can be reduced to:

“Imperative programs make a choice about what code is to be run.”

According to Wikipedia, examples of control flow include if statements, loops, and indeed any other construct that changes which statement is to be performed next (e.g. jumps, subroutines, coroutines, continuations, halts).

Informal usage and definitions

In debates around tooling, people rarely stick closely to the formal definitions of declarative and imperative code. The most commonly heard informal definition is: “Declarative code tells you what to do, imperative code says how to do it”. It sounds definitive, but discussion about it quickly devolves into definitions of what ‘what’ means and what ‘how’ means.

Any program tells you ‘what’ to do, so that’s potentially misleading, but one interpretation is that declarative code describes the state you want to achieve.

For example, by that definition, is this pseudo-code declarative or imperative?

if exists(ec2_instance_1):
  create(ec2_instance_2)
create(ec2_instance_1)

Firstly, strictly speaking, it’s definitely not declarative according to a formal definition, as the second line may or may not run, so there’s control flow there.

It’s definitely not idempotent, as running once does not necessarily result in the same outcome as running twice. But an argument put to me was: “The outcome does not change because someone presses the button multiple times”, some sort of ‘eventually idempotent’ concept. Indeed, a later clarification was: “Declarative means for me: state eventually consistent”.
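To make the non-idempotency concrete, here is a minimal Python simulation of the pseudo-code above. It is purely illustrative: exists and create just manipulate an in-memory set standing in for real EC2 state.

# Simulate the pseudo-code above: 'state' stands in for what actually exists in EC2.
state = set()

def exists(name):
    return name in state

def create(name):
    state.add(name)

def run():
    if exists("ec2_instance_1"):
        create("ec2_instance_2")
    create("ec2_instance_1")

run()
print(sorted(state))  # ['ec2_instance_1']

run()
print(sorted(state))  # ['ec2_instance_1', 'ec2_instance_2'] -- a different outcome

One run leaves you with one ‘instance’; a second run leaves you with two, so the outcome depends on how many times (and in what context) it has been run.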

It’s not just engineers in the field who don’t cling to the formal definition. This Jenkinsfile documentation describes the use of conditional constructs whilst calling itself declarative.

So far we can say that:

  • The formal definitions of imperative vs declarative are pretty clear
  • In practice and general discussion, people get a bit confused about what it means and/or don’t care about the formal definition

Are there degrees of declarativeness?

In theory, no. In practice, yes. Let me explain.

What is the most declarative programming language you can think of? Whichever one it is, it’s likely that either there is a way to make it (technically) imperative, or it is often described as “not a programming language”.

HTML is so declarative that a) people often deride it as “not a programming language at all”, and b) we had to create the JavaScript monster and the SCRIPT tag to ‘escape’ it and make it useful for more than just markup. This applies to all pure markup languages. Another oft-cited example is Prolog, which has loops, conditions, and a halt command, so is technically not declarative at all.

SQL is to many a canonical declarative language: you describe what data you want, and the database management system (DBMS) determines how that data is retrieved. But even with SQL you can construct conditionals:

insert into table1
select 'some value'
where exists (
  select 1
  from table2
  where table2.column1 = 'some value'
)


The insert into table1 will only run conditionally, i.e. if there’s a row in table2 that matches the text “some value”. You might think that this is a contrived example, and I won’t disagree. But in a way this backs up my central argument: whatever the technical definition of declarative is, the difference between most languages in this respect is how easy or natural it is to turn them into imperative languages.

Now consider this YAML, yanked from the internet:

job:
  script: "echo Hello, Rules!"
  rules:
    - if: '$CI_MERGE_REQUEST_TARGET_BRANCH_NAME == "master"'
      when: always
    - if: '$VAR =~ /pattern/'
      when: manual
    - when: on_success

This is, effectively, imperative code. It runs in order from top to bottom, and has conditionals. It can run different instructions at different times, depending on the context it is run in. However, YAML itself is still declarative. And because YAML is declarative, we have the hell of Helm, kustomize, and the various DevOps pipeline languages that claim to be declarative (but clearly aren’t) to deal with, because we need imperative, dynamic, conditional, branching ways to express what we want to happen.

It’s this tension between the declarative nature of the core tool and our immediate needs to solve problems that creates the perverse outcomes we hate so much as engineers, where we want to ‘break out’ of the declarative tool in order to get the things we want done in the way that we want it done.

Terraform and Pulumi

Which brings us neatly to the original subject of the Slack discussion we had at Container Solutions.

Anyone who has used Terraform for any length of time in the field has probably gone through two phases. First, they marvel at how its declarative nature makes it in many ways easier to maintain and reason about. Second, as time passes and the complexity of their use case builds and builds, they increasingly wish they had access to imperative constructs.

It wasn’t long before HashiCorp responded to these demands and introduced the ‘count’ meta-argument, which effectively gave us some kind of loop concept, and hideous bits of code like this abound to give us if statements by the back door:

count = var.something_to_do ? 1 : 0

There are also the for and for_each constructs, and the local-exec provisioner, which allows you to escape any declarative shackles completely and just drop to the (decidedly non-declarative) shell once the resource is provisioned.

It’s often argued that Pulumi is not declarative, and despite protestations to the contrary, if you are using it for its main selling point (that you can use your preferred imperative language to declare your desired state), then Pulumi is effectively an imperative tool. If you talk to the declarative engine under Pulumi’s hood in YAML, then you are declarative all the way down (and more declarative than Terraform, for sure).

The point here is that not being purely declarative is no bad thing, as it may be that your use case demands a more imperative language to generate a state representation. Under the hood, that state representation describes the ‘what’ you want to do, and the Pulumi engine figures out how to achieve that for you.
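To illustrate, here is a minimal Pulumi-style sketch in Python (assuming the pulumi and pulumi_aws packages; the bucket names and the config key are invented for illustration). The code reads imperatively, but what it produces is still a declarative description of desired state that the engine then reconciles with reality.

import pulumi
import pulumi_aws as aws

config = pulumi.Config()

# An imperative loop: each iteration adds a resource to the desired state.
buckets = [aws.s3.Bucket(f"logs-{i}") for i in range(3)]

# An imperative conditional: the resource is only part of the desired state
# if this (hypothetical) config flag is set.
if config.get_bool("create_archive_bucket"):
    archive = aws.s3.Bucket("archive")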

Some of us at Container Solutions worked some years ago at a major institution that built a large-scale project in Terraform. For various reasons, Terraform was ditched in favour of a Python-based boto3 solution, and one of those reasons was that the restrictions of a more declarative language produced more friction than the benefits gained. In other words, more control over the flow was needed. It may be that Pulumi was the tool we needed: a ‘Goldilocks’ tool that was the right blend of imperative and declarative for the job at hand. It could have saved us writing a lot of boto3 code, for sure.

How to respond to ‘but it’s not declarative!’ arguments

Hopefully reading this article has helped clarify the fog around declarative vs imperative arguments. First, we can recognise that purely declarative languages are rare, and even those that exist are often contorted into effectively imperative tooling. Second, the difference between these tools is how easy or natural they make that contortion.

There are good reasons to make it difficult for people to be imperative. Setting up simple Kubernetes clusters can be a more repeatable and portable process because of Kubernetes’ declarative configuration. When things get more complex, you have to reach for tools like Helm and kustomize, which may make you feel like your life has been made more difficult.

With this more nuanced understanding, next time someone uses the “but it’s not declarative” argument to shut you down, you can tell them two things: that that statement is not enough to win the debate; and that their suggested alternative is likely either not declarative, or not useful. The important question is not “Is it declarative?” but rather “How declarative do we need it to be?”


This article was originally published on Container Solutions’ blog and is reproduced here by permission.



The Biggest Cloud Native Strategy Mistake

Business strategy is very easy to get wrong. You’re trying to make sure your resources and assets are efficiently deployed and focussed on your end goal, and that’s hard. There’s no magic bullet that can help you both get the right strategy defined, and then successfully deliver on it, but there are many resources we’ve found that can help reduce the risk of failure. 

Under the heading ‘The Future of Cloud’, Gartner recently ran a symposium for CIOs and IT executives including much discussion about strategies relating to Cloud and Cloud Native trends. At least two of the main talks (one, two) were centred around a five-year horizon, discussing where cloud adoption will be in 2027 compared to where it is now.

As part of those talks, Gartner referenced a useful pyramid-shaped visualisation of different stages of cloud adoption. It could be viewed as a more schematic version of our Maturity Matrix, which we use as part of our Cloud Native Assessments with clients. 

In this article, we’re going to use the Gartner visualisation to talk about one of the biggest mistakes made in Cloud Native transformation strategies.

The pyramid

Gartner’s pyramid depicts four stages of cloud adoption from the perspective of business scope. These stages are shown as a hierarchy where the bottom layer represents the lowest dependency (“Technology Disruptor”) and the top layer represents the highest level business goal (“Business Disruptor”).

The four stages can be briefly summarised as:

  • Cloud As Technology Disruptor
    • The new base technology is adopted. For example, containerised applications, or a move to using a cloud service provider instead of a data centre.
  • Cloud As Capability Enabler
    • Now you have new technology in place, you can more easily build capabilities that may have been more difficult to achieve before, such as automated testing, or CI/CD.
  • Cloud As Innovation Facilitator
    • With new capabilities, you have the right environment to foster innovation in your business. This means you might, for example, leverage your cloud platform to deliver features more quickly, or conduct A/B testing of new features to maximise your return on investment.
  • Cloud As Business Disruptor
    • The most advanced stage of cloud adoption, where you can use the previous three stages’ outputs to change your business model by, for example, migrating to a SaaS model, or scaling your client base significantly, or introducing an entirely new product line.

Whilst it is somewhat higher level, this pyramid is similar to our Maturity Matrix in that it helps give you a common visual reference point for a comprehensible and tangible view of both where you are, and where you are trying to get to, in your Cloud Native program. For example, it can help in discussions with technologists to ask them how the changes they are planning relate to stage four. Similarly, when talking to senior leaders about stage four, it can help to clarify whether they and their organisation have thought about the various dependencies below their goal and how they relate to each other.

It can also help you avoid the biggest Cloud Native strategy mistake.

The big mistake

The biggest anti-pattern we see when consulting on Cloud Native strategy is to conflate all four phases of the pyramid into one monolithic entity. This means that participants in strategic discussions treat all four stages as a single stage, and make their plans based on that.

This anti-pattern can be seen at both ends of the organisational spectrum. Technologists, for example, might focus on the technical challenges, and are often minded to consider cloud strategy as simply a matter of technology adoption, or even just technology choice and installation. Similarly, business leaders often see a successful Cloud Native transformation as starting and stopping with a single discrete technical program of work rather than an overlapping set of capabilities that the business needs to build in its own context.

This monolithic strategy also conflates the goals of the strategy with the adoption plan, which in turn can lead to a tacit assumption that the whole program should be outlined in a single static and unchanging document.

For example, a business might document that their ‘move to the cloud’ is being pursued in order to transition their product from a customer installation model to a more scalable SaaS model. This would be the high-level vision for the program, the ‘level four’ of the pyramid. In the same document, there might be a roadmap which sets out how the other three levels will be implemented. For example, it might outline which cloud service provider will be used, which of those cloud service provider’s services will be consumed, which technology will be used as an application platform, and what technologies will be used for continuous integration and delivery.

This mixing of the high-level vision with the adoption plan risks them being treated as a single task to be completed. In reality, the vision and adoption plan should be separated, as while it is important to have clarity and consistency of vision, the adoption plan can change significantly as the other three levels of the pyramid are worked through, and this should be acknowledged as part of the overall strategy. At Container Solutions we call this ‘dynamic strategy’: a recognition that the adoption plan can be iterative and change as the particular needs and capabilities of your business interact with the different stages.

The interacting stages and ‘organisational indigestion’

Let’s dig a little bit deeper into each phase.

In the first ‘Technology Disruptor’ phase, there is uncertainty about how fast the technology  teams can adopt new technologies. This can depend on numerous local factors such as the level of experience and knowledge of these technologies among your teams, their willingness to take risks to deliver (or even deliver at all), and external blocks on delivery (such as security or testing concerns). It should also be said that whilst skills shortages are often cited as blocking new technology adoption, it is no longer practical to think of skills as a fixed thing that is hired as part of building a team to run a project based on a specific technology. Rather, technology skills need to be continuously developed by teams of developers exploring new technologies as they emerge and mature. To support this, leaders need to foster a “learning organisation” culture, where new ideas are explored and shared routinely.

The second ‘Capability Enabler’ phase has a basic dependency on the ‘Technology Disruptor’ phase. If those dependencies are not managed well, then organisational challenges may result. For example, whilst a CI/CD capability can be built independently of the underlying technology, its final form will be determined by its technological enablers. A large-scale effort to implement Jenkins pipelines across an organisation may have to be scrapped and reworked if the business decides that AWS-native services should be used, and therefore the mandated tool for CI is AWS CodePipeline. This conflict between the ‘Technology Disruptor’ phase (the preference for AWS-native services) and the ‘Capability Enabler’ phase can be seen as ‘organisational indigestion’ that can cause wasted time and effort as contradictions in execution are worked out.

The third ‘Innovation Facilitator’ phase is also dependent on the lower phases, as an innovation-enabling cloud platform is built for the business. Such a platform (or platforms) cannot be built without the core capabilities being enabled through the lower phases.

In practice, the three base phases can significantly overlap with one another, and could theoretically be built in parallel. However, ignoring the separability of the phases can result in the ‘organisational indigestion’ mentioned above, as the higher phases need to be revisited if the lower phases change. To give another simple example: if a business starts building a deployment platform on AWS CodeDeploy, it would need to be scrapped if the lower level decides to use Kubernetes services on Google Cloud.

The wasted effort and noise caused by this ‘organisational indigestion’ can be better understood and managed through the four phases model.

The treatment of Cloud Native strategy adoption as a single static monolith can also lead you to downplay or ignore the organisational challenges that lie ahead for any business. One example: implementing a Cloud Native approach to automated testing could be a straightforward matter of getting engineers to write tests that previously didn’t exist, or it could equally be a more protracted and difficult process of retraining a manual testing team to write automated tests.

Finally, the monolithic approach can lead to a collective belief that the project can be completed in a relatively short period of time. What’s a reasonable length of time? It’s worth remembering that Netflix, the reference project for a Cloud Native transformation, took seven years to fully move from their data centre to AWS. And Netflix had several things in their favour that made their transformation easier to implement: a clear business need (they could not scale fast enough and were suffering outages); a much simpler cloud ecosystem; clarity of product (video streaming) that made success easy to define; and a lack of decades of legacy software to maintain while they were doing it.

What to do about it?

We’ve outlined some of the dangers that not being aware of the four stages can bring, so what can you do to protect yourself against them?

Be clear about your path on the pyramid – optimisation or transformation?

The first thing is to ensure you have clarity around what the high-level vision and end goals for the transformation are. Gartner encapsulates this in a train map metaphor, to prompt the question of what your journey is envisaged to be. The ‘Replacement’ path, which goes across the first ‘Technology Disruptor’ phase, can also encompass the classic ‘Lift and Shift’ journey; the ‘Cloud Native’ path might cross both the first and second phases; and the ‘Business Transformation’ journey can cross all four phases.

The ‘east-west’ journeys can be characterised as ‘optimisation’ journeys, while the ‘south-north’ journeys can be characterised as ‘transformation’ journeys.

If the desired journey is unclear, then there can be significant confusion between the various parties involved about what is being worked towards. For example, executives driving the transformation may see a ‘Replacement’ approach as sufficient to make a transformation and therefore represent a journey up the pyramid, whilst those more technologically minded will immediately see that such a journey is an ‘optimisation’ one going across the first phase.


This advice is summarised as the vision first Cloud Native pattern, with executive commitment also being relevant.

Vision fixed, adoption path dynamic

A monolithic strategy that encompasses both vision and adoption can result in a misplaced faith in some parties that the plan is clear, static, and linearly achieved. This faith can flounder when faced with the reality of trying to move an organisation across the different phases.

Each organisation is unique, and as it works through the phases the organisation itself changes as it builds its Cloud Native capabilities. This can have a recursive effect on the whole program as the different phases interact with each other and these changing capabilities.

You can help protect against the risk of a monolithic plan by separating your high-level vision from any adoption plan. Where the vision describes why the project is being undertaken and should be less subject to change, the adoption plan (or plans) describes how it should be done, and is more tactical and subject to change. In other words, adoption should follow the dynamic strategy pattern.

Start small and be patient

Given the need for a dynamic strategy, it’s important to remember that if you’re working on a transformation, you’re building an organisational capability rather than doing a simple installation or migration. Since organisational capability can’t simply be transplanted or bought in as a monolith, it’s advisable to follow the gradually raising the stakes pattern. This pattern advocates for exploratory experiments in order to cheaply build organisational knowledge and experience before raising the stakes. This ultimately leads up to commitment to the final big bet, but by this point the risk of failure will have been reduced by the learnings gained from the earlier, cheaper bets.

As we’ve seen from the Netflix example, it can take a long time even for an organisation less encumbered by a long legacy to deliver on a Cloud Native vision. Patience is key, and a similar approach to organisational learning needs to be taken into account as you gradually onboard teams onto any cloud platform or service you create or curate.

Effective feedback loop

Since the strategy should be dynamic and organisational learning needs to be prioritised, it’s important that an effective and efficient feedback loop is created between all parties involved in the transformation process. This is harder to achieve than it might sound, as there is a ‘Goldilocks effect’ in any feedback loop: too much noise, and leaders get frustrated with the level of detail; too little, and middle management can get frustrated as the reality of delivering on the vision outlined by the leaders hits constraints from within and outside the project team. Similarly, those on the ground can get frustrated by either the perceived bureaucratic overhead of attending multiple meetings to explain and align decisions across departments, or the ‘organisational indigestion’ mentioned above when decisions at different levels conflict with each other and work must be scrapped or re-done.

Using the pyramid

The pyramid is an easily-understood way to visualise the different stages of cloud transformation. This can help align the various parties’ conception of what’s ahead and avoid the most often-seen strategic mistake when undergoing a transformation: the simplification of all stages into one static and pre-planned programme.

Cloud transformation is a complex and dynamic process. Whilst the vision and goals should not be subject to change, the adoption plan is likely to change as you learn how the changes you make to your technology expose further changes that need to be made to the business to support and maximise the benefits gained. It is therefore vital to separate the high-level goals of your transformation from the implementation detail, and to ensure there is an effective feedback loop.

Through all this complexity, the pyramid can help you both conceptualise the vision for your transformation and define and refine the plan for adoption, allowing you to easily connect the more static high level goals to the details of delivery.

This article was originally published on Container Solutions’ blog and is reproduced here by permission.




Practical Strategies for Implementing DevSecOps in Large Enterprises

At Container Solutions, we often work with large enterprises who are at various stages of adopting cloud technologies. These companies are typically keen to adopt modern Cloud Native software working practices and technologies, as itemised in our Maturity Matrix, and so come to us for help, knowing that we’ve been through many of these transformation processes before.

Financial services companies are especially keen to adopt DevSecOps, as the benefits to them are obvious given their regulatory constraints and security requirements. This article will focus on a common successful pattern of adoption for getting DevSecOps into large-scale enterprises that have these kinds of constraints on change.

DevSecOps and institutional inertia

The first common misconception about implementing DevSecOps is that it is primarily a technical challenge but, as we’ve explored on WTF before, it is at least as much about enabling effective communication. Whilst we have engineering skills in cutting-edge tooling and cloud services, there is little value in delivering a nifty technical solution if the business it’s delivered for is unable or unwilling to use it. If you read technical blog posts on the implementation of DevSecOps, you might be forgiven for thinking that the only things that matter are the tooling you choose, and how well you write and manage the software that is built on this tooling.

For organisations that were ‘born in the cloud’, where everyone is an engineer and has little legacy organisational scar tissue to consider, this could indeed be true. In such places, where the general approach to DevSecOps is well-grasped and agreed on by all parties, the only things to be fought over are indeed questions of tooling. This might be one reason why such debates take on an almost religious fervour.

The reality for larger enterprises that aren’t born in the cloud is that there are typically significant areas of institutional inertia to overcome. These include (but are not limited to):

  • The ‘frozen middle’
  • Siloed teams that have limited capability in new technologies and processes
  • Internal policies and processes designed for the existing ways of working

Prerequisites for success

Before outlining the pattern for success, it’s worth pointing out two critical prerequisites for enterprise change management success in moving to DevSecOps. As an aside, these prerequisites are not just applicable to DevSecOps but apply to most change initiatives.

The first is that the vision to move to a Cloud Native way of working must be clearly articulated to those tasked with delivering on it. The second is that the management who articulate the vision must have ‘bought into’ the change. This doesn’t mean they just give orders and timelines and then retreat to their offices; it means that they must back up the effort when needed with carrots, sticks, and direction when those under them are unsure how to proceed. If those at the top are not committed in this way, then those under them will certainly not push through and support the changes needed to make DevSecOps a success.

A three-phase approach

At Container Solutions we have found success in implementing DevSecOps in these contexts by taking a three-phase approach:

  1. Introduce tooling
  2. Baseline adoption
  3. Evolve towards an ideal DevSecOps practice

The alternative this approach is usually weighed against is the ‘build it right first time’ approach, where everything is conceived and delivered in one “big bang” style implementation.

  1. Introduce tooling

In this phase you correlate the security team’s (probably manual) process with the automation tooling you have chosen, and determine their level of capability for automation. At this point you are not concerned with how closely the work being done now matches the end state you would like to reach. Indeed, you may need to compromise against your ideal state. For example, you might skip writing a full suite of tests for your policies.

The point of this initial phase is to create alignment on the technical direction between the different parties involved as quickly and effectively as possible. To repeat: this is a deliberate choice to prioritise alignment over technical purity or speed of delivery of the whole endeavour.

The security team is often siloed from both the development and cloud transformation teams. This means that they will need to be persuaded, won over, trained, and coached to self-sufficiency.

Providing training to the staff at this point can greatly assist the process of adoption by emphasising the business’s commitment to the endeavour and setting a minimum baseline of knowledge for the security team. If the training takes place alongside practical implementation of the new skills learned, it makes it far more likely that the right value will be extracted from the training for the business.

The output of this phase should be that:

  • Security staff are comfortable with (at least some of) the new tooling
  • Staff are enthused about the possibilities offered by DevSecOps, and see its value
  • Staff want to continue and extend the efforts towards DevSecOps adoption

  2. Get to baseline adoption

Once you have gathered the information about the existing processes, the next step is to automate them as far as possible without disrupting current ways of working too much. For example, if security policy adherence is checked manually in a spreadsheet by the security team (not an uncommon occurrence), those steps can be replaced by automation. Tools that might be used for this include some combination of pipelines, Terraform, Inspec, and so on. The key point is to start to deliver benefits for the security team quickly and help them see that this will make them more productive and (most importantly of all) increase the level of confidence they have in their security process.
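As a purely illustrative sketch of the kind of check that might replace one row in such a spreadsheet (assuming Python with boto3 and AWS credentials available; the policy itself, that every S3 bucket must have a public access block, is just an example), something like this could run in a pipeline and publish a simple report:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Check every bucket for a public access block and print a pass/fail line
# that a pipeline can collect into a report.
for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        s3.get_public_access_block(Bucket=name)
        print(f"PASS {name}: public access block configured")
    except ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            print(f"FAIL {name}: no public access block")
        else:
            raise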

Again, the goal for this stage is to level up the capabilities of the security team so that the move towards DevSecOps is more self-sustaining rather than imposed from outside. This is the priority over speed of delivery. In practical terms, this means that it is vital to offer both pairing (to elevate knowledge) and support (when things go wrong) from the start to maintain goodwill towards the effort. The aim is to spread and elevate the knowledge as far across the department as possible. 

Keep in mind, though, that knowledge transfer will likely slow down implementation. This means that it is key to ensure you regularly report to stakeholders on progress regarding both policy deployment and policy outputs, as this will help sustain the momentum for the effort.

Key points:

  • Report on progress as you go
  • Provide (and focus on) help and support for the people who will maintain this in future
  • Where you can, prioritise spreading knowledge far and wide over delivering quickly

Once you have reached baseline adoption, you should be at a ‘point of no return’ which allows you to push on to move to your ideal target state.

  3. Evolve to pure DevSecOps

Now that you have brought the parties on-side and demonstrated progress, you can start to move towards your ideal state. This raises the question of what that ideal state is, but we’re not going to exhaustively cover that here as that’s not the focus. Suffice it to say that security needs to be baked into every step of the overall development life cycle and owned by the development and operations teams as much as it is by the security team.

Some of the areas you would want to work on from here include:

  • Introducing/cementing separation of duties
  • Setting up tests on the various compliance tools used in the SDLC
  • Approval automation
  • Automation of policy tests’ efficacy and correctness
  • Compliance as code

These areas, if tackled too early, can bloat your effort to the point where the business sees it as too difficult or expensive to achieve. This is why it’s important to tackle the areas that maximise the likelihood of adoption of tooling and principles in the early stages.

Once all these things are coming together, you will naturally start to turn to the organisational changes necessary to get you to a ‘pure DevSecOps’ position, where development teams and security teams are working together seamlessly.

Conclusion

Like all formulas for business and technological change, this three-phase approach to introducing DevSecOps can’t be applied in exactly the same way in every situation. However, we’ve found in practice that the basic shape of the approach is very likely to be a successful one, assuming the necessary prerequisites are in place.

Building DevSecOps adoption in your business is not just about speed of delivery, it’s about making steady progress whilst setting your organisation up for success. To do this you need to make sure you are building capabilities and not just code.


This article was originally published on Container Solutions’ blog and is reproduced here by permission.




A Little Shell Rabbit Hole

Occasionally I run dumb stuff in the terminal. Sometimes something unexpected happens and it leads me to wonder ‘how the hell did that work?’

This article is about one of those times and how looking into something like that taught me a few new things about shells. After decades using shells, they still force me to think!

The tl;dr is at the end if you don’t want to join me down this rabbit hole…

The Dumb Thing I Ran

The dumb thing I occasionally ran was:

grep .* *

If you’re experienced in the shell you’ll immediately know why this is dumb. For everyone else, here are some reasons:

  • The first argument to grep should always be a quoted string – without quotes, the shell treats the .* as a glob, not a regexp
  • grep .* just matches every line, so…
  • you could just get almost the same output by running cat *

Not Quite So Dumb

Actually, it’s not quite as dumb as I’ve made out. Let me explain.

In the bash shell, ‘.*‘ (unquoted) is a glob matching all the files beginning with the dot character. So the ‘grep .* *‘ command above, run in this (example) context:

$ ls -a1
.    ..    .adotfile    file1   file2

Would be interpreted as the command in bold below:

$ echo grep .* *
grep . .. .adotfile file1 file2

The .* gets expanded by the shell as a glob to all files or folders beginning with the literal dot character.

Now, remember, every folder contains at least two folders:

  • The dot folder (.), which represents itself.
  • The double-dot folder (..), which represents the parent folder

So these get added to the command:

grep . ..

Followed by any other file or folder beginning with a dot. In the example above, that’s .adotfile.

grep . .. .adotfile

And finally, the ‘*‘ at the end of the line expands to all of the files in the folder that don’t begin with a dot, resulting in:

grep . .. .adotfile file1 file2

So, the regular expression that grep takes becomes simply the dot character (which matches any line with a single character in it), and the files it searches are the remaining items in the file list:

..
.adotfile
file1
file2

Since one of those is a folder (..), grep complains that:

grep: ..: Is a directory

before going on to match any lines with any characters in. The end result is that empty lines are ignored, but every other line is printed on the terminal.

Another reason why the command isn’t so dumb (and another way it differs from ‘cat *‘) is that since multiple files are passed into grep, it reports on the filename, meaning the output automatically adds which file the line comes from.

bash-5.1$ grep .* *
grep: ..: Is a directory
.adotfile:content in a dotfile
file1:a line in file1
file2:a line in file2

Strangely, for two decades I hadn’t noticed that this is a very roundabout and wrong-headed (ie dumb) way to go about things, nor had I thought about its output being different from what I might have expected; it just never came up. Running ‘grep .* *‘ was probably a bad habit I picked up when I was a shell newbie last century, and since then I never needed to think about why I did it, or even what it did until…

Why It Made Me Think

The reason I had to think about it was that I started to use zsh as my default shell on my Mac. Let’s look at the difference with some commands you can try:

bash-5.1$ mkdir rh && cd rh
bash-5.1$ cat > afile << EOF
text
EOF
bash-5.1$ bash
bash-5.1$ grep .* afile
grep: ..: Is a directory
afile:text
bash-5.1$ zsh 
zsh$ grep .* afile
zsh:1: no matches found: .*

For years I’d been happily using grep .* but suddenly it was telling me there were no matches. After scratching my head for a short while, I realised that of course I should have quotes around the regexp, as described above.

But I was still left with a question: why did it work in bash, and not zsh?

Google It?

I wasn’t sure where to start, so I googled it. But what to search for? I tried various combinations of ‘grep in bash vs zsh‘, ‘grep without quotes bash zsh‘, and so on. While there was some discussion of the differences between bash and zsh, there was nothing which addressed the challenge directly.

Options?

Since google wasn’t helping me, I looked for shell options that might be relevant. Maybe bash or zsh had a default option that made them behave differently from one another?

In bash, a quick look at the options did not reveal many promising candidates, except for maybe noglob:

bash-5.1$ set -o | grep glob
noglob off
bash-5.1$ set -o noglob
bash-5.1$ set -o | grep glob
noglob on
bash-5.1$ grep .* *
grep: *: No such file or directory

But this is different from zsh‘s output. What noglob does is completely prevent the shell from expanding globs. This means the ‘*‘ at the end is passed to grep as-is, and grep complains that there is no file named ‘*‘ in this folder.

And for zsh? Well, it turns out there are a lot of options in zsh…

zsh% set -o | wc -l
185

Even just limiting to those options with glob in them doesn’t immediately hit the jackpot:

zsh% set -o | grep glob
nobareglobqual        off
nocaseglob            off
cshnullglob           off
extendedglob          off
noglob                off
noglobalexport        off
noglobalrcs           off
globassign            off
globcomplete          off
globdots              off
globstarshort         off
globsubst             off
kshglob               off
nullglob              off
numericglobsort       off
shglob                off
warncreateglobal      off

While noglob does the same as in bash, after some research I found that the remainder are not relevant to this question.

(Trying to find this out, though, is tricky. First, zsh‘s manual is not a single complete man page like bash‘s; it’s divided into multiple man pages. Second, concatenating all the zsh man pages with man zshall and searching for noglob gets no matches. It turns out that options are documented in caps, with underscores separating words. So, in noglob‘s case, you have to search for NO_GLOB. Annoying.)

zsh with xtrace?

Next I wondered whether this was due to some kind of startup problem with my zsh setup, so I tried starting up zsh with the xtrace option to see what’s run on startup. But the output was overwhelming, with over 13,000 lines pushed to the terminal:

bash-5.1$ zsh -x 2> out
zsh$ exit
bash-5.1$ wc -l out
13328

I did look anyway, but nothing looked suspicious.

zsh with NO_RCS?

Back to the documentation, and I found a way to start zsh without any startup files by starting with the NO_RCS option.

bash-5.1$ zsh -o NO_RCS
zsh$ grep .* afile
zsh:1: no matches found: .*

There was no change in behaviour, so it wasn’t anything funky I was doing in the startup.

At this point I tried using the xtrace option, but then re-ran it in a different folder by accident:

zsh$ set -o xtrace
zsh$ grep .* *
zsh: no matches found: .*
zsh$ cd ~/somewhere/else
zsh$ grep .* *
+zsh:3> grep .created_date notes.asciidoc

Interesting! The original folder I created to test the grep just threw an error (no matches found), but when there is a dotfile in the folder, it actually runs something… and what it runs does not include the dot folder (.) or parent folder (..).

Instead, the ‘grep .* *‘ command expands the ‘.*‘ into all the files that begin with a dot character. For this folder, that is one file (.created_date), in contrast to bash, where it is three (. .. .created_date). So… back to the man pages…

tl;dr

After another delve into the man page, I found the relevant section in man zshall that gave me my answer:

FILENAME GENERATION

[...]

In filename generation, the character '/' must be matched explicitly; also, a '.' must be matched explicitly at the beginning of a pattern or after a '/', unless the GLOB_DOTS option is set. No filename generation pattern matches the files '.' or '..'. In other instances of pattern matching, the '/' and '.' are not treated specially.

So, it was as simple as: zsh ignores the ‘.‘ and ‘..‘ files.

But Why?

But I still don’t know why it does that. I assume it’s because the zsh designers felt that that wrinkle was annoying, and wanted to ignore those two folders completely. It’s interesting that there does not seem to be an option to change this behaviour in zsh.

Does anyone know?



“Who Should Write the Terraform?”

The Problem

Working in Cloud Native consulting, I’m often asked about who should do various bits of ‘the platform work‘.

I’m asked this in various forms, and at various levels, but the title’s question (‘Who should write the Terraform?’) is a fairly typical one. Consultants are often asked simple questions that invite simple answers, but it’s our job to frustrate our clients, so I invariably say “it depends”.

The reason it depends is that the answers to these seemingly simple questions are very context-dependent. Even if there is an ‘ideal’ answer, the world is not ideal, and the best thing for a client at that time might not be the best thing for the industry in general.

So here I attempt to lay out the factors that help me answer that question as honestly as possible. But before that, we need some background.

Here’s an overview of the flow of the piece:

  • What is a platform?
  • How we got here
    • Coders and Sysadmins became…
    • Dev and Ops, but silos and slow time to market, so…
    • DevOps, but not practical, so…
    • SRE and Platforms
  • The factors that matter
    • Non-negotiable standards
    • Developer capability
    • Management capability
    • Platform capability
    • Time to market

What is a Platform?

Those old enough to remember when the word ‘middleware’ was everywhere will know that many industry terms are so vague or generic as to be meaningless. However, for ‘platform’ work we have a handy definition, courtesy of Team Topologies:

The purpose of a platform team is to enable stream-aligned teams to deliver work with substantial autonomy. The stream-aligned team maintains full ownership of building, running, and fixing their application in production. The platform team provides internal services to reduce the cognitive load that would be required from stream-aligned teams to develop these underlying services.

Team Topologies, Matthew Skelton and Manuel Pais

A platform team, therefore (and putting it crudely), builds the stuff that lets others build and run their stuff.

So… is the Terraform written centrally, or by the stream-aligned teams?

To explain how I would answer that, I’m going to have to do a little history.

How We Got Here

Coders and Sysadmins

In simpler times – after the Unix epoch and before the dotcom boom – there were coders and there were sysadmins. These two groups speciated from the generic ‘computer person’ that companies found they had to have on the payroll (whether they liked it or not) in the 1970s and 80s.

As a rule, the coders liked to code and make computers do new stuff, and the sysadmins liked to make sure said computers worked smoothly. Coders would eagerly explain that with some easily acquired new kit, they could revolutionise things for the business, while sysadmins would roll their eyes and ask how this would affect user management, or interoperability, or stability, or account management, or some other boring subject no-one wanted to hear about anymore.

I mention this because this pattern has not changed. Not one bit. Let’s move on.

Dev and Ops

Time passed, and the Internet took over the world. Now we had businesses running websites as well as their internal machines and internal networks. Those websites were initially given to the sysadmins to run. Over time, these websites became more and more important for the bottom line, so eventually, the sysadmins either remained sysadmins and looked after ‘IT’, or became ‘operations’ (Ops) staff and looked after the public-facing software systems.

Capable sysadmins had always liked writing scripts to automate manual tasks (hence the t-shirt), and this tendency continued (sometimes) in Ops, with automation becoming the defining characteristic of modern Ops.

Eventually a rich infrastructure emerged around the work. ‘Pipelines’ started to replace ‘release scripts’, and concepts like ‘continuous integration’ and ‘package management’ arose. But we’re jumping ahead a bit; this came in the DevOps era.

Coders, meanwhile, spent less and less time doing clever things with chip registers and more and more time wrangling different software systems and APIs to do their business’s bidding. They stopped being called ‘coders’ and started being called ‘developers’.

So ‘Devs’ dev’d, and ‘Ops’ ops’ed.

These groups grew in size and proportion of the payroll as software started to ‘eat the world’.

In reality, of course, there was a lot of overlap between the two groups, and people would often move from one side of the fence to the other. But the distinction remained, and became organisational orthodoxy.

Dev and Ops Inefficiencies

As the Dev and Ops pattern became bedded into organisations, people noted some inefficiencies with this state of affairs:

  • Release overhead
  • Misplaced expertise
  • Cost

First, there was a release overhead as Dev teams passed changes to Ops. Ops teams typically required instructions for how to do releases, and in a pre-automation age these were often prone to error without app- or even release-specific knowledge. I was present, about 15 years ago, at a very fractious argument between a software supplier and its client’s Ops team after an outage. The Ops team had attempted to follow instructions for a release, which resulted in an outage, as the instructions were not followed correctly. There was much swearing as the Ops team remonstrated that the instructions were not clear enough, while the Devs argued that if the instructions had been followed properly then it would have worked. Fun.

Second, Ops teams didn’t know in detail what they were releasing, so couldn’t fix things if they went wrong. The best they could do was restart things and hope they worked.

Third, Ops teams looked expensive to management. They didn’t deliver ‘new value’, just farmed existing value, and appeared slow to respond and risk-averse.

I mention this because this pattern has not changed. Not one bit. Let’s move on.

These and other inefficiencies were characterised as ‘silos’ – unhelpful and wasteful separations of teams for (apparently) no good purpose. Frictions increased as these mismatches were exacerbated by embedded organisational separation.

The solution was clearly to get rid of the separation: no more silos!

Enter DevOps

The ‘no more silos’ battle cry got a catchy name – DevOps. The phrase was usefully vague and argued over for years, just as Agile was and is (see here). DevOps is defined by Wikipedia as ‘a set of practices that combines software development (Dev) and IT operations (Ops)’.

At the purest extreme, DevOps is the movement of all infrastructure and operational work and responsibilities (ie ‘delivery dependencies’) into the development team.

This sounded great in theory. It would:

  • Place the operational knowledge within the development team, where its members could more efficiently collaborate in tighter iterations
  • Deliver faster – no more waiting weeks for the Ops team to schedule a release, or waiting for Ops to provide some key functionality to the development team
  • Bring the costs of operations closer to the value (more exactly: the development team bore the cost of infrastructure and operations as part of the value stream), making P&L decisions closer to the ‘truth’

DevOps Didn’t

But despite a lot of effort, the vast majority of organisations couldn’t make this ideal work in practice, even if they tried. The reasons for this were systemic, and some of the reasons are listed below:

  • Absent an existential threat, the necessary organisational changes were more difficult to make. This constraint limited the willingness or capability to make any of the other necessary changes
  • The organisational roots of the Ops team were too deep. You couldn’t uproot the metaphorical tree of Ops without disrupting the business in all sorts of ways
  • There were regulatory reasons to centralise Ops work which made distribution very costly
  • The development team didn’t want to – or couldn’t – do the Ops work
  • It was more expensive. Since some work would necessarily be duplicated, you couldn’t simply distribute the existing Ops team members across the development teams, you’d have to hire more staff in, increasing cost

I said ‘the vast majority’ of organisations couldn’t move to DevOps, but there are exceptions. The exceptions I’ve seen in the wild implemented a purer form of DevOps when there existed:

  • Strong engineering cultures where teams full of T-shaped engineers want to take control of all aspects of delivery AND
  • No requirement for centralised control (eg regulatory/security constraints)

and/or,

  • A gradual (perhaps guided) evolution over time towards the breaking up of services and distribution of responsibility

and/or,

  • Strong management support and drive to enable

The most famous example of the ‘strong management support’ is Amazon, where so-called ‘two-pizza teams’ must deliver and support their products independently. (I’ve never worked for Amazon so I have no direct experience of the reality of this). This, notably, was the product of a management edict to ensure teams operated independently.

When I think of this DevOps ideal, I think of a company with multiple teams each independently maintaining their own discrete marketing websites in the cloud. Not many businesses have that kind of context and topology.

Enter SRE and Platforms

One of the reasons listed above for the failure of DevOps was the critical one: expense.

Centralisation, for all its bureaucratic and slow-moving faults, can result in vastly cheaper and more scalable delivery across the business. Any dollar spent at the centre can save n dollars across your teams, where n is the number of teams consuming the platform.

The most notable example of this approach is Google, who have a few workloads to run, and built their own platform to run them on. Kubernetes is a descendant of that internal platform.

It’s no coincidence that Google came up with DevOps’s fraternal concept: SRE. SRE emphasised the importance of getting Dev skills into Ops rather than making Dev and Ops a single organisational unit. This worked well at Google primarily because there was an engineering culture at the centre of the business, and an ability to understand the value of investing in the centre rather than chasing features. Banks (who might well benefit from a similar way of thinking) are dreadful at managing and investing in centralised platforms, because they are not fundamentally tech companies (they are defenders of banking monopoly licences, but that’s a post for another day, also see here).

So across the industry, those that might have been branded sysadmins first rebranded themselves as Ops, then as DevOps, and finally SREs. Meanwhile they’re mostly the same people doing similar work.

Why the History Lesson?

What’s the point of this long historical digression?

Well, it’s to explain that, with a few exceptions, the division between Dev and Ops, and between centralisation and distribution of responsibility has never been resolved. And the reasons why the industry seems to see-saw are the same reasons why the answer to the original question is never simple.

Right now, thanks to the SRE movement (and Kubernetes, which is a trojan horse leading you away from cloud lock-in), there is a fashion-swing back to centralisation. But that might change again in a few years.

And it’s in this historical milieu that I get asked questions about who should be responsible for what with respect to work that could be centralised.

The Factors

Here are the factors that play into the advice I might give in response to these questions, in rough order of importance.

Factor One: Non-Negotiable Standards

If you have standards or practices that must be enforced on teams for legal, regulatory, or business reasons, then at least some work needs to be done at the centre.

Examples of this include:

  • Demonstrable separation of duties between Dev and Ops
  • User management and role-based access controls

Performing an audit on one team is obviously significantly cheaper than auditing a hundred teams. Further, with an audit, the majority of expense is not in the audit but the follow-on rework. The cost of that can be reduced significantly if a team is experienced at knowing from the start what’s required to get through an audit. For these reasons, the cost of an audit across your 100 dev teams can be more than 100x the cost of a single audit at the centre.

Factor Two: Engineer Capability

Development teams vary significantly in their willingness to take on work and responsibilities outside their existing domain of expertise. This can have a significant effect on who does what.

Anecdote: I once worked for a business that had a centralised DBA team, who managed databases for thousands of teams. There were endless complaints about the time taken to get ‘central IT’ to do their bidding, and frequent demands for more autonomy and freedom.

A cloud project was initiated by the centralised DBA team to enable that autonomy. It was explained that since the teams could now provision their own database instances in response to their demands, they would no longer have a central DBA team to call on.

Cue howls of despair from the development teams that they need a centralised DBA service, as they didn’t want to take this responsibility on, as they don’t have the skills.

Another example is embedded in the title question about Terraform. Development teams often don’t want to learn the skills needed for a change of delivery approach. They just want to carry on writing in whatever language they were hired to write in.

This is where organisational structures like ‘cloud native centres of excellence’ (who just ‘advise’ on how to use new technologies), or ‘federated devops teams’ (where engineers are seconded to teams to spread knowledge and experience) come from. The idea with these ‘enabling teams’ is that once their job is done they are disbanded. Anyone who knows anything about political or organisational history knows that these plans to self-destruct often don’t pan out that way, and you’re either stuck with them forever, or some put-upon central team gets given responsibility for the code in perpetuity.

Factor Three: Management Capability

While the economic benefits of having a centralised team doing shared work may seem intuitively obvious, senior management in various businesses are often not able to understand its value, and manage it as a pure cost centre.

This is arguably a consequence of internal accounting assumptions. Put simply, the value gained from centralised work is not traced back to profit calculations, so it is seen as pure cost. (I wrote a little about non-obvious business value here.)

In companies with competent technical management, the value gained from centralised work is (implicitly, due to an understanding of the actual work involved) seen as valuable. This is why tech firms such as Google can successfully manage a large-scale platform, and why it gave birth to SRE and Kubernetes, two icons of tech org centralisation. It’s interesting that Amazon – with its roots in retail, distribution, and logistics – takes a radically different distributed approach.

If your organisation is not capable of managing centralised platform work, then it may well be more effective to distribute the work across the feature teams, so that cost and value can be more easily measured and compared.

Factor Four: Platform Team Capability

Here we are back to the old fashioned silo problem. One of the most common complaints about centralised teams is that they fail to deliver what teams actually need, or do so in a way that they can’t easily consume.

Often this is because of the ‘non-negotiable standards’ factor above resulting in security controls that stifle innovation. But it can also be because the platform team is not interested, incentivised, or capable enough to deliver what the teams need. In these latter cases, it can be very inefficient or even harmful to get them to do some of the platform work.

This factor can be mitigated with good management. I’ve seen great benefits from moving people around the business so they can see the constraints other people work under (a common principle in the DevOps movement) rather than just complain about their work. However, as we’ve seen, poor management is often already a problem, so this can be a non-starter.

Factor Five: Time to Market

Another significant factor is whether it’s important to keep the time to delivery low. Retail banks don’t care about time to delivery. They may say they do, but the reality is that they care far more about not risking their banking licence and not causing outages that attract the interest of regulators. In the financial sector, hedge funds, by contrast, might care very much about time to market, as they are far less regulated and wish to take advantage of any edge they might have as quickly as possible. Retail banks therefore tend towards centralised organisational architectures, while hedge funds devolve responsibility as close to the feature teams as possible.

So, Who Should Write the Terraform?

Returning to the original question, the question of ‘who should write the Terraform?’ can now be more easily answered, or at least approached. Depending on the factors discussed above, it might make sense for them to be either centralised or distributed.

More importantly, by not simply assuming that there is a ‘right’ answer, you can make decisions about where the work goes with your eyes open about what the risks, trade-offs, and systemic preferences of your business are.

Whichever way you go, make sure that you establish which entity will be responsible for maintaining the resulting code as well as producing it. Code, it is important to remember, is an asset that needs maintenance to remain useful; ignore this, and there will be great confusion in the future.


If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Buy in a bundle here

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Business Value, Soccer Canteens, Engineer Retention, and the Bricklayer Fallacy

Having the privilege of working in software in the 2020s, I hear variations on the following ideas expressed frequently:

  • ‘There must be some direct relationship between your work and customer value!’
  • ‘The results of your actions must be measurable!’

These ideas manifest in statements like this, which sound very sensible and plausible:

  • ‘This does not benefit the customer. This is not a feature to the customer. So we should not do it.’
  • ‘We are not in the business of doing X, so should not focus on it. We are in the business of serving the customer’
  • ‘This does not improve any of the key metrics we identified’

I want to challenge these ideas. In fact, I want to turn them on their head:

  • Many people’s work generates value by focussing on things that appear to have no measurable or obviously justifiable customer benefit.
  • Moreover, judgements on these matters are what people are (and should be) paid to exercise.

Alex Ferguson and Canteen Design

To encapsulate these ideas I want to use an anecdote from the sporting world, that unforgiving laboratory of success and failure. In that field, the record of Alex Ferguson, manager of Manchester United (a UK football, or soccer, team) in one of their ‘golden eras’ from 1986 to 2013, is unimpeachable. During those 27 years, he took them from second-from-bottom of the English top division in 1986 to winners of the treble, including the European Cup, in 1998-1999.

Fortunately, he’s recorded his recollections and lessons in various books, and these books provide a great insight into how such a leader thinks, and what they’re paid to do.

Alex Ferguson demonstrating how elite-level sports teams can be coached to success

Now, to outsiders, the ‘business value’ he should be working towards is obvious. Some kind of variation of ‘make a profit’, or ‘win trophies’, or ‘score more goals than you concede in every match’ is the formulation most of us would come up with. Obviously, these goals break down to sub-goals like:

  • Buy players cheaply and extract more value from them than you paid for
  • Optimise your tactics for your opponents
  • Make players work hard to maintain fitness and skills

Again, we mortals could guess these. What’s really fascinating about Ferguson’s memoirs is the other things he focusses on, which are less obvious to those of us that are not experts in elite-level soccer.

Sometimes if I saw a young player, a lad in the academy, eating by himself, I would go and sit beside him. You have to make everyone feel at home. That doesn’t mean you’re going to be soft on them–but you want them to feel that they belong. I’d been influenced by what I had learned from Marks & Spencer, which, decades ago in harder times, had given their staff free lunches because so many of them were skipping lunch so they could save every penny to help their families. It probably seems a strange thing for a manager to be getting involved in–the layout of a canteen at a new training ground–but when I think about the tone it set within the club and the way it encouraged the staff and players to interact, I can’t overstate the importance of this tiny change.

Alex Ferguson, Leading

Now, I invite you to imagine a product owner, or scrum master for Manchester United going over this ‘update’ with him:

  • How does spending your time with junior players help us score more goals on Saturday?
  • Are we in the business of canteen architecture or soccer matches?
  • How do you measure the benefit of these peripheral activities?
  • Why are you micromanaging building design when we have paid professionals hired in to do that?
  • How many story points per sprint should we allocate to your junior 1-1s and architectural oversight?

It is easy to imagine how that conversation would have gone, especially given Ferguson’s reputation for robust plain speaking. (The linked article is also a mini-goldmine of elite talent management subtleties hiding behind a seemingly brutish exterior.)

Software and Decision Horizons

It might seem like managing a soccer team and working in software engineering are worlds apart, but there’s significant commonality.

Firstly, let’s look at the difference of horizon between our imagined sporting scrum master and Alex Ferguson.

The scrum master is thinking in:

  • Very short time periods (weeks or months)
  • Specific and measurable goals (score more goals!)

Alex Ferguson, by contrast, is thinking in decades-long horizons, and (practically) unmeasurable goals:

  • If I talk to this player briefly now, they may be motivated to work for us for the rest of their career
  • I may encourage others to help their peers by being seen to inculcate a culture of mutual support

I can think of a specific example of such a clash of horizons that resulted in a questionable decision in a software business.

Twenty years ago I worked for a company that had an ‘internal wiki’ – a new thing then. Many readers of this piece will know of the phenomenon of ‘wiki-entropy’ (I just made that word up, but I’m going to use it all the time now) whereby an internal documentation system gradually degrades into uselessness, whatever the value of some of its content, as it becomes overwhelmed by unmaintained information.

Well, twenty years ago we didn’t have that problem. We decided to hire a young graduate with academic tendencies to maintain our wiki. He assiduously ranged across the company, asking owners of pages whether the contents were still up to date, whether information was duplicated, complete, no longer needed, and so on.

The result of this was a wiki that was extremely useful, up to date, where content was easily found and minimal time was wasted getting information. The engineers loved it, and went out of their way to praise his efforts to save them from their own bad habits.

Of course, the wiki curator was the first to be let go when the next opportunity arose. While everyone on the ground knew the high value of his work in saving hundreds of engineers the time and energy of chasing around bad information, the impact was difficult to measure and never was measured, and in any case, shouldn’t the engineers be doing that themselves?

For years afterwards, whenever we engineers were frustrated with the wiki, we always cursed whoever it was that made the short-sighted decision to let his position go.

So-called ‘business people’, such as shareholders, executives, project managers, and product owners, are strongly incentivised to deliver in the short term, which most often means prioritising short-term goals (‘mission accomplished’) over longer-term value. Those that don’t think short term often have a strong background in engineering and have succeeded in retaining their position despite this handicap.

What To Do? Plan A – The Scrum Courtroom

So your superiors don’t often think long term about the work you are assigned, but you take pride in what you do, and want the value of your work to be felt over a longer time than just a sprint or a project increment. And you don’t want people cursing your name as they suffer from your short-term self-serving engineering choices.

Fortunately, a solution has arisen that should handle this difference of horizon: scrum. This methodology (in theory, but that’s a whole other story) strictly defines project work to be done within a regular cadence (eg two weeks). At the start of this cadence (the sprint), the team decides together what items should go in it.

At the beginning of each cadence, therefore, you get a chance to argue the case for the improvement, or investment you want to make in the system you are working on being included in the work cadence.

The problem with this is that these arguments mostly fail because the cards are still stacked against you, in the following ways:

  • The cadence limit
  • Uncertainty of benefit
  • Uncertainty of completion
  • Uncertainty of value

Plan A Mitigators – The Cadence Limit

First, the short-term nature of the scrum cadence has an in-built prejudice against larger-scale and more speculative/innovative ideas. If you can’t get your work done within the cadence, then it’s more easily seen as impractical or of little value.

The usual counter to this is that the work should be ‘broken down’ in advance into smaller chunks that can be completed within the sprint. This often has the effect of making the work seem profoundly insignificant (‘talk to a young player in the canteen’), and of losing sight of the overall picture of the work being proposed (‘change/maintain the culture of the organisation’).

Plan A Mitigators – Uncertainty of Benefit

The scrum approach tries to increment ‘business value’ in each sprint. Since larger-scale and speculative/innovative work is generally riskier, it’s much harder to ‘prove’ the benefit for the work you do in advance, especially within the sprint cadence.

The result is that such riskier work is less likely to be sanctioned by the scrum ‘court’.

Plan A Mitigators – Uncertainty of Completion

Similarly, there is an uncertainty as to whether the work will get completed within the sprint cadence. Again, this makes the chances of success arguing your case less likely.

Plan A Mitigators – Uncertainty of Value

‘Business Value’ is a very slippery concept the closer you look at it. Mark Schwartz wrote a book I tell everyone to read deconstructing the very term, and showing how no-one really knows what it means. Or, at the very least, it means very different things to different people.

The fact is that almost anything can be justified in terms of business value:

  • Spending a week on an AWS course
    • As an architect, I need to ensure I don’t make bad decisions that will reduce the flow of features for the product
  • Spending a week optimising my dotfiles
    • As a developer, I need to ensure I spend as much time coding efficiently as possible so I can produce more features for the product
  • Tidying up the office
    • As a developer, I want the office to be tidier so I can focus more effectively on writing features for the product
  • Hiring a Michelin starred chef to make lunch
    • As a developer, I need my attention and nutrition to be optimised so I can write more features for the product without being distracted by having to get lunch

The problem with all these things is that they are effectively impossible to measure.

There’s generally no objective way to prove customer value (even if we can be sure what it is). Some arguments just sound rhetorically better to some ears than others.

If you try to justify them within some kind of business framework (such as improving a defined and pre-approved metric), you get bogged down in discussions that you effectively can’t win.

  • How long will this take you?
    • “I don’t know, I’ve never done this before”
  • What is the metric?
    • “Um, culture points? Can we measure how long we spend scouring the wiki and then chasing up information gleaned from it? [No, it’s too expensive to do that]”

‘Plan A’ Mitigators Do Have Value

All this is not to say that these mitigators should be removed, or have no purpose. Engineers, as any engineer knows, can have a tendency to underestimate how hard something will be to build, to overestimate how much value it will bring, and even to do ‘CV-driven development’ rather than serve the needs of the business.

The same could be said of soccer managers. But we still let soccer managers decide how to spend their time, and more so the more experienced they are, and the more success they have demonstrated.

But…

In any case, I have been involved in discussions like this at numerous organisations that end up taking longer than actually doing the work, or at least doing the work necessary to prove the value in proof of concept form.

So I mostly move to Plan B…

What To Do? Plan B – Skunkworks It

Plan B is to skip court and just do the work necessary to be able to convince others that yours is the way to go without telling anyone else. This is broadly known as ‘skunkworks‘.

The first obvious objection to this approach is something like this:

‘How can this be done? Surely the time taken for work in your sprint has been tightly defined and estimated, and you therefore have no spare time?’

Fortunately this is easily solved. The thing about leaders who don’t have strong domain knowledge is that their ignorance is easily manipulated by those they lead. So the workers simply bloat their estimates, telling them that the official tasks will take longer than they actually will, leaving time left over to work on the things they think are really important or valuable to the business.

Yes, that’s right: engineers actually spend time they could be spending doing nothing trying to improve things for their business in secret, simply because they want to do the right thing in the best way. Just like Alex Ferguson spent time chatting to juniors, and micromanaging the design of a canteen when he could have enjoyed a longer lunch alone, or with a friend.

It’s Not A Secret

Good leaders know this happens, even encourage it explicitly. A C-level leader (himself a former engineer) once said to me “I love that you hide things from me. I’m not forced to justify to my peers why you’re spending time on improvements if I don’t know about them and just get presented with a solution for free.”

The Argument

When you get paid to make decisions, you are being paid to exercise your judgement exactly in the ways that can’t be justified within easily measurable and well-defined metrics of value.

If your judgement could be quantified and systematised, then there would be no need for you to be there to make those judgements. You’d automate it.

This is true whether you are managing a soccer team, or doing software engineering.

Making software is all about making classes of decision that are the same in shape as Alex Ferguson’s. Should I:

  • Fix or delete the test?
  • Restructure the pipeline because its foundations are wobbly, or just patch it for now?
  • Mentor a junior to complete this work over a few days, or finish the job myself in a couple of hours?
  • Rewrite this bash script in Python, or just add more to it?
  • Spend the time to containerise the application and persuade everyone else to start using Docker, or just muddle along with hand-curated environments as we’ve always done?
  • Spend time getting to know the new joiner on the team, or focus on getting more tickets in the sprint done?

Each of these decisions has many consequences which are unclear and unpredictable. In the end, someone needs to make a decision where to spend the time based on experience as the standard metrics can’t tell you whether they’re a good idea.

Conclusion

At the heart of this problem in software is what I call the ‘bricklayer fallacy’. Many view software engineering as a series of tasks analogous to laying bricks: once you are set up, you can say roughly how long it will take to do something, because laying one brick takes a predictable amount of time.

This fallacy results in the treatment of software engineering as readily convertible to what business leaders want: a predictable graph of delivery over time. Attempts to maintain this illusion for business leaders result in the fairy stories of story points, velocity, and burn-down charts. All of these can drive the work of real value underground.

If you want evidence of this not working, look here. Scrum is conspicuously absent as a software methodology at the biggest tech companies. They don’t think of their engineers as bricklayers.

Soccer managers don’t suffer as much from this fallacy because we intuitively understand that building a great soccer team is not like building a brick wall.

But software engineering is also a mysterious and varied art. It’s so full of craft and subtle choices that the satisfaction of doing the job well exceeds the raw compensation for attendance and following the rules. Frequently, I’ve observed that ‘working to rule’ gets the same pay and rewards as ‘pushing to do the right thing for the long term’, but results in real human misery. At a base level, your efforts and their consequences are often not even noticed by anyone.

If you remove this judgement from people, you remove their agency.

This is a strange novelty of knowledge work that didn’t exist in the ‘bricklayer’ age of piece-work and Taylorism. In the knowledge-work era, the engineers who like to actually deliver work of true long-term value get dissatisfied and quit. And paying them more to put up with it doesn’t necessarily help, as the ones that stay are the ones that have learned to optimise for getting more money rather than doing better work. These are exactly the people you don’t want doing the work.

If you want to keep the best and most innovative staff – the ones that will come up with 10x improvements to your workflows that result in significant efficiencies, improvements, and savings – you need to figure out who the Alex Fergusons are, and give them the right level of autonomy to deliver for you. That’s your management challenge.


If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Five Reasons To Master Git On The Command Line

If you spend most of your days in a browser watching pipelines and managing pull requests, you may wonder why anyone would prefer the command line to manage their Git workflow.

I’m here to persuade you that using the command line is better for you. You will find it’s not easier in every case, and it can be harder to learn. But investing the time in building those muscles will reap serious dividends for as long as you use Git.

Here are five (far from exhaustive) reasons why you should become one with Git on the command line.

1. git log is awesome

There are so many ways that git log is awesome I can’t list them all here. I’ve also written about it before.

If you’ve only looked at Git histories through GitHub or BitBucket then it’s unlikely you’ve seen the powerful views of what’s going on with your Git repository.

This is the capsule command that covers most of the flags I use on the regular:

git log --oneline --all --decorate --graph

--oneline – shows a summary per commit in one line, which is essential for seeing what’s going on

--graph – arranges the output into a graph, showing branches and merges. The format can take some getting used to, especially for complex repositories, but it doesn’t take long to become familiar with it

--all – shows all the available branches stored locally

--decorate – shows any reference names

This is what kubernetes looks like when I check it out and run that command:

Most versions of git these days implicitly use --decorate so you won’t necessarily need to remember that flag.
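
If you find yourself typing that capsule command a lot, one option is to save it as an alias in your ~/.gitconfig. A minimal sketch (the alias name lg is just my choice, not a Git default):

[alias]
        lg = log --oneline --all --decorate --graph

After that, git lg gives you the same view with a lot less typing.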

Other arguments that I regularly use with git log include:

--patch – show the changes associated with a commit

--stat – summarise the changes made by file

--simplify-by-decoration – only shows changes that have a reference associated with them. This is particularly useful if you don’t want to see all commits, just ‘significant’ ones associated with branches or tags.
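
To give a rough idea of how these combine (the exact output obviously depends on your repository), the first command below shows the full diff for just the most recent commit, and the second gives a one-line-plus-diffstat summary of only the ‘significant’ commits:

$ git log --patch -1
$ git log --oneline --stat --simplify-by-decoration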

In addition, you have a level of control that the GitBucketLab tools lack when viewing histories. By setting the pager in your ~/.gitconfig file, you can control how the --patch output looks. I like the diff-so-fancy tool. Here’s my config:

[core]
        pager = diff-so-fancy | less -RF

The -R argument to less above passes raw colour control characters through (so the colourised diff output displays properly), and -F quits if the output fits in one screen.


If you like this post, you may like my book Learn Git the Hard Way


2. The git add flags

If you’re like me you may have spent years treating additions as a monolith, running git commit -am 'message' to add and commit changes to files already tracked by Git. Sometimes this results in commits that prompt you to write a message that amounts to ‘here is a bunch of stuff I did’.

If so, you may be missing out on the power that git add can give you over what you commit.

Running:

git add -i

(or --interactive) gives you an interactive menu that allows you to choose what is to be added to Git’s staging area ready to commit.

Again, this menu takes some getting used to. If you choose a command but don’t want to do anything about it, you can hit return with no data to go back. But sometimes hitting return with no input means you choose the currently selected item (indicated with a ‘*‘). It’s not very intuitive.

Most of the time you will be adding patches. To go direct to that, you can run:

git add -p

Which takes you directly to the patches.

But the real killer command I use regularly is:

git add --edit

which allows you to use your configured editor to decide which changes get added. This is a lot easier than using the interactive menu’s ‘splitting’ and ‘staging hunks’ method.
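
As a minimal sketch of how this fits into a workflow (the filename and messages here are invented for illustration):

$ git add -p app.py                     # pick only the hunks that relate to the fix
$ git commit -m 'Fix off-by-one error'  # commit just those staged hunks
$ git diff                              # the unrelated changes remain unstaged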


3. git difftool is handy

If you go full command line, you will be looking at plenty of diffs. Your diff workflow will become a strong subset of your git workflow.

You can use git difftool to control how you see diffs, eg:

git difftool --tool=vimdiff

To get a list of all the available tools, run:

git difftool --tool-help
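
If you settle on a tool you like, you can set it as the default in your ~/.gitconfig rather than passing the flag every time. A minimal example, assuming you’ve chosen vimdiff:

[diff]
        tool = vimdiff

After which a plain git difftool will use it.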

4. You can use it anywhere

If you rely on a particular GUI, then there is always the danger that that GUI will be unavailable to you at a later point. You might be working on a very locked-down server, or be forced to change OS as part of your job. Or it may fall out of fashion and you want to try a new one.

Before I saw the light and relied on the command line, I went through many different GUIs for development, including phases of Kate, IntelliJ, Eclipse, even a brief flirtation with Visual Studio. These all have gone in and out of fashion. Git on the command line will be there for as long as Git is used. (So will vi, so will shells, and so will make, by the way).

Similarly, you might get used to a source code site that allows you to rebase with a click. But how do you know what’s really going on? Which brings me to…

It’s closer to the truth

All this leads us to the realisation that the Git command is closer to the truth than a GUI (*), and gives you more flexibility and control.

* The ‘truth’ is obviously in the source code https://github.com/git/git, but there’s also the plumbing/porcelain distinction between Git’s ‘internal’ commands and its ‘user-friendly’ commands. But let’s not get into that here: its standard interface can be considered the ‘truth’ for most purposes.

When you’re using git on the command line, you can quickly find out what’s going on now, what happened in the past, the difference between the remote and the local, and you won’t be gnashing your teeth in frustration because the GUI doesn’t give you exactly the information you need, or gives you a limited and opinionated view of the world.

5. You can use ‘git-extras’?

Finally, using Git on the command line means you can make use of git-extras. This commonly-available package contains a whole bunch of useful shortcuts that may help your git workflow. There are too many to list, so I’ve just chosen the ones I use most commonly.

When using many of these, it’s important to understand how they interact with the remote repositories (if any), whether you need to configure anything to make them work, and whether they affect the history of your repository, making pushing or pulling potentially problematic. If you want to get a good practical understanding of these things, check out my Learn Git The Hard Way book.

git fork

Allows you to fork a repository on GitHub. Note that for this to work you’ll need to add a personal access token to your git config under git-extras.github-personal-access-token.

git rename-tag

Renames a tag, locally and remotely. Much easier than doing all this.

git cp

Copies (rather than git mv, which renames) the file, keeping the original’s history.

git undo

Removes the latest commit. You can optionally give a number, which undoes that number of commits.
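
For example (assuming git-extras is installed, and bearing in mind this rewrites local history, so avoid it on commits you’ve already pushed):

$ git undo          # remove the most recent commit
$ git undo 2        # remove the last two commits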

git obliterate

Uses git filter-branch to destroy all evidence of a file from your Git repo’s history.

git pr / git mr

These allow you to manage pull requests locally. git pr is for GitHub, while git mr is for GitLab.

git sync

Synchronises the history of your local branch with the remote’s version of that branch.


If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Buy in a bundle here

How 3D Printing Kindled A Love For Baroque Sculpture

I originally got a 3D printer in order to indulge a love of architecture. A year ago I wrote about my baby steps with my printer, and after that got busy printing off whatever buildings I could find.

After a few underwhelming prints, I stumbled on a site called ‘Scan the World’, which contains scans of all sorts of famous artefacts.

Pretty soon I was hooked.

Laocoön and His Sons

The first one I printed was Laocoön and His Sons, and I was even more blown away than I was when I saw it in the Vatican nearly 30 years ago.

While it’s not perfect (my print washing game isn’t quite there yet), what was striking to me was the level of detail the print managed to capture. The curls, the ripples of flesh and muscle, the fingers, they’re all there.

More than this, nothing quite beats being able to look at it close up in your own home. Previously I’d read art history books and not been terribly impressed by or interested in the sculptures I’d looked at pictures of and read about.

This encouraged me to look into the history of this piece, and it turned out to be far more interesting than I’d ever expected.

The sculptor is unknown. It was unearthed in a vineyard in 1506 in several pieces, and the Pope got wind of it, getting an architect and his mate to give it a once-over. His mate was called Michelangelo, and it soon ended up in the Vatican.

It was later nicked by the French in 1798, and returned to the Vatican 19 years later after Napoleon’s Waterloo.

It blows my mind that this sculpture was talked about by Pliny the Elder, carved by Greeks (probably), then left to the worms for a millennium, then dug up and reconstructed. I’m not even sure that historians are sure that the one in the Vatican now was the ‘original’, or whether it was itself a copy of another statue, maybe even in bronze.

Farnese Bull

The next one I printed has a similar history to Laocoön and His Sons. It was dug up during excavations of a Roman bath. Henry Peacham in ‘The Complete Gentleman’ (1634) said that it “outstrippeth all other statues in the world for greatness and workmanship”. Hard to argue with that.

La Pieta

Before Michelangelo had a look at the Laocoön, he carved this statue of the dead Jesus with his mother from a single block of marble. It’s not my favourite, but I couldn’t not print it. It’s worth it for Mary’s expression alone.

Aside from being one of the most famous sculptures ever carved, this sculpture has the distinction of being one of the few works Michelangelo ever signed, apparently because he was pissed off that people thought a rival had done it.

Apollo and Daphne

Then I moved onto the sculptor I have grown to love the most: Bernini. The master.

When I printed this one, I couldn’t stop looking at it for days.

The story behind it makes sense of the foliage: consumed by love (or lust) Apollo is chasing after Daphne, who, not wanting to be pursued because of some magical Greek-mythical reason, prays to be made ugly or for her body to change. So she becomes a tree while she runs. Apollo continues to love the tree.

Amazed by that one, I did another Bernini.

Bernini’s David

I have to call this one “Bernini’s David” to distinguish it from Michelangelo’s, which seems to have a monopoly on the name in sculpture terms.

I don’t know why, though: Bernini’s David is almost as breathtaking as Apollo and Daphne. Although – like Michelangelo’s David – the figure is somewhat idealised, this David feels different: a living, breathing fighter about to unleash his weapon rather than a beautiful boy in repose. Look at how his body is twisted, and his feet are even partially off the base.

My Own Gallery

What excites me about this 3d printing lark is the democratisation of art collection. Twenty years ago the closest I could get to these works was either by going on holiday, traipsing to central London to see copies, or looking at them in (not cheap) books at home.

Now, I can have my own art collection wherever I want, and if they get damaged, I can just print some more off.


If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

grep Flags – The Good Stuff

While writing a post on practical shell patterns I had a couple of patterns that used grep commands.

I had to drop those patterns from the post, though, because as soon as I thought about them I got lost in all the grep flags I wanted to talk about. I realised grep deserved its own post.

grep is one of the most universal and commonly-used commands on the command line. I count about 50 flags you can use on it in my man page. So which ones are the ones you should know about for everyday use?

I tried to find out.

I started by asking on Twitter whether the five flags I have ‘under my fingers’ and use 99% of the time are the ones others also use.

It turns out experience varies widely on what the 5 most-used are.

Here are the results, in no particular order, of my researches.

I’ve tried to categorize them to make it easier to digest. The categories are:

  • ABC – The Context Ones
  • What To Match?
  • What To Report?
  • What To Read?
  • Honourable Mention

If you think any are missing, let me know in the comments below.

ABC – The Context Ones

These arguments give you more context around your match.

grep -A
grep -B
grep -C

I hadn’t included these in my top five, but as soon as I was reminded of them, -C got right back under my fingertips.

I call them the ‘ABC flags’ to help me remember them.

Each of these gives a specified context around your grep’d line. -A gives you lines after the match, -B gives you lines before the match, and -C (for ‘context’) gives you both the before and after lines.

$ mkdir grepflags && cd grepflags
$ cat > afile <<EOF                                                                   
a
b
c
d
e
EOF
$ grep -A2 c afile
c
d
e
$ grep -B2 c afile
a
b
c
$ grep -C1 c afile
b
c
d

This is especially handy for going through configuration files, where the ‘before’ context can give you useful information about where the thing you’re matching sits within a wider context.

What To Match?

These flags relate to altering what you match and don’t match with your grep.

grep -i

This flag ignores the case of the match. It’s very handy and routine for me to use, to avoid missing matches I might want to see (I often grep through large amounts of plain-text prose).

$ cat > afile <<EOF
SHOUT
shout
let it all out
EOF
$ grep shout afile
shout
$ grep -i shout afile
SHOUT
shout

grep -v

This matches any lines that don’t match the regular expression (inverts), for example:

$ touch README.md
$ IFS=$'\n'   # avoid problems with filenames with spaces in them
$ for f in $(ls | grep -v README)
> do echo "top of: $f"
> head $f
> done
top of: afile
SHOUT
shout
let it all out

which outputs the heads of all files in the local folder except any files with README in their names.

grep -w

The -w flag only matches ‘whole-word’ matches, ignoring cases where the submitted word is part of a longer word.

This is a useful flag to narrow down your matches, and also especially useful when searching through prose:

$ cat > afile <<EOF
na
NaNa
na-na-na
na_na_na
hey jude
EOF
$ grep na afile
na
na-na-na
na_na_na
$ grep -w na afile
na
na-na-na
$ grep -vwi na afile
NaNa
na_na_na
hey jude

You might be wondering what characters are considered part of a word. The manual tells us that ‘Word-constituent characters are letters, digits, and the underscore.’ This is useful to know if you’re searching code where word separators in identifiers might switch between dashes and underscores. You can see this above with the na-na-na vs na_na_na differences.

What To Report?

These grep flags offer choices about how the output you see is rendered.

grep -h

grep -h suppresses the prefixing of filenames on output. An example is demonstrated below:

$ rm -f afile
$ cat > afile1 << EOF
a
EOF
$ cp afile1 afile2 
$ grep a *
afile1:a
afile2:a
$ grep -h a *
a
a

This is particularly useful if you want to process the matching lines without the filename spoiling the input. Compare these to the output without the -h.

$ grep -h a * | uniq
a
$ grep -h a * | uniq -c
   2 a

grep -o

This outputs only the text specified by your regular expression. One match is output per line of output, but multiple matches may be found per line of input.

This can result in more matches than lines, as in the example below, where you look first for the words at the end of lines ending in ‘ay’, and then for any words with the letter ‘e’ in them.

$ rm -f afile1 afile2
$ cat > afile << EOF
Yesterday
All my troubles seemed so far away
Now it looks as though they're here to stay
Oh I believe
In yesterday
EOF
$ grep -o ' [^ ]*ay$' afile
 away
 stay
 yesterday
$ grep -o ' [^ ]*e[^ ]*' afile
 troubles
 seemed
 they're
 here
 believe
 yesterday

grep -l

If you’re fighting through a blizzard of output and want to focus only on which files your matches are in rather than the matches themselves, then using this flag will show you where you might want to look:

$ cat > afile << EOF
a
a
EOF
$ cp afile afile2
$ cp afile afile3
$ grep a *
afile:a
afile:a
afile2:a
afile2:a
afile3:a
afile3:a
$ grep -l a *
afile
afile2
afile3

What To Read?

These flags change which files grep will look at.

grep -r

A very popular flag, this flag recurses through the filesystem looking for matches.

$ grep -r securityagent /etc

grep -I

This one is my favourite, as it’s incredibly useful, and not so well known, despite being widely applicable.

If you’ve ever been desperate to find where a string is referenced in your filesystem (usually as root) and run something like this:

$ grep -rnwi specialconfig /

then you won’t have failed to notice that it can take a good while. This is partly because it’s looking at every file from the root, whether it’s a binary or not.

The -I flag only considers text files, skipping binaries. This radically speeds up recursive greps. Here we run the same command twice (to ensure it’s not only slow the first time due to OS file caching), then run the command with the extra flag, and see a speedup of over 40%.

$ time sudo grep -rnwi specialconfig / 2>/dev/null
sudo grep -rnwi specialconfig /  418.01s user 382.19s system 70% cpu 19:03.09 total
$ time sudo grep -rnwi specialconfig /
sudo grep -rnwi specialconfig /  434.19s user 411.62s system 70% cpu 19:56.25 total
$ time sudo grep -rnwiI specialconfig /
sudo grep -rnwiI specialconfig /  33.54s user 322.64s system 52% cpu 11:19.03 total

Honourable mention

There are many other grep flags, but I’ll just add one honourable mention at the end here.

grep -E

I spent an embarrassingly long time trying to get regular expressions with + signs in them to work in grep before I realised that default grep didn’t support so-called ‘extended’ regular expressions.

By using -E you can use those regular expressions just as the regexp gods intended.

$ cat > a <<< aaaaaaaaaaba+bc
$ grep -o 'a+b' a            # + is treated literally
a+b
$ grep -o 'aa*b' a           # a workaround
aaaaaaaaaab
$ grep -oE 'a+b' a           # extended regexp
aaaaaaaaaab

If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Buy in a bundle here

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.

Why It’s Great To Be A Consultant

I spent 20 years slaving away at companies doing development, maintenance, troubleshooting, architecture, management, and whatever else needed doing. For all those years I was a permanent hire, working on technology from within.

I still work at companies doing similar things, but now I’m not a permie. I’m a consultant working for a ‘Cloud Native’ consultancy that focusses on making meaningful changes across organisations rather than just focussing on tech.

I wish I’d made this move years ago. So here are the reasons why it’s great to be a consultant.

1) You are outside the company hierarchy

Because you come in as an outsider, and are not staying forever, all sorts of baggage that exists for permanent employees does not exist for you.

When you’re an internal employee, you’re often encouraged to ‘stay in your box’ and not disturb the existing hierarchy. By contrast, as a consultant you don’t need to worry as much about the internal politics or history when suggesting changes. Indeed, you’re often encouraged to step outside your boundaries and make suggestions.

This relative freedom can create tensions with permanent employees, as you often have more asking power than the teams you consult for. Hearing feedback of “I have been saying what you’re saying for years, and no-one has listened to me” is not uncommon. It’s not fair but it’s a fact: you both get to tell the truth, and get listened to more when you’re a consultant.

2) You have to keep learning

If you like to learn, consulting is for you. You are continually thrown into new environments, forced to consider new angles on industry verticals, organisational structures, technologies, and questions that you may not previously have come across, thought about, or answered. You have to learn fast, and you do.

Frequently, when you are brought in to help a business, you meet the people in the business that are the most knowledgeable and forward-thinking. Meeting with these talented people can be challenging and intimidating, as they may have unrealistically high expectations of your ‘expertise’, and an unrealistically low opinion of their own capabilities.

It’s not infrequent to wonder why you’ve been brought in when the people you’re meeting already seem to have the capability to do what you are there to help with. When this happens, you usually find, as you go deeper into the business, that they need you to help spread the word from outside the hierarchy (see 1, above).

These top permie performers can be used to being ‘top dog’ in their domain, and unwilling to cede ground to people who they’ve not been in the trenches with. You may hear phrases like ‘consulting bullshit’ if you are lucky enough for them to be honest and open with you. However, this group will be your most energetic advocates once you turn them around.

If you can get over the (very human) urge to compete and ‘prove your expertise’, and focus on just helping the client, you can both learn and achieve a lot.

3) You get more exposure

Working within one organisation for a long time can narrow your perspective in various dimensions, such as technology, organisational structure, culture.

To take one vivid example, we recently worked with a ‘household name’ business that brought us in because they felt that their teams were not consistent enough and that they needed some centralisation to make their work more consistent. After many hours of interviews we determined that they were ideally organised to deliver their product in a microservices paradigm. We ended up asking them how they did it!

I was surprised, as I’d never seen a company with this kind of history move from a ‘traditional’ IT org structure to a microservices one, and was skeptical it could be done. This kind of ‘experience-broadening’ work allows you to develop a deeper perspective on any work you end up doing, what’s possible, and how it happens.

And it’s not only at the organisational level that you get to broaden your experience. You get to interact with, and even work for multiple execution teams at different levels of different businesses. You might work with the dreaded ‘devops team’, the more fashionable ‘platform team’, traditional ‘IT teams’, ‘SRE teams’, traditional ‘dev teams’, even ‘exec teams’.

All these experiences give you more confidence and authority when debating decisions or directions: red flags become redder. You’ve got real-world examples to draw on, each of which is worth a thousand theoretical and abstract powerpoint decks, especially when dealing with the middle rank of any organisation you need to get onside. Though you still have to write some of those too (and sometimes code as well).

4) You meet lots of people

For me, this is the big one.

I’ve probably meaningfully worked with more people in the last two years than the prior ten. Not only are the sheer numbers I’ve worked with greater, they are from more diverse backgrounds, jobs, teams, and specialties.

I’ve got to know talented people working in lots of places and benefitted from their perspectives. And I hope they’ve benefitted from mine too.

Remember, there is no wealth but life.


Join Us

If you want to experience some of the above, get in touch:
Twitter: @ianmiell
email: ian.miell \at\ gmail.com

If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way

Buy in a bundle here

If you enjoyed this, then please consider buying me a coffee to encourage me to do more.