I’ve always been frustrated that people often talk about culture without giving actionable or realistic advice, and I was previously prompted by a tweet to write about what I did when put in charge of a broken team.
Then the other week I met a change management type at a dinner who’d previously worked in manufacturing, and I asked him to recommend me some books. One of them was Turn the Ship Around and it was exactly the book I wanted to read.
The book tells the story of David Marquet, newly-elevated commander of the worst-performing nuclear submarine in the US Navy. It was considered a basket case, and he was given it at the last moment, meaning his previous year’s meticulous preparation (for another ship) was for nought. He was under-prepared, and the odds were against him.
Within a year he’d turned it round to be the best-performing, with the staff going on to bigger and better things, and the ship sustaining its newly-acquired status.
Just the abstract blew my mind – it’s hard enough to turn around a group of IT types whose worst failure might be to lose some data. Never mind an actual nuclear submarine, where as commander, you are personally responsible for anything that goes wrong.
I was greatly intrigued as to how he did it, and the book did not disappoint.
What Marquet Did
By his own account, what Marquet did was improvise. He faced several constraints on delivering any improvement:
The poor state of crew morale on arrival
His relative lack of knowledge about the ship itself
The lack of time available to show an improvement
Given these, he had little option but either to fail to make a significant improvement and ‘get by’ with the traditional management techniques, or to do something drastic.
As he explains, his drastic course of action was to overthrow the principle of commander-control that the US Navy had assumed to be the best form of management for generations. The US Navy’s traditional approach had been to give the commander absolute authority and responsibility on a ship. This resulted in what Marquet calls a ‘leader-follower’ mentality, which in many ways is a great way to run things.
With good discipline (something the services excel at training for) and a highly trained leader, you can get good results, especially in a safety-critical environment. You can also get a demotivated, reactive, apathetic crew who develop a culture that focusses on ‘doing the minimum’. When the culture is broken, it’s hard to change this by simply shouting at the crew louder or doubling down on discipline.
Leader-Follower to Leader-Leader
Marquet sought to replace the leader-follower culture with a leader-leader one. Since Marquet didn’t even fully understand his own ship, he had to delegate authority and responsibility down the ship’s command structure.
This is brought home to him dramatically when he issues an order that is impossible to fulfil. He tells a navigator to move the ship at a certain speed, hears an ‘Aye, aye, sir!’, and moments later wonders why his order doesn’t seem to have been followed. It turns out the ship he is on literally cannot move at that speed!
This impresses on him that he has to do two things:
Abandon the pretence of his own omniscience
Encourage his staff to feed back information to him
In other words, he has to ‘give control’ to his crew without endangering the world in the process. The book discusses how he achieves this, and gives some retrospective structure to his actions that makes it easier to apply his experience to different environments where culture needs to be changed.
What Makes This Book So Different?
Marquet talks not only about what he did, but his concerns about his actions as he carried them out. For example, he describes how when he made chiefs responsible for signing off leave (and removed several layers of bureaucracy in the process), he worried that they would misuse their new power, or just make mistakes he himself would not make.
In fact, these fears turned out to be unfounded, and by that action, he demonstrated that he wanted to make real change to the way the ship worked. This ceding of control had far more effect, he says, than any exhortation from above to more proactivity or responsibility from his underlings. He argues that such exhortations don’t work, as people don’t take words anywhere near as seriously as actions when making change.
Anyone who’s undergone any kind of corporate transformation effort while in the rank and file will know the difference between words and actions.
Far from offering vague advice, Marquet goes to the level of supplying specific sets of questions to put to your staff in meetings, and useful advice on how to implement the policies and encourage the behaviours you need in your team.
Early on in the process he uses a CBT-style technique of ‘act as though we are proud of the ship’ to kick-start the change he wants to see. Literally anyone in a leadership role looking to improve morale can implement something like that quickly.
There’s very little sense of Marquet trying to sell a ‘perfect world’ story as he tells you what happened. In one vivid section, a normally dependable officer goes AWOL halfway through the year, and Marquet has to track him down. Marquet then takes the massive risk of letting the officer off, which further risks losing the respect of some of his subordinates, some of whom are hard-liners on discipline. None of this sounds like fun, or clear-cut.
In another section, he describes how an officer ‘just forgot’ and flicked a switch even though there was a standard ‘red tag’ on it signalling that it shouldn’t be touched. Again, rather than just punishing, he spent 8 hours discussing with his team how they could prevent a recurrence in a practical way.
After rejecting impractical solutions like ‘get sign off for every action from a superior’ their solution reduced mistakes like this massively. The solution was another implementable tactic: ‘deliberate action’. Staff were required to call out what they were about to do, then pause before they did it, allowing others to intervene, while giving them literal pause for thought to correct their own mistakes.
The book ends up with a schema that is useful, follows naturally from the story, and (mercifully) is not presented as a marketable framework:
He wants to give people control
He can’t do that because: 1) they lack competence, and 2) they don’t know the broader context
He gives control piece by piece, while working on 1) and 2) using various replicable techniques
Some of the techniques Marquet uses to achieve the competence and knowledge of context have been covered, but essentially he’s in a constant state of training everyone to be leaders rather than followers within their roles.
Some highlighted techniques:
Encourage staff to ask for permission (‘I intend to […] because’) rather than wait for orders
Don’t ‘brief’ people on upcoming tasks, ‘certify’ (give them their role, ask them to study, and test them on their competence)
Creation of a creed (yes, a kind of ‘mission statement’, but one that’s in Q&A form and is also specific and actionable)
Specify goals, not methods
All of this makes it very easy to apply these teachings to your own environment where needed.
Despite my enthusiasm, I was left with a few question marks in my mind about the story.
The first is that Marquet seems to have had great latitude to break the rules (indeed the subtitle of the book is ‘A True Story of Building Leaders by Breaking the Rules’). His superiors explicitly told him they were more focussed on outcomes than methods. This freedom isn’t necessarily available to everyone. Or maybe one of the points of the book is that to lead effectively you have to be prepared to ‘go rogue’ to some extent and take risks to effect real changes?
Another aspect I wondered about was that I suspected Marquet started from a point where he had a workforce that were very strong in one particular direction: following orders, and that it’s easier to turn such people around than a group of people who are not trained to follow orders so well. Or maybe it’s harder, who knows?
Also, the ship was at rock bottom in terms of morale and performance, and everyone on board knew it. So there was a crisis that needed to be tackled. This made making change easier, as his direct subordinates were prepared to make changes to achieve better things (and get promotion themselves).
This makes me wonder whether a good way to make needed change as a leader when there is no obvious crisis is to artificially create one so that people get on board…
In a former life I was a history student. I wasn’t very good at it, and one of my weaknesses was an unwillingness to cut out the second-hand nonsense and read the primary texts. I would read up on every historian’s views on (say) the events leading up to the First World War, thinking that would give me a short-cut to the truth.
The reality was that just reading the recorded deliberations of senior figures at the time would give me a view of the truth, and a way to evaluate all the other opinions I felt bombarded by.
What I should have learned, in other words, was: ignore the noise, and go to the signal.
I was reminded of this learning moment recently when I finally read The Toyota Way. I had heard garbled versions of its message over the years through:
Reading blog after blog exhorting businesses to be ‘lean’ (I was rarely the wiser as to what that really meant)
Hearing senior leadership in one company use the verb ‘lean’ (as in: ‘we need to lean this process’ – I know, right?)
One colleague telling me in an all-hands that we should all stop development whenever there was a problem ‘like they do at Toyota’ (‘How and why the hell is that going to help with 700 developers, and how do I explain that to customers?’, I thought. It came to nothing)
In other words, ‘lean’ seemed to be a content-free excuse to just vaguely tell people to deliver something cheaper, or give some second-hand cargo-cult version of ‘what Toyota do’.
So it was with some scepticism I found myself with some time and a copy of The Toyota Way in my hands. Once I started reading it, I realised that it was the real deal, and was articulating better many of the things I’d done to make change in business before, some of which I wrote about here and here.
‘That’s Manufacturing. I Work in a Knowledge Industry.’
One of the most obvious objections to anyone that foists The Toyota Way (TTW) on you is that its lessons apply to manufacturing, which is obviously different from a knowledge industry. How can physical stock levels, or assembly line management principles apply to what is done in such a different field?
The book deals with that objection on a number of levels.
The Toyota Way is a Philosophy, Not a Set of Rules
First, it emphasises that TTW is a philosophy for production in general, and not a set of rules governing efficient manufacturing. This philosophy can (depending on context) result in certain methods and approaches being taken that can feel like, or in effect become, rules, but can be applied to any system of production, whether it be production of pins, medicines, cars, services, or knowledge.
What that will result in in terms of ‘rules’ for your business will depend on your specific business’s constraints. So you’re under no obligation to do things the same way Toyota do them, because even they break their own supposed ‘rules’ if it makes sense for them. One example of this is the ‘rule’ that’s often cited that stock levels must always be low or minimal to prevent waste.
A high-level overall goal of TTW is to create a steady flow of quality product output in a pipeline that reduces waste. That can mean reducing stock levels in some cases (commonly considered a ‘rule’ of lean manufacturing), or even increasing them in others, depending on the overall needs of the system to maintain a steady flow.
So while the underlying principles of TTW are relatively fixed (such as ‘you should go and see what is going on on the floor’, ‘visual aids should be used collaboratively’, and so on), the implementation of those principles is relatively loose and non-prescriptive.
This maps perfectly to DevOps or Agile, which have a relatively clear set of principles (CALMS, and the Agile Manifesto, respectively) which can be applied in all sorts of ways, none of which are necessarily ‘correct’ for any given situation. In this context, the agile and DevOps industry that’s been built up around these movements is just noise.
Waste and Pipelines are Universal
Secondly, the concepts of waste and pipelines are not unique to manufacturing. If your job is to produce weekly reports for a service industry, then you might consider that time spent making that report is wasted if its contents are not acted upon, or even read in a timely way.
A rather shocking amount of time can be spent in knowledge industries producing information that doesn’t get used. In my post on documentation I wrote about the importance of the ‘knowledge factory’ idea in running an SRE team, and the necessity to pay a ‘tax’ on maintaining those essential resources (roughly 5% of staff time in that case). The dividend, of course, was far greater.
Most of that tax was spent on removing or refining documentation rather than writing it. That was time well spent, as the biggest problem with documentation I’ve seen in decades of looking at corporate intranets is too much information, leading to distrust and the gradual decay of the entire system. So it was gratifying to read in TTW that:
Documentation audits take place regularly across the business
They are performed by a third party who ensures they are in order and follow standards
The principal check performed is to search for out of date documentation rather than quantity or correctness (how can an outsider easily determine that anyway?)
The root of the approach is summed up perfectly in the book:
‘Capturing knowledge is not difficult. The hard part is getting people to use the standards and contribute to improving it’
The Toyota Way
So in my experience, the fact that a car is being created instead of knowledge or software is not a reason to ignore TTW. Just like software delivery, a car is a product that requires both repeated activity in a pipeline, and creative planning of features and the building of technology in a bespoke way. All these parts of the process are covered and examined in TTW.
How Flow is Not Achieved
So how do you achieve a harmonious flow of output in your non-material factory? Again, this is essentially no different to manufacturing: what you’re dealing with is a system that has various sub-processes that themselves have inputs, outputs, dependencies and behaviours whose relationships need to be understood in order to increase throughput through the system.
How Flow is Achieved: Visualise for Collaboration First
Understanding, visualising and communicating your view of these relationships with your colleagues is hard, and critical to getting everyone pointing in the same direction.
This is something I’d also stumbled towards in a previous job as I’d got frustrated with the difficulty of visualising the constraints we were working under, and over which we had no control. I wrote about this in the post ‘Project Management with Graphviz’, where I used code to visualise and maintain the dependencies in a graph. I had to explain to so many people in so many different meetings why we couldn’t deliver for them that these graphs saved me lots of time.
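As a rough illustration of the idea (this is not the code from that post), dependencies can be kept as plain ‘task, prerequisite’ pairs and turned into Graphviz DOT text by a few lines of shell. The task names and the deps_to_dot function below are hypothetical.

```shell
#!/bin/sh
# Hypothetical dependency list: "<task> <thing it waits on>" per line.
deps='deploy_app firewall_change
deploy_app new_vm
new_vm hardware_order
firewall_change security_review'

# Emit a Graphviz DOT digraph with one edge per dependency,
# pointing from the prerequisite to the task it blocks.
deps_to_dot() {
    echo 'digraph dependencies {'
    echo '  rankdir=LR;'
    printf '%s\n' "$1" | while read -r task prereq; do
        if [ -n "$task" ]; then
            printf '  "%s" -> "%s";\n' "$prereq" "$task"
        fi
    done
    echo '}'
}

deps_to_dot "$deps"
```

If Graphviz is installed, piping the output into `dot -Tpng -o deps.png` renders the picture. Keeping the list as text means the graph can be regenerated every time a dependency changes.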
Interlude – Visual Representations
Another principle outlined in TTW: visual representations should be simple and share-able. Unfortunately, this is the kind of thing you get delivered to your inbox as an engineer in an enterprise:
Now, I’m sure Project Managers eat this kind of thing for breakfast, and it makes perfect sense to them, but unless it corresponds to a commonly-seen and understood reality, it’s infinitely compressible information to the typical engineer. I used to almost literally ignore them. That’s the point of the visual representations principle of TTW: effective collaboration first, not complex schemas that don’t drive useful conversations.
In retrospect, and having read TTW, the answer to the problem of slow enterprise delivery is logically quite obvious: the processes that others depend on need to be improved before downstream processes can achieve flow. For many IT organisations, that means infrastructure must be focussed on first, as these are the services the development teams depend on.
But what often happens in real world businesses (especially those that do not specialise in IT)? Yup, centralised infrastructure gets cut first, because it is perceived that it ‘doesn’t deliver value’. Ironically, cutting centralised infrastructure causes more waste by cutting off the circulatory systems other parts of the business depend on for air.
So the formerly centralised cost gets mostly duplicated in every development team (or business unit) as they slowly learn they have to battle the ‘decagon of despair’ themselves separately from the infrastructure team that specialised in that effort before.
This is the infrastructure gap that AWS jumped headlong into: by scaling up infrastructure services to a global level, they could extract a tax from each business that uses it in exchange for providing services in a finite but sufficient way that removes dependencies on internal teams.
It is also the infrastructure gap that Kubernetes is jumping headlong into. By standardising infrastructure needs such as mutual TLS authentication and network control via sidecars, Kubernetes’ nascent open source companion product Istio is bringing those infrastructure needs back together in a centralised and industry-standard way.
1) How Flow is Not Achieved: No Persistence in Pursuing Change
A key takeaway from the book is that efforts to make real change take significant lengths of time to achieve. TTW reports that it took Ford 5 years to see any benefits from adopting the Toyota Production System, and 10 years for any kind of comparable culture to emerge.
It’s extremely rare that we see this kind of patience in IT organisations (or their shareholders) trying to make cultural change. The only examples I can think of spring from existential crises that result in ‘do-or-die’ attempts to change where the change needed is the last roll of the dice before the company implodes. Apple is the most notable (and biggest) of these, but many other smaller examples are out there. You can probably think of analogous examples from your own life, where feeling you had no choice but to make a change helped you achieve it.
2) How Flow is Not Achieved: Problems are Not Surfaced
The book contains anecdotes on the importance Toyota place on surfacing problems rather than hiding them. One example of this approach is the famous andon principle: problems are signalled as quickly and clearly as possible to all appropriate people, so that the right focus can be given to resolving them before production stops. If a problem can’t be fixed quickly, the team ‘stops the line’ to ensure it is properly resolved before continuing.
Examples include the senior manager who criticised the junior one for not having any of these line stoppages on the latter’s watch, because if there are no line stoppages then everything must be perfect, and it clearly can never be (unless quality control finds no problems and is doing its job, which was not the case in this instance).
This is the opposite to most production systems in IT, where problems are generally covered up or worked around in order to hide challenges from managers up the chain. This approach can only work for so long and results in a general deterioration in morale.
3) How Flow is Not Achieved: Focus on Local Optimisation
There is a great temptation, when trying to optimise production systems, to focus on local optimisations to small parts of the system that seem to be ripe for optimisation. While it can be satisfying to make small parts of the system run faster than before, it is ultimately pointless if the overall system is not constrained on those parts.
In manufacturing cars, optimising the production rate of the wing mirrors is pointless if wing mirrors are already produced faster than the engines are. Similarly, shaving a small amount off the cost of a wing mirror is (relatively speaking) effort wasted if the overriding cost is the engine. Better to focus on improving the engine.
In a software development flow, making your tests run a little faster is pointless if your features are never waiting for the tests to complete to deploy. Maybe you’re always waiting 2 days elapsed time for a manager to sign off a release, and that’s the bottleneck you should focus on.
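The arithmetic behind that can be sketched in a few lines of shell. The 2 days of sign-off time comes from the example above; the build and test durations are made-up numbers for illustration.

```shell
#!/bin/sh
# Hypothetical stage durations, in minutes, for one release.
build=15
tests=30
signoff=$((2 * 24 * 60))   # 2 days of elapsed time waiting for sign-off

# Elapsed time for a release is the sum of its sequential stages.
total_minutes() {
    echo $(( $1 + $2 + $3 ))
}

baseline=$(total_minutes "$build" "$tests" "$signoff")
faster_tests=$(total_minutes "$build" $((tests / 2)) "$signoff")
faster_signoff=$(total_minutes "$build" "$tests" $((signoff / 2)))

echo "baseline:               $baseline minutes"
echo "tests twice as fast:    $faster_tests minutes"
echo "sign-off twice as fast: $faster_signoff minutes"
```

With these numbers, doubling the speed of the tests saves 15 minutes out of 2925, while halving the sign-off wait saves 1440.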
4) How Flow is Not Achieved: Failure to ‘Go and See’
Throughout TTW, the importance of ‘going and seeing’ as a principle of management is reiterated many times. I wrote about the importance of this before in my blog on changing culture (Section 1: Get on the floor), but again it was good to see this intuition externally validated.
Two examples stuck in my mind: the story of the senior leader who did nothing but watch the production line for four hours so he could see for himself what was going on, and the minivan chief designer who insisted on personally driving in all 50 US states and Canada. The minivan designer then went back to the drawing board and made significant changes to the design that made sense in North America, but not in Japan (such as having multiple cup-holders for the long journeys typical of that region).
Both of these leaders could have had an underling do this work for them, but the culture of Toyota goes against this delegatory approach.
Implicit in this is that senior leadership need to be bought into and aware of the detail in the domain in order to drive through the changes needed to achieve success.
Go Read the Book
I’ve just scratched the surface here of the principles that can be applied to DevOps from reading TTW.
It’s important not to swallow the kool aid whole here as well. Critiques of The Toyota Way exist (see this article from an American who worked there), and are worth looking at to remind yourself Toyota have not created the utopia that reading The Toyota Way can leave you thinking they have. However, the issues raised there seem to deal with the general challenges of the industry, and the principles not being followed in certain cases (Toyota is a human organisation, after all, not some kind of spiritual production nirvana).
Oh, and at the back of the book there’s also a free ‘how to do Lean consulting’ section that gives you something like a playbook for those that want to consult in this area, or deconstruct what consultants do with you if you bring them in.
If you’ve ever spent any time building Docker images, you will know that Docker caches layers as they are built, and as long as the lines of your Dockerfile don’t change, Docker treats the outputted layer as identical.
There’s a problem here. If you go to the network to pick up an artefact, for example with:
RUN curl https://myartefactserver.local/myjar.jar > myjar.jar
then Docker will treat that command as cache-able, even if the artefact has changed.
Solution 1: --no-cache
The sledgehammer solution to this is to add a --no-cache flag to your build. This removes the caching behaviour, meaning your build will run fully every time, no matter whether the lines of your Dockerfile change or not.
Problem solved? Well… not really. If your build is installing a bunch of other more stable artefacts, like this:
RUN apt-get update -y && apt-get install -y many packages you want to install
# more commands
RUN curl https://myartefactserver.local/myjar.jar > myjar.jar
Then every time you want to do a build, the cycle time is slow as you wait for the image to fully rebuild. This can get very tedious.
Solution 2: Manually Change the Line
You can get round this problem by dropping the --no-cache flag and manually changing the line every time you build. Open up your editor, and change the line like this:
RUN [command] # sdfjasdgjhadfa
Then the build will use the cache up to that line and rebuild from there.
Solution 3: Automate the Line Change
But this can get tedious too. So here’s a one-liner that you can put in an alias, or your makefile to ensure the cache is busted at the right point.
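The original one-liner is not reproduced here, so what follows is only a sketch of one way to do it, under some assumptions: the Dockerfile is an ordinary text file, the line to bust is a RUN line that can be picked out with a simple regular expression containing no slashes, and a trailing shell comment is an acceptable place to put the changing text (as in Solution 2). The function name bust_cache and the cachebust- marker are made up for illustration.

```shell
#!/bin/sh
# Rewrite every RUN line matching PATTERN so that it ends with a fresh
# "# cachebust-<marker>" shell comment. Docker then sees a changed
# instruction and rebuilds from that line onwards.
# bust_cache and the marker format are illustrative, not a Docker feature.
bust_cache() {
    dockerfile=$1
    pattern=$2
    marker=${3:-$(date +%s)}    # default marker: seconds since the epoch
    tmp=$(mktemp) || return 1
    # Strip any previous marker, then append the new one.
    if sed "/^RUN .*${pattern}/ { s/ # cachebust-.*\$//; s/\$/ # cachebust-${marker}/; }" \
        "$dockerfile" > "$tmp"; then
        cat "$tmp" > "$dockerfile"
    fi
    rm -f "$tmp"
}

# Demo on a throwaway Dockerfile, with a fixed marker so the result is easy to read.
demo=$(mktemp)
printf '%s\n' \
    'RUN apt-get update -y' \
    'RUN curl https://myartefactserver.local/myjar.jar > myjar.jar' > "$demo"
bust_cache "$demo" 'myjar\.jar' 1700000000
cat "$demo"
```

In an alias or makefile target this would be used along the lines of `bust_cache Dockerfile 'myjar\.jar' && docker build -t myimage .`, where myimage is a placeholder tag. Only the curl line changes, so the slower, more stable layers above it still come from the cache.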
If you have worked in software for a number of years and you’re not a security specialist, you might be occasionally confronted by someone from ‘security’ who generally says ‘no’ to things you deliver.
For a long time I was in this position and was pretty bewildered by how to interpret what they were saying, or understand how they thought.
Without being trained or working in the field, it can be difficult to discern the underlying principles and distinctions that mark out a security magus from a muggle.
…if you’ve ever been locked in a battle with a security consultant to get something accepted then it can be hard to figure out what rules they are working to.
So here I try and help out anyone in a similar position by attempting to lay out clearly (for the layperson) some of the principles (starting with the big ones) of security analysis before moving onto more detailed matters of definition and technology.
‘There’s no such thing as a secure system’
The broadest thing to point out that is not immediately obvious to everyone is that security is not a science, it’s an art. There is no such thing as a secure system, so to ask a security consultant ‘is that secure?’ is to invite them to think of you as naive.
It took me some time to catch on to this. There’s usually no magic answer to getting your design accepted, but often you can get to a position where some kind of tradeoff between security and risk is evaluated, which may get you to acceptance.
Anecdote: I was once in a position where a ‘secrets store’ that used base64 encoding was deemed acceptable for an aPaaS platform because the number of users was deemed low enough for the risk to be acceptable. A marker was put down to review that stance after some time, in case the usage of the platform spread, and a risk item added to ensure that encryption at rest was addressed within two years.
A corollary of security being an art is that ‘layer 8’ of the stack (politics and religion) can get in the way of your design, especially if it’s in any way novel. Security processes tend to be an accretion of: specific directions derived from regulations; the vestigial scars of past breaches; personal prejudice; and plain superstition.
Trust Has to Begin Somewhere
Often when you are discussing security with people you end up in a ‘turtles all the way down’ scenario, where you wonder how anything can be done because nothing is ever trusted.
Anecdote: I have witnessed a discussion with a (junior) security consultant where a demand was made to encrypt a public key, based on a general injunction that ‘all data must be encrypted’. ‘Using what key?’ was the natural question, but an answer was not forthcoming…
The plain fact is that everyone has to trust something at some point in order to move information around anything. Examples of things you might (or might not) trust are:
The veracity of the output of dmesg on a Linux VM
The Chef server keys stored on your hardened VM image
That Alice in Accounts will not publish her password on Twitter
That whatever is in RAM has not been tampered with or stolen
The root public keys shipped with your browser
Determine Your Points of Trust
Very often determining what you are allowed to trust is the key to unlocking various security conundrums when designing systems. When you find a point of trust, exploit it (in a good way) as much as you can in your designs. If you’ve created a new point of trust as part of your designs, then prepare to be challenged.
Responsibility Has to End Somewhere
When you trust something, usually someone or something must be held responsible when it fails to honour that trust. If Alice publishes her password on Twitter, and the company accounts are leaked to the press, then Alice is held responsible for that failure of trust. Establishing and making clear where responsibility would lie in the event of a failure of trust is also a way of getting your design accepted in the real world.
What level of trust it is acceptable to place in Alice will depend on what her password gives her access to. Often there are data classification levels which determine minimum requirements before trust can be given for access to that data. At the extreme end of “secret”, root private keys can be subject to complex ceremonies that attempt to ensure that no one person can hijack the process for their own ends.
Consequences of Failure Determine Level of Paranoia
Another principle that follows from the ‘security is an art, not a science’ principle is that the extent to which you fret about security will depend on the consequences of failure. The loss of a password that allows someone to read some publicly-available data stored on a company server will not in itself demand much scrutiny from security.
The loss of a root private key, however, is about as bad as it can get from a security standpoint, as that can potentially give access to all data across the entire domain of that key hierarchy.
If you want to reduce the level of scrutiny your design gets put under, reduce the consequences of a breach.
If you want to keep pace with a security consultant as they explain their concerns to you, then there are certain key distinctions that they may frequently refer to, and assume you understand.
Getting these distinctions and concepts under your belt will help you convince the security folks that you know what you’re doing.
Encryption vs Encoding
This is a 101 distinction you should grasp.
Encoding is converting some data into some other format. Anyone who understands the encoding can convert the data back into readable form. ASCII and UTF-8 are examples of encodings that convert numbers into characters. If you give someone some encoded data, it won’t take them long to figure out what the data is, unless the encoding is extremely complex or obscure.
Encryption involves needing some secret or secure process to get access to the data, like a private ‘key’ that you store in your ~/.ssh folder. A key is just a number that’s very difficult to guess, like your house key’s (probably) unique shape. Without access to that secret key, you can’t work out what that data is without a lot of resources (sometimes more than all the world’s current computing power) to overcome the mathematical challenge.
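To make the encoding half of that distinction concrete, here is a small sketch using the base64 tool from GNU coreutils. The value being encoded is a made-up example. Nothing secret is needed to get the original back, which is why the base64 ‘secrets store’ in the earlier anecdote could only ever be accepted as a risk tradeoff.

```shell
#!/bin/sh
# Encoding: anyone who knows the scheme can reverse it, no key required.
secret='hunter2'    # hypothetical example value
encoded=$(printf '%s' "$secret" | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)

echo "encoded: $encoded"
echo "decoded: $decoded"
```

The encoded form looks scrambled to a human eye, but the second command recovers the original without being given anything other than the encoded text.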
Hashing vs Encryption
Hashing and encryption are also easily confused. Hashing is the process of turning one set of data into another through a reproducible algorithm. The key point about hashing is that the data goes one-way. If you have the hash value (say, ae5690f1aff) then you can’t easily reverse that to the original data.
Hashing has a weakness. Let’s say you ‘md5sum’ an insecure password like password. You will always get the value 5f4dcc3b5aa765d61d8327deb882cf99 from the hash.
If you store that hashed password in a database, then anyone can google it to find out what your password really is, even though it’s a hash. Try it with other commonly-used passwords to see what happens.
This is why it’s important to ‘salt’ your hash: mix a random value, unique to each password, into the input before hashing, so that knowledge of the hash algorithm and a table of common hashes isn’t enough to crack a lot of passwords.
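A minimal sketch of both points, assuming the md5sum and sha256sum tools from GNU coreutils are available. The function names and the two users are hypothetical, and the plain hash functions are used only to keep the illustration short: real password storage generally uses a deliberately slow function such as bcrypt, scrypt or Argon2.

```shell
#!/bin/sh
# Print the hex MD5 digest of a string (no trailing newline is hashed).
md5_hex() {
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Print the hex SHA-256 digest of salt followed by password.
salted_hash() {
    printf '%s%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# Generate 16 random bytes as 32 hex characters.
new_salt() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Unsalted: the same password always gives the same, searchable value.
md5_hex 'password'

# Salted: a random per-user salt, stored next to the hash, makes the
# same password hash differently for each user.
salt_alice=$(new_salt)
salt_bob=$(new_salt)
echo "alice: $salt_alice $(salted_hash "$salt_alice" 'password')"
echo "bob:   $salt_bob $(salted_hash "$salt_bob" 'password')"
```

The first command prints the well-known value quoted above. The last two lines differ from each other on every run even though both users chose the same password, so a single lookup table no longer cracks both.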
Authentication vs Authorization
Sometimes shortened to ‘authn’ and ‘authz’, this distinction is another standard one that gets slipped into security discussions.
Authentication is the process of determining what your identity is. The one we’re all familiar with is photo id. You have a document with a name and a photo on it that’s hard to fake (and therefore ‘trusted’), and when asked to prove who you are you produce this document and it’s examined before law enforcement or customs accepts your claimed identity.
There have been many interesting ways to identify authenticity of identity. My favourite is the scene in Big where the Tom Hanks character has to persuade his friend that he is who he says he is, even though he’s trapped in the body of a man:
To achieve this he uses a shared secret: a song (and associated dance data) that only they both know. Of course it’s possible that the song was overheard or some government agency had listened in to their conversations for years to fake the authentication, but the chances of this are minimal, and would raise the question of: why would they bother?
What would justify that level of resources just to trick a boy into believing something so ludicrous? This is another key question that can be asked when evaluating the security of a design.
The other example I like is the classic spy trope of using two halves of a torn postcard, giving one half to each side of a communication, making a ‘symmetric key’ that is difficult to forge unless you have access to one side of it:
Symmetric vs Asymmetric Keys
This also exemplifies nicely what a symmetric key is. It’s a key that is ‘the same’ one used on both sides of the communication. A torn postcard is not ‘the same’ on both sides, but it can be argued that if you have one part of it, it’s relatively easy to fake the other. This could be complicated if the back of the postcard had some other message known only to both sides written on it. Such a message would be harder to fake since you’d have to know the message in both people’s minds.
An asymmetric key is one where access to the key used to encrypt the message does not imply access to decrypt the message. Public key encryption is an example of this: anyone can encrypt a message with the public key, but the private key is kept secret by the receiver. Anyone can know the public key (and write a message using it), but only the holder of the private key can read the message.
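To see the asymmetry in action, here is a toy Python sketch of textbook RSA using two tiny primes. All the numbers and function names are illustrative assumptions. Real keys are vastly larger and real systems add padding, so treat this as a way to see the idea and nothing more.

```python
# Toy key pair built from two tiny primes (illustrative values only).
p, q = 61, 53
n = p * q                  # the modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (needs Python 3.8+)

public_key = (e, n)        # safe to hand to anyone
private_key = (d, n)       # kept secret by the receiver


def encrypt(message, key):
    """Encrypt a small integer with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Decrypt with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


if __name__ == "__main__":
    secret = 65
    ciphertext = encrypt(secret, public_key)
    print(ciphertext != secret)
    print(decrypt(ciphertext, private_key) == secret)
```

With numbers this small anyone could factor n and recover the private key. The security of the real thing comes entirely from making that step impractically expensive.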
No authentication process is completely secure (remember, nothing is secure, right?), but you can say that you have prohibitively raised the cost of cheating security by demanding evidence of authenticity (such as a passport or a driver’s license) that is costly to fake, to the point where it’s reasonable to say acceptably few parties would bother.
If the identification object itself contains no information (like a bearer token), then there is an additional level of security, as you have to both own the object and know what it’s for. So even if the key is lost, more has to happen before there is a compromise of the system.
Authorization is the process of determining whether you are allowed to do something or not. While authentication is a binary fact about one piece of information (you are either who you say you are, or you are not), authorization will depend on both who you are and what you are asking to do.
In other words: Dave is still Dave. But Dave can’t open the bay doors anymore. Sorry Dave.
Following on from Authentication and Authorization, Role-Based Access Control gives permission to a more abstract entity called a role.
Rather than giving access to a user directly, you give the user access to the role, and then that role has the access permissions set for it. This abstraction allows you to manage large sets of users more easily. If you have thousands of users that have access to the same role, then changing that role is easier than going through thousands of users one-by-one and changing their permissions.
To take a concrete example, you might think of a police officer as having access to the ‘police officer’ role in society, which gives them permission to stop someone acting suspiciously in addition to their ‘civilian’ role permissions. If they quit, that role is taken away from them, but they’re still the same person.
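The police officer example can be sketched in a few lines of Python. The role names, permissions and the user dave are all made up for illustration, and a real system would keep these mappings in a directory or database rather than in dictionaries.

```python
# Permissions hang off roles, not off individual users (hypothetical data).
ROLE_PERMISSIONS = {
    "civilian": {"vote", "drive"},
    "police_officer": {"stop_suspect"},
}

# Users are only ever given roles.
USER_ROLES = {
    "dave": {"civilian", "police_officer"},
}


def is_authorized(user, action):
    """A user may do something if any of their roles permits it."""
    roles = USER_ROLES.get(user, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def revoke_role(user, role):
    """Take a role away; the user's identity is untouched."""
    USER_ROLES.get(user, set()).discard(role)


if __name__ == "__main__":
    print(is_authorized("dave", "stop_suspect"))
    revoke_role("dave", "police_officer")
    print(is_authorized("dave", "stop_suspect"))
    print(is_authorized("dave", "drive"))
```

Changing what every officer may do means editing one entry in ROLE_PERMISSIONS, however many users hold the role.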
Security Through Obscurity
Security through obscurity is security that relies on the design of a system staying secret. In other words, if the design of your system were to become public then it would be easy to break.
Placing your house key under a plant next to the door, or under the doormat would be the classic example. Anyone aware of this security ‘design’ (keeping the key in some easy-to-remember place near the door) would have no trouble breaking into that house.
By contrast, the fact that you know that I use public key encryption for my ssh connections, and even the specifics of the algorithms and ciphers used in those communications does not give you any advantage in breaking in. The security of the system depends on maths, specifically the difficulty in factoring a specific class of large numbers.
If there are weaknesses in these algorithms then they’re not publicly known. That doesn’t preclude the possibility that someone, somewhere can break them (state security agencies are often well ahead of their time in cryptography, and don’t share their knowledge, for obvious reasons).
It’s a cliche to say that security through obscurity is bad, but it can be quite effective at slowing an attacker down. What’s bad about it is when you depend on security through obscurity for the integrity of your system.
An example of security through obscurity being ‘acceptable’ might be if you run an ssh server on (say) port 8732 rather than 22. You depend on ssh security, but the security through obscurity of running on a non-standard port prevents casual attackers from ‘seeing’ that your port 22 is open, and as a secondary effect can also prevent your ssh logs from getting overloaded (perhaps exposing you to other kinds of attack). But any cracker worth her salt wouldn’t be put off by this security measure alone.
If you really want to impress your security consultant, then casually mention Kerckhoffs’s Principle, which is a more formal way of saying ‘security through obscurity is not sufficient’.
Authentication works the same way, but authorization is only allowed for a minimal set of functions. This reduces the blast radius of compromise.
Blast radius is a metaphor from nuclear weapons technology. IT people use it in various contexts to make what they do sound significant.
A simple example might be a process that starts as root (because it might need access to a low-numbered port, like an http server), but then drops down to a less privileged user. This ensures that if the server is compromised after that initial startup then the consequences would be far less than before. It is then up for debate whether that level of security is sufficient.
Anecdote: I once worked somewhere where the standard http server had this temporary root access removed. Users had to run on a higher-numbered port and low-numbered ports were run on more restricted servers.
In certain NSA-type situations, you can even get data stores that users can write to, but not read back! For example, if a junior security agent submits a report to a senior, they then get no access to that document once submitted. This gives the junior the minimal level of privilege they need to do their job. If they could read the data back, then that increases the risk of compromise as the data would potentially be in multiple places instead of just one.
There are other ways of reducing the blast radius of compromise. One way is to use tokens for authentication and authorization that have very limited scope.
At an extreme, an admin user of a server might receive a token to log into it (from a highly secured ‘login server’) that:
can only be used once
limits the session to two minutes
expires in five minutes
can only perform a very limited action (eg change a single file)
can only be used from a specific subnet
If that token is somehow lost (or copied) in transit then it could only be used within five minutes, and only before the intended recipient uses it, for a maximum of two minutes. The damage should be limited to a specific file if (and only if) the user misusing the token already has access to the specified network.
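A rough Python sketch of how a server might check such a token follows. The Token class, its field names and the redeem function are hypothetical and not taken from any real product. The two-minute session limit is left out, since that would be enforced by the session rather than at redemption time.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Token:
    issued_at: float          # seconds, on whatever clock the server uses
    allowed_action: str       # the single action this token permits
    allowed_subnet: str       # e.g. "10.0.0.0/24"
    ttl_seconds: int = 300    # expires in five minutes
    used: bool = False        # can only be used once


def redeem(token, action, source_ip, now):
    """Return True and burn the token only if every restriction is met."""
    if token.used:
        return False
    if now - token.issued_at > token.ttl_seconds:
        return False
    if action != token.allowed_action:
        return False
    subnet = ipaddress.ip_network(token.allowed_subnet)
    if ipaddress.ip_address(source_ip) not in subnet:
        return False
    token.used = True
    return True


if __name__ == "__main__":
    token = Token(issued_at=0.0, allowed_action="edit:/etc/motd",
                  allowed_subnet="10.0.0.0/24")
    print(redeem(token, "edit:/etc/motd", "10.0.0.7", now=60.0))
    print(redeem(token, "edit:/etc/motd", "10.0.0.7", now=61.0))
```

Each check narrows what a stolen copy is worth: an attacker needs the right network, the right action and the right moment, all before the real user gets there.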
By limiting the privileges and access that the token has, the cost of failure is far reduced. Of course, this focusses a large amount of risk onto the login server. If the login server itself were compromised then the blast radius would be huge, but it’s often easier for organisations to manage that risk centrally as a single cost rather than spreading it across a wide set of systems. In the end, you’ve got to trust something.
Features like these are available in HashiCorp’s Vault product, which centralises secrets management with open source code. It’s the most well-known, but other products are available.
You might have noticed in the ‘Too Many Secrets’ clip from the film Sneakers above that access to all the systems was granted simply by being able to decrypt the communications. You could call this one-factor authentication, since it was assumed that the identity of the user was ‘admin’ just by virtue of having the key to the system.
Of course, in the real world that situation would not exist today. I would hope that the Federal Reserve money transfer system would at least have a login screen as well before you identify yourself as someone that can move funds arbitrarily around the world.
A login page can also be regarded as one-factor authentication, as the password (or token) is the only secret piece of information required to prove authenticity.
Multi-factor authentication makes sure that the loss of one piece of authentication information is not sufficient to get access to the system. You might need a password (something you know), and a secret PIN (another thing you know), and a number generated by your mobile phone (something you have), and a fingerprint (something you are), and the name of your first pet. That would be 5-factor authentication.
Of course, all this is undermined if the recovery process sends a link to an authentication reset to an email address that isn’t so well secured. All it takes then is for an attacker to compromise your email, and then tell the system that you’ve lost your login credentials. If your email is zero- or one-factor authentication then the system is only as secure as that and all the work to make it multi-factor has been wasted.
This is why you get those ‘recovery questions’ that supposedly only you know (name of your first pet). Then, when people forget those, you get other recovery processes, like sending a letter to your home with a one-time password on it (which of course means trusting the postal service end-to-end), or an SMS (which means trusting the network carrier’s security). Once again, it’s ‘things you can trust’ all the way down.
So it goes.
Acceptable Risk and Isolation
We’ve touched on this already above when discussing the ‘prohibitive cost of compromising a system’ and the ‘consequences of a breach’, but it’s worth making explicit the concept of ‘acceptable risk’. An acceptable risk is a risk that is known about, but whose consequences of compromise are less than the effort of mitigating it.
A sensible organisation concerned about security in the real world will have provisions for these situations in their security standards, as it could potentially save a lot of effectively pointless effort at the company level.
For example, a username/password combination may be sufficient to secure an internal hotel booking system. Even if that system were compromised, then (it might be argued) you would still need to compromise the credit card system to exploit it for material gain.
The security consultant may raise another factor at this point, specifically: whether the system is appropriately isolated. If your hotel booking system sits on the same server as your core transaction system, then an exploit of the booking system could result in the compromise of your core transaction system.
Sometimes, asking a security consultant “is that an acceptable risk?” can yield surprising results, since they may be so locked into saying ‘no’ that they may have overlooked the possibility that the security standards they’re working to do indeed allow for a more ‘risk-based’ approach.
That was a pretty quick tour through a lot of security concepts that will hopefully help you if you are bewildered by security conversations.
If I missed anything out, please let me know: @ianmiell on twitter.
Most people who use Linux pretty quickly learn about man pages, and how to navigate them with their preferred pager (usually less these days).
Less well known are the info pages. If you’ve never come across them, these look like man pages, and contain similar information, but are invoked like this:
Over the past couple of decades I often found myself looking at an info page and wondering how to navigate it, hitting various keys and getting lost and frustrated.
I tried man info, but that didn’t tell me how to navigate the pages. More rarely I would try info info, but didn’t have the time or patience to follow the tutorial there and then as I was busy trying to get some information, stat.
The other day I finally had enough and decided to take the time to sit down and learn it properly. It didn’t take that long, but I figured there was a case for writing down a helpful guide for new users that just want to get going.
The Bare Minimum
Here’s the bare minimum you need to read through an info page without ever getting lost:
] – next page
[ – previous page
space – page down within page
b – page up within page
q – quit
If you want to get commands into your muscle memory as fast as possible, focus on these. It won’t get you round pages efficiently, but you won’t wonder how to get back to where you were, or how you got where you are. If you’re a very casual user, stop here and come back later when you get fed up of spinning forwards and backwards through pages to find something.
Try it with something like info sed.
If you want to get to the next level with info, then these commands will help:
n – next page in this level
p – previous page in this level
return – jump to page ‘lower down’
l – go back to the last node seen
u – go ‘up’ a level
info has a hierarchical structure. There is a top-level page, and then ‘child’ pages that can have other pages at the same ‘level’. To go to the next page at the same level you can hit the n key. To go back to the previous page at the same level you hit p.
Occasionally you will get an item that allows you to ‘jump down’ a level by hitting the return key. For example, by placing the cursor on the ‘Definitions’ line below and hitting return you will be taken to that page:
* Introduction:: An introduction to the shell.
* Definitions:: Some definitions used.
To return to the page you were last on at any point, you can hit l (for ‘last page’) and you will be returned to the top of that page. Or if you want to go ‘up’ a level, type u.
If you’re still interested then you might want to read through info info carefully, but before you do here’s a couple of final tips to help avoid getting lost in that set of pages (which I have done more than once).
First, when you get stuck or want to dig in further, you can get help:
? – show the info commands window
h – open the general help window
Confusingly, these options open up a half-window that, in the case of h at least, gives no indication of how to close it down again. Here’s how:
C-x 0 – close the window
Hitting CTRL and x together, followed by 0 gets you out.
You might wonder what the point of learning to read info pages is.
For me, the main reasons are:
They are often far more detailed (and more structured) than man pages
They are more definitive and complete. The grep info page, for example, contains a great set of examples, a discussion on performance, and an introduction to regular expressions. In fact, they’re intended to be mini books that can be printed off when converted to the appropriate format
You can irritate and/or intimidate colleagues by dismissing man page usage as ‘inferior’ and asserting that real engineers use info (joke)
Aside from anything else, I find getting fluent with these pieces of relative arcana satisfying. Maybe it’s just me.
It’s a pipeline definition file similar to GoCD’s, or other definition formats for Jenkins et al. You can trigger workflows based on (for example) a crontab schedule, or repository push, or repository pull-request, or when a URL is hit. I’m sure more triggers are to come, assuming they don’t exist already.
The format isn’t 100% intuitive, but is as easy to pick up as anything else, and I’m sure the docs will improve. Right now there seem to be two sets of docs, one more formal and in the old (deprecated) HCL format, and the other less formal and in the new YAML format. I’m not entirely sure of the status of the ‘older’ documentation, but it hasn’t failed me yet.
GitHub Actions doesn’t just consist of this functionality in your repo. GitHub is providing a curated set of canned actions here that you can reference in your workflows. You needn’t use theirs, either, you can use any you can find on GitHub (or maybe anywhere else; I haven’t tried).
For me, the big deal is that this co-locates the actions with your code. So you can trigger a rebuild on a push, or on a schedule, or from an external URL. Just like CI tools do, but with less hassle and zero setup.
But it doesn’t just co-locate code and CI.
It is also threatening to take over CD, secrets management (there’s a ‘Secrets’ tab in the repo’s settings now), artifact store (there’s a supported ‘upload-artifact’ action that pushes arbitrary files to your repo), and user identity. Add in the vulnerability detection functionality and the whole package is as compelling as hell.
An Azure Gateway Drug? An AWS Killer?
When the possibilities of this start to dawn on you, it’s truly dizzying.
GitHub effectively gives you, for free, a CI/CD platform to run more or less whatever you like (but see limits, below). You can extend it to manage your code workflow in however sophisticated a way you like, as you have access to the repository’s GitHub token.
The tradeoff is that it’s all so easy that your business is soon going to depend on GitHub so much Microsoft will have a grip on you as tight as Windows used to.
I think the real trojan horse here is user identity. By re-using the identity management your business might already trust in GitHub, and extending its scope to help solve the challenges of secrets management and artifact stores, whole swathes of existing work could be cut away from your operational costs.
The default ‘hello-github-action’ setup demonstrates a Docker container that runs on an Ubuntu VM base. I found this quite confusing. Is access to the VM possible? If it’s not, why do I care whether it’s running on Ubuntu 18 or Ubuntu 16? I did some wrangling with this but ran into apparently undocumented requirements for an action.yml file, and haven’t had time to bottom them out.
(As an aside, the auto-created lab that GitHub makes for new users is one of the best UX’s I’ve ever seen for onboarding to a new product.)
What you do get is root within the container. Nice. And you can use an arbitrary container, from DockerHub or wherever.
You also get direct access back to GitHub without any faff. By default you get access to a github secret.
As with all these remote build environments, debugging can be a PITA. You can rig up a local Docker container to behave as it would on the server, but it’s a little fiddly to get the conventions right, as not everything about the setup is documented.
Limits and Restrictions
Limits are listed here, and include a stern warning not to use this for ‘serverless computing’, or “Any other activity unrelated to the production, testing, deployment, or publication of the software project associated with the repository where GitHub Actions are used. In other words, be cool, don’t use GitHub Actions in ways you know you shouldn’t.”
Which makes me wonder: are they missing an opportunity here? I have serverless applications I could run on here, and (depending on the cost) might be willing to pay GitHub to host them for me. I suspect that they are not going to sit on that opportunity for long.
Each virtual machine has the same hardware resources available, which I assume are freely available to the running container:
2 core CPUs
7 GB of RAM memory
14 GB of SSD disk space
which seems generous to me.
The free tier gives you 2000 minutes (about a day and a half) of runtime, which also seems generous.
GitHub Actions is a set of features with enormous potential for using your codebase as a lever into your entire compute infrastructure. It flips on its head the traditional view of code as just something to store, with compute being where the interesting stuff happens: the code is now the centre of gravity for your compute, and it’s only a matter of time before everything else follows.
Most guides to bash history shortcuts exhaustively list all of the shortcuts available to you.
The problem I always had with that was that I would use them once, and then glaze over as I tried out all the possibilities. Then I’d move onto my working day and completely forget them, retaining only the well-known !! trick I learned when I first started using bash.
So most never got committed to memory.
Here I outline the shortcuts I actually use every day. When people see me use them they often ask me “what the hell did you do there!?”, conferring God-like status on me with minimal effort or intelligence required.
I recommend using one a day for a week, then moving onto the next one. It’s worth taking your time to get them under your fingers, as the time you save will be significant in the long run.
1) !$ – The ‘Last Argument’ One
If you only take one shortcut from this article, make it this one.
It substitutes the last argument of the last command into your line.
Consider this scenario:
$ mv /path/to/wrongfile /some/other/place
mv: cannot stat '/path/to/wrongfile': No such file or directory
Ach, I put the wrongfile filename in my command. I should have put rightfile instead.
You might decide to fully re-type the last command, and replace wrongfile with rightfile.
$ tar -cvf afolder afolder.tar
tar: failed to open
Like others, I get the arguments to tar (and ln) wrong more than I would like to admit:
When you mix up arguments like that, you can run:
$ !:0 !:1 !:3 !:2
tar -cvf afolder.tar afolder
and your reputation will be saved.
The last command’s items are zero-indexed, and can be substituted in with the number after the !:.
Obviously, you can also use this to re-use specific arguments from the last command rather than all of them.
3) !:1-$ – The ‘All The Arguments’ One
Imagine you run a command, and realise that the arguments were correct, but the command itself was wrong:
$ grep '(ping|pong)' afile
I wanted to match ping or pong in a file, but I used grep rather than egrep.
I start typing egrep, but I don’t want to re-type the other arguments, so I can use the !:1-$ shortcut to ask for all the arguments to the previous command from the second one (remember they’re zero-indexed) to the last one (represented by the $ sign):
$ egrep !:1-$
egrep '(ping|pong)' afile
You don’t need to pick 1-$, you can pick a subset like 1-2, or 3-9 if you had that many arguments in the previous command.
The above shortcuts are great when I know immediately how to correct my last command, but often I run commands after the original one, which means that the last command is no longer the one I want to reference.
For example, using the mv example from before, if I follow up my mistake with an ls check of the folder’s contents:
$ mv /path/to/wrongfile /some/other/place
mv: cannot stat '/path/to/wrongfile': No such file or directory
$ ls /path/to/
…I can no longer use the !$ shortcut.
In these cases, you can insert a -n: (where n is the number of commands to go back in the history) after the ! to grab the last argument from an older command: