Reinventing The Wheel, Now At A Bargain Price!

For twenty years, saying “don’t reinvent the wheel” was usually enough to win any argument about writing bespoke software. But that advice assumed two things: writing code was expensive, and importing code was cheap. AI and supply-chain governance are changing both assumptions.


A Short Personal History of Libraries

When I started in software it felt like the wild west in terms of security, audit and compliance. I worked in an unregulated industry as a third party supplier to some FTSE-100 businesses as an engineer. I could tell stories about access and identity management (or the lack of it) that would turn a 2026 CISO’s hair white (or, more likely, whiter).

Wind forward a quarter of a century, and now security is not only its own discipline, but (mostly) fully baked into the software development lifecycle. Compliance is on its way to the party too (this is the area I’m working on at the moment, contact me etc.).

In parallel, software libraries have also become more and more embedded into the software development lifecycle. It may be hard for younger readers to believe, but I spent many years maintaining incredibly busy software platforms that rarely added a publicly available library to its codebase. If we did add them, they were mostly small enough that we could (and usually did) inspect the code ourselves.

These days, it’s not a software project unless you have thousands of npm dependencies you have never even heard of downloaded on an npm install, and my go.mod files can be even longer than the source code for some of my smaller or half-started projects.


The Software Library Pact

Outsourcing engineering to third party libraries always was a trade-off. Writing software was expensive and dangerous. It was far better to outsource bits of that to a freely available online solution. This resulted in a Faustian pact: add all these barely understood dependencies and you won’t need to find the money to finish your project. Aside from the financial benefit, you would get free upgrades, extra functionality you might need in the future, and so on.

This pact was popularised with the slogan ‘do not re-invent the wheel’, which is ancient wisdom to guard against the well-known tendency for engineers to build things unnecessarily to satisfy their creative urges.

However, the consequences of that (perfectly rational) pact are catching up with the industry. Mephistopheles has arrived to take his due, and now engineers’ dependency additions are now continually scanned and analysed for weaknesses by black hat crackers, white hat hackers, supply chain crypto thieves, and auditors alike. When a widely used library has a flaw discovered in it, millions of voices suddenly cry out in terror and are immediately busy managing patches.

It was while talking about these challenges of onerous library management recently with a friend that a thought clarified for both of us that had been circling our consciousness since the advent of LLMs:

Is it now sometimes safer and cheaper to write and maintain bespoke code, rather than importing libraries?

I know this sounds crazy, but let me explain. My friend works in the defence industry. Every new library involves an immense amount of bureaucracy to justify its addition, map its dependencies etc.. Then, once the library is embedded in the codebase, chances are high that flaws will be found in it (or one of its dependencies) that will incur significant bureaucratic costs to upgrade. Think of it like a dependabot workflow, but each alert is a punch in the face.

By contrast, application code that is peculiar to the code requires much less bureaucracy to implement and maintain.

Given that application-specific code is now very significantly cheaper to produce than it has been in the past, and using libraries is now more expensive than it has been in the past, then the calculus behind software delivery has been radically altered. Is it now sometimes safer and cheaper to write and maintain bespoke code, rather than importing libraries? The specific example my friend adduced was a large open source library, a small subset of which’s functionality he wanted to use. He ended up reimplementing that small piece with an LLM, as he reasoned that in that case the various trade-offs were worth it.

I mentioned this notion to an experienced CISO recently and it’s fair to say he was not amenable to the idea. I won’t retail his exact words, as I know some readers have delicate sensibilities.

The CISO objection is not stupid. The industry has learned the hard way that bespoke code is often unreviewed, under-tested, undocumented, and full of boring vulnerabilities. “We wrote it ourselves” is not a security argument.

I’m not giving up on the argument that the calculus has changed though, because I think there’s an economic case to be made in an increasing number of cases. The old mental model was that “npm install is free!”, whereas the new model is that every dependency is a recurring obligation. Each library creates future work: CVE triage, version bumps, breaking changes, SBOM entries, licence checks, audit queries, exception handling, patch windows, and operational risk.


Benefits

Let’s itemize the benefits of choosing to build your own functionality than import dependencies:

1) Writing Code Is Cheaper Than It Was

LLMs can spew out functionality all day, and they’re far, far better at that than maintaining and reasoning about larger systems. You still have to make sure the code it produces works as intended, write tests (LLMs can help with that too), etc., but it’s undeniably cheaper to write an internal library to (for example) prepend characters to a string than it was pre-LLM.

2) It’s Private

Your internal library does not get picked up by code scanners (unless they can comb through your source code directly, in which case they are welcome to find fault). Your internal library does not need to be put in an SBOM. This will reduce noise right across your SDLC.

In other words, bespoke code removes one class of externally visible dependency risk: maintainer compromise, dependency confusion, abandoned packages, and surprise transitive upgrades. It does not remove the need to scan, test, review, and threat-model the code itself.

3) It’s Crafted for Purpose

Code that you make for your needs does exactly what you need it to, and no more.

It’s not subject to any particular library’s bloated attack surface now, or in the future. It doesn’t pick up an increasing number of secondary dependencies over time as it tries to cover more and more features and corner cases you don’t have.

4) It’s Cheaper to Maintain / More Secure

We’ve already discussed some of the ways it can be easier to maintain, but it’s worth itemising the kinds of attacks and disruption that we’ve seen due to third party dependencies:

  • left-pad (2016), where the withdrawal of 11 lines of non-obfuscated code introduced as a library dependency resulted in widespread panic
  • event-stream (2018), where another widely-used package was used as part of a supply chain attack that tried to steal Bitcoin wallets
  • faker.js (2022), where the maintainer decided to ‘upgrade’ his code to a cris de coeur about the ingratitude of the industry towards open source maintainers

The failure modes are not “the library has a bug.” They are: the maintainer disappears, the maintainer is compromised, the maintainer turns hostile, a transitive dependency changes, a package is unpublished, or the ecosystem shifts under you.

A recent discussion about vulnerability management on HN has some good illustrations of the tensions.

I could also bring in arguments about security through obscurity being a useful (but obviously not sufficient) line of defence, but the itself phrase seems to trigger people, so I won’t.


Objections

The question is not whether we should stop using libraries. The question is whether some dependency imports are now harder to justify than bespoke, narrowly scoped code.

But since what I’m talking about is a trade-off there are of course reasonable objections to raise. I’ll itemise them here, and argue how these arguments’ force has changed recently.

1) It Will Be Less Secure

Your code is specific to you, and hasn’t been subject to a ‘many eyes make bugs shallow’ process of open source delivery. Your code might have old-fashioned flaws like unchecked input, for example.

These objections are self-evidently true. However:

  • LLMs (and other modern tooling) can perform a lot of the security checking of code locally, and far more cheaply than a traditional sec team.
  • Libraries (as we’ve seen above) have attack vectors that make them less secure also.

2) It Will Be Harder to Maintain

I don’t think that will always be true now that we have some form of intelligence to scour our code. Further evidence for how it is hard to maintain libraries is outlined above.

If your code talks to a well-known API for example), then you could argue that using a public library will make you more future proof. Whether that argument holds will depend on how often and significantly the API changes, how much of the API you need to interact with, and how likely it is that the library will be well-maintained in future. I’ve spent a fair amount of time in the last 10 years chasing down poorly maintained libraries that haven’t kept pace with their target API, resulting in me spending significant effort replacing one library with another (or even, not infrequently, having to patch the library with some code I write myself as no such library now exists).

3) If Your Engineers (Now Or In The Future) Are Clueless, This Could Be A Disaster for You

Absolutely.

4) Some Libraries Are Too Important To Rewrite Yourself

Yup. I definitely don’t recommend rolling your own cryptographic libraries, or implementing your own timezone management functionality.

LLMs do not make writing hard software easy. It makes some boring ‘already solved’ software cheap enough that importing a dependency is no longer obviously the cheapest option.


Towards a Decision Framework

Here I propose a quick checklist of things to consider when deciding which way to go. You might want to score each one and to see which way you want to go.

Points for the library:

  • The domain is security-sensitive, complicated or difficult to correctly code
  • The library is mature and widely reviewed
  • The library is well-supported and likely to be maintained for a long time
  • The library has few transitive dependencies (and those dependencies are themselves solid projects)

Consider writing your own when:

  • You only need a small subset of a large library
  • The libraries’ used functions are simple and easily specified
  • The dependency brings many transitive dependencies
  • Approval and vulnerability-management costs are high
  • The code can be exhaustively tested by code you can maintain
  • The blast radius is low

If you feel this was useful, consider buying me a coffee to say thanks!

Fine-Tuning an AI – Part I

This post (Part I) walks you through a Git repo I put together to allow anyone to fine-tune a model to write emails in their own ‘voice’.

I did this to use a small model that I could run at home to help me draft emails that didn’t suck.

You don’t need a hefty GPU, but you will need a credit card to rent one on vast.ai for the training (note: affiliate link – I will benefit if you sign up and spend). It costs me about $1 per run to create a useful model that I can use on my home (non-GPU) server.

Part II will dive deeper into how the script works and what’s going on.

What It Does

At the very highest level, the script creates a new large language model (LLM), building from a base model, and training on sample emails you give it. Here’s some sample output (truncated to 40 tokens):

INPUT:
Write an email to a friend arranging a drink:
OUTPUT:
Hey, how are you? I was thinking of going out for a drink this evening. Are you free? Let me know what time and where we could meet.
Ian
On 10 Jul

You can see how it’s put my name (Ian) at the bottom and a (spurious) date, because that’s why a typical email looks like from me.

Compare this to the ‘base’ model this was built from (Qwen/Qwen2.5-3B-Instruct), which gives a far more ‘generic’ and formal response (also truncated to 40 tokens):

INPUT:
Write an email to a friend arranging a drink:
OUTPUT:
The date is 10/24/10 at 6:00pm.
Subject: Exciting Plans for Our Drink Date!

Dear [Friend's Name],

I hope

That, by contrast, does not sound like me at all.

…in a bit more detail

Here’s what the script (run_fine_tune.sh) does:

  • Picks the cheapest available GPU from vast.ai that fits the criteria needed, and provisions it
  • Copies up a script to run on the machine (build_and_push_model.sh), the gmail samples you want to train it on (compressed), and a huggingface token (to store the resulting models)
  • Runs build_and_push_model.sh on the GPU, which:
    • Installs the needed software (LLaMA-Factory, llama.cpp)
    • Logs into huggingface
    • Creates the huggingface repos (if needed)
    • Trains the new model
    • Merges the new model
    • Converts the new model to .gguf format (allowing it to be run in llama.cpp)
    • Quantizes the new model (so it can be run more easily on a non-GPU machine)
    • Pushes the merged model, and the .gguf, to huggingface repos
  • Destroys the vast.ai instance (asking you first)

How To Run It

The latest instructions will be maintained here:

  1. Clone the repo, and navigate to root folder
  2. Set up the virtual environment, install required pips
  3. Use Google Takeout to download your sent email to a .mbox file. You probably only need your ‘Sent’ emails, as they contain your responses. If you don’t use gmail, the script expects a .mbox file. Then run the python script to extract the emails and put them in the correct format. Then compress the result with xz
  4. Get a huggingface write token
  5. Set up vast.ai api key
  6. Run the script
    • Here’s an example invocation, with placeholders in caps:

./run_fine_tune.sh --max-steps 1 --hf-repo qwen2.5-3b-YOUR_NAME_sft --hf-user YOUR_HUGGING_FACE_ACCOUNT


If you like this, you might like one of my books:
Learn Bash the Hard Way

Learn Git the Hard Way
Learn Terraform the Hard Way


If you feel this was useful, consider buying me a coffee to say thanks!

I’m Writing a New Book!

I’m excited to announce that I’m writing a new book for O’Reilly:

Follow the Money: Understanding the Financial Architecture of Software.

This is a project that’s been quietly brewing for several years. 👇

The book’s primary goal is to help those engineers and managers whose efforts to change their organisations with technology hit the buffers as they scale. The direct application of sweat and resultant problem-solving worked well when you’re a more junior engineer, but as you try to make more fundamental and wide-ranging change seems to work less and less effectively. This book is intended to cast light on the invisible barriers that block your progress, and give you the tools to understand how to change them.

Its secondary goal is to be a more practical guide to making change within technical organisations, and even across wider businesses. While many books and blog posts exist on what the ideal state of various aspects of your organisation (culture, team structure, test practices, platform technologies etc) not enough of them give practical advice on how to get there, which is a far messier process. It might even help your career by allowing you to see how why – and just how challenging – it will be to get there, and perhaps consider setting yourself up for success elsewhere.

I hope the book will distil the experiences I’ve had over the past five years as a Could Native consultant with Container Solutions, and the last 30 years as an engineer, to give the reader a head start in navigating these murky waters.

The book will be replete with practical advice and real-world case studies to illustrate and provide inspiration (and caution) about what lies ahead for those that want to make positive change.

I hope you’ll join me on this journey. If these themes resonate, stay tuned. I’ll be sharing updates, excerpts, and insights here along the way. In the meantime, this video (‘How Money Flows Trump Conway’s Law) shares some of my more embryonic thoughts from a couple of years ago.

Your thoughts, questions, and experiences are always welcome!

Let’s follow the money together.