This post (Part I) walks you through a Git repo I put together to allow anyone to fine-tune a model to write emails in their own ‘voice’.
I did this to use a small model that I could run at home to help me draft emails that didn’t suck.
You don’t need a hefty GPU, but you will need a credit card to rent one on vast.ai for the training (note: affiliate link – I will benefit if you sign up and spend). It costs me about $1 per run to create a useful model that I can use on my home (non-GPU) server.
Part II will dive deeper into how the script works and what’s going on.
What It Does
At the very highest level, the script creates a new large language model (LLM), building from a base model, and training on sample emails you give it. Here’s some sample output (truncated to 40 tokens):
INPUT: Write an email to a friend arranging a drink: OUTPUT: Hey, how are you? I was thinking of going out for a drink this evening. Are you free? Let me know what time and where we could meet. Ian On 10 Jul
You can see how it’s put my name (Ian) at the bottom and a (spurious) date, because that’s why a typical email looks like from me.
Compare this to the ‘base’ model this was built from (Qwen/Qwen2.5-3B-Instruct), which gives a far more ‘generic’ and formal response (also truncated to 40 tokens):
INPUT: Write an email to a friend arranging a drink: OUTPUT: The date is 10/24/10 at 6:00pm. Subject: Exciting Plans for Our Drink Date!
Dear [Friend's Name],
I hope
That, by contrast, does not sound like me at all.
…in a bit more detail
Here’s what the script (run_fine_tune.sh) does:
Picks the cheapest available GPU from vast.ai that fits the criteria needed, and provisions it
Copies up a script to run on the machine (build_and_push_model.sh), the gmail samples you want to train it on (compressed), and a huggingface token (to store the resulting models)
Runs build_and_push_model.sh on the GPU, which:
Installs the needed software (LLaMA-Factory, llama.cpp)
Logs into huggingface
Creates the huggingface repos (if needed)
Trains the new model
Merges the new model
Converts the new model to .gguf format (allowing it to be run in llama.cpp)
Quantizes the new model (so it can be run more easily on a non-GPU machine)
Pushes the merged model, and the .gguf, to huggingface repos
Use Google Takeout to download your sent email to a .mbox file. You probably only need your ‘Sent’ emails, as they contain your responses. If you don’t use gmail, the script expects a .mbox file. Then run the python script to extract the emails and put them in the correct format. Then compress the result with xz
The book’s primary goal is to help those engineers and managers whose efforts to change their organisations with technology hit the buffers as they scale. The direct application of sweat and resultant problem-solving worked well when you’re a more junior engineer, but as you try to make more fundamental and wide-ranging change seems to work less and less effectively. This book is intended to cast light on the invisible barriers that block your progress, and give you the tools to understand how to change them.
Its secondary goal is to be a more practical guide to making change within technical organisations, and even across wider businesses. While many books and blog posts exist on what the ideal state of various aspects of your organisation (culture, team structure, test practices, platform technologies etc) not enough of them give practical advice on how to get there, which is a far messier process. It might even help your career by allowing you to see how why – and just how challenging – it will be to get there, and perhaps consider setting yourself up for success elsewhere.
I hope the book will distil the experiences I’ve had over the past five years as a Could Native consultant with Container Solutions, and the last 30 years as an engineer, to give the reader a head start in navigating these murky waters.
The book will be replete with practical advice and real-world case studies to illustrate and provide inspiration (and caution) about what lies ahead for those that want to make positive change.
I hope you’ll join me on this journey. If these themes resonate, stay tuned. I’ll be sharing updates, excerpts, and insights here along the way. In the meantime, this video (‘How Money Flows Trump Conway’s Law) shares some of my more embryonic thoughts from a couple of years ago.
Your thoughts, questions, and experiences are always welcome!
Writing A 1996 Essay Again in 2023, This Time With Lots More Transistors
ChatGPT 3
Gathering the Sources
PrivateGPT
Ollama (and Llama2:70b)
Hallucinations
What I Learned
TL;DR
I used private and public LLMs to answer an undergraduate essay question I spent a week working on nearly 30 years ago, in an effort to see how the experience would have changed in that time. There were two rules:
No peeking at the original essay, and
No reading any of the suggested material, except to find references.
The experience turned out to be radically different with AI assistance in some ways, and similar in others.
Although I work in software now, there was a time when I wanted to be a journalist. To that end I did a degree in History at Oxford. Going up in 1994, I entered an academic world before mobile phones, and with an Internet that was so nascent that the ‘computer room’ was still fighting for space with the snooker table for space next to the library (the snooker table was ejected a few years later, sadly). We didn’t even have phones in our dorms: you had to communicate with fellow students either in person, or via an internal ‘pigeon post’ system.
My typical week then was probably similar to a week of a History student in 1964: tutorials were generally on Friday, where I would be given the essay question and reading list for the following week:
A typical History essay reading list from 1996. My annotations at the top related to the library reference details, and the notes at the bottom (barely legible even to me now) I suspect were the notes I made during the hour as my tutor was telling me what I should have written about.
That afternoon I would go to several libraries, hoping that other students studying the same module hadn’t borrowed the books on the list. Over the following week I would try to read and make notes on the material as much as possible given my busy schedule of drinking incredibly cheap and nasty red wine, feeling depressed, and suffering from the endless colds living in Oxford (which is built on a bog) gifts you free of charge.
The actual writing of the essay would take place on the day before delivery, and involve a lot of procrastination (solitaire and minesweeper on my Compaq laptop, as I recall) while the painful process of squeezing the juices from my notes took place.
I didn’t go to lectures. They were way too early in the afternoon for me to get out of bed for.
Writing A 1996 Essay Again in 2023, This Time With Lots More Transistors
The essay question I chose to answer was from a General British History (19th-20th century) course:
How far was the New Liberalism a Departure from the Old?
ChatGPT 3
I first went to ChatGPT (3, the free version – I’m a poor student after all) to try and familiarise myself with the topic:
Already I was way ahead of my younger self. I had an idea of what the broad strokes I needed to hit were to get a passable essay together. ChatGPT is great for this kind of surveying or summarizing stuff, but is pretty bad at writing an undergraduate-level essay for you. It can write a passable essay (literally), but it will be very obvious it was written by an LLM, and probably at best get you a 2:2 or a third, due to the broad-brush tone and supine mealy-mouthed ‘judgements’ which contain very little content.
After a few more interactions with ChatGPT over the next few minutes:
I was now ready to gather the information I needed to write the essay.
Gathering the Sources
The first step was to get the materials. It took me a couple of hours to find all the texts (through various, uh, avenues) and process them into plain text. I couldn’t get the journal articles (they were academically paywalled), which was a shame, as they tended to be the best source of figuring out what academic debate was behind the question.
Those couple of hours would have been reduced significantly for any future essay I might choose to write, as I spent some time figuring out exactly where to get them and installing software to extract the plain text. So I would now allocate an hour to getting the vast majority of texts, as opposed to the small subset of texts I would hunt down over a significantly longer time in the physical libraries 30 years ago.
PrivateGPT
One of the well-known limitations with ChatGPT is that it doesn’t tell you what the relevant sources are that it looked at to generate the text it gives you.
I found out that this repo gives us a quick way to augment an LLM with a set of documents, so I installed it on an old Dell Xeon desktop with 24 cores and 128G of RAM. This machine cost me about £600 a couple of years ago, and still works a treat (I bought it to run Kubernetes clusters). It doesn’t have a graphics card in it, so I thought it would be usable for this kind of work, but it’s just fine with ‘normal’ CPUs (as long as you don’t demand instant responses).
$ python3 ingest.py Appending to existing vectorstore at db Using embedded DuckDB with persistence: data will be stored in: db Loading documents from source_documents Loading new documents: 100%|███████████████████| 12/12 [00:00<00:00, 274.84it/s] Loaded 12 new documents from source_documents Split into 38046 chunks of text (max. 350 tokens each) Creating embeddings. May take some minutes… Ingestion complete! You can now run privateGPT.py to query your documents
Ingestion of the 10M sources took about five minutes. I asked it a few questions, and this was a typical output:
> Question: what was the changing role of the state in new versus old liberalism
> Answer (took 169.94 s.): The main difference between "new" and "old" liberalism lies in their approach towards the role of the government within society. While traditional or 'Old' liberals believed that minimizing intervention by the State would lead to a more prosperous economy, ‘New’ Liberals acknowledged the need for state action and social reform as necessary measures against industrial poverty. In essence, new liberalism marked an evolution of thought among liberals towards greater acceptance of government responsibility in promoting general welfare through interventionist policies.
> source_documents/the_age_of_lloyd_george_kenneth_o_morgan.txt: It was, therefore, the Old Liberalism which helped to repair the fortunes of the party—nonconformity and industrial free trade, the very essence of the Liberal faith for half a century. Where, however, was the ‘New Liberalism’ which could embrace a more positive attitude towards the central government, for which many
> source_documents/the_new_liberalism_weiler.txt: In the broadest sense, the term new Liberalism refers both to the welfare legislation of the 1906-14 administration and to the changes in Liberal social and political theory. At the time, however, the term first referred to the theoretical changes which preceded the legislative achievements of the Liberal government.
> source_documents/the_new_liberalism_weiler.txt: What then was the new Liberalism? As defined here, it was the modification of Liberal ideology from a vehicle for a mid-Victorian ideal of laissez-faire to a philosophy based on state action and social reform to cope with industrial poverty. This modification was achieved by Liberal journalists and theorists seeking to
> source_documents/the_new_liberalism_weiler.txt: state, a development that represents the fruition of the new Liberalism.
New Liberalism The Political Economy of J.A. Hobson
UNIVERSITY OF TORONTO PRESS Toronto Buffalo London
Toronto Buffalo London Printed in Canada ISBN 0-8020-5558-3
> source_documents/the_new_liberalism_weiler.txt: understanding of the phenomenon of the new Liberalism itself.
> source_documents/the_new_liberalism_freeden.txt: THE SOCIAL POLICY OL THE NEW LIBERALISM
You can see that the output is mixed in quality (the model I used was relatively small, see Ollama section below), but it did point me at potentially relevant sources for quotes and bits to read. I spent about another hour rooting around these texts and gathering bits of text that I might use later. Note the response time, however (about three minutes is typical). I got a bit frustrated with the response time vs quality and went looking for other tech to use.
Ollama (and Llama2:70b)
This was the combination of technology that blew my mind. It absolutely changes the game for this kind of work. Ollama is a project designed to ‘get up and running with large language models locally’ which borrows concepts from Docker (like the Modelfile (similar to the Dockerfile, and ollama get/pull/list)
Compared to the ChatGPT output, Llama2:70b (with a Llama2 model with 70b parameters) allowed me to automate a significant chunk of the work needed to write the essay, reducing the time needed to write the essay from days to less than a day.
I downloaded the biggest model I could find (Llama2:70b) and asked it the question in a very unspecific way. It returned a much better essay plan than ChatGPT did:
At this point I started writing the body of the essay (introductions and conclusions are best left to the end), basing each paragraph on the numbered points above. As I wrote, looked at specific texts to find references and asked LLMs questions, something unexpected happened: I was learning about the nuances of the debate (mainly through looking the Collini book I hadn’t been able to find in the 90s).
I was getting sucked into a temptation to actually read the texts, and had to remind myself that I wasn’t allowed to do that, according to the rules of the ‘game’ I was playing here.
This appreciation of which texts were important, and what I was being invited to write about happened much faster than would have been the case pre-Internet. I really do wonder how students do this today (get in touch if you can tell me!).
Hallucinations
A well-known limitation of LLMs is their tendency to ‘hallucinate’, or ‘bullshit’ a plausible-seeming answer. I definitely saw this, and it wasted some time for me. Llama2 gave me this nice quote, which raised suspicions with me as it seemed too good to be true:
Evidence: In his book "Industrial Democracy" (1902), Webb argues that collective ownership and control of the means of production is necessary to address the systemic issues that perpetuate poverty and inequality. He writes, "The only way to secure industrial democracy is through the collective ownership and control of the means of production."
But I couldn’t find that quote anywhere in the book. Llama2 just made it up.
What I Learned
After writing my AI-assisted essay, I took a look at my original essay from 1996. To my surprise, the older one was far longer than I remembered essays being (about 2500 words to my AI-assisted one’s 1300), and was, it seemed to me, much higher quality than my AI-assisted one. It may have been longer because it was written in my final year, where I was probably at the peak of my essay-writing skills.
It took me about six hours of work (spread over 4 days) to write the AI-assisted essay. The original essay would have taken me a whole week, but the actual number of hours spent actively researching and writing it would have been at minimum 20, and at most 30 hours.
I believe I could have easily doubled the length of my essay to match my old one without reducing the quality of its content. But without actually reading the texts I don’t think I would have improved from there.
I learned that:
There’s still no substitute for hard study. My self-imposed rule that I couldn’t read the texts limited the quality to a certain point that couldn’t be made up for with AI.
History students using AI should be much more productive than they were in my day! I wonder whether essays are far longer now than they used to be.
Private LLMs (I used llama2:70b) can be way more useful than ChatGPT3.5 for this type of work, not only in the quality of generated response, but also the capability to identify relevant passages of text.
I might have reduced the time significantly with some more work to combine the llama2:70b model with the reference-generating code I had. More research is needed here.
Hope you enjoyed my trip down memory lane. The world of humanities study is not one I’m in any more, but it must be being changed radically by LLMs, just as it must have been by the internet. If you’re in that world now, and want to update me on how you’re using AI, get in touch.
The git repo with essay and various interactions with LLMs is here.