Git can be utterly bewildering to someone who uses it casually, or is not interested in things like directed acyclic graphs.
For such users, the best thing you can do is buy my book (free sample available), which guides you through the usage of git in a practical way that embeds the concepts ready for daily use.
The second best thing you can do is read on. Here I briefly go through five things I wish someone had explained to me before I started using git.
1) The Four Stages
I came to git from CVS, an older Version Control System (VCS), and one of the most baffling things about git was its different approach to the state of content.
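CVS had two states of data:
- Local changes
- Committed changes
and this results in a simple workflow: you make changes locally, then commit them straight to the central server.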
Whereas git has four states:
- Local changes
- Staged/added changes
- Committed changes
- Pushed to remote
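Content moves through these stages in order: you edit files locally, git add stages the changes, git commit records them in your local repository, and git push sends them to a remote repository.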
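To watch one file pass through all four stages, here is a minimal sketch. It assumes git is on your PATH and works entirely in a throwaway temporary directory. The bare repository remote.git stands in for a shared server, and notes.txt, "Demo User" and demo@example.com are placeholder names. The branch is forced to master because newer git setups may default to main.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"      # hypothetical stand-in for a shared server
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master    # use "master" regardless of init.defaultBranch
git config user.name "Demo User"           # placeholder identity, local to this repo
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

# 1) Local change: the new file exists only in the working tree.
echo "hello" > notes.txt
git status --porcelain       # ?? notes.txt

# 2) Staged: git add records the change in the index.
git add notes.txt
git status --porcelain       # A  notes.txt

# 3) Committed: recorded in the local repository, nowhere else yet.
git commit -q -m "Add notes"
git status --porcelain       # (no output)

# 4) Pushed: the commit now also exists in the remote repository.
git push -q origin master
git --git-dir="$work/remote.git" log --oneline master
```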
If, like me, you use git commit -am "checkin message" to commit your work, then the second ‘adding/staging’ state is more or less invisible to you, since the -a flag does it for you. It’s for this reason that I encourage new users to drop the -a flag and run git add by hand, so that they understand these distinctions.
One subtlety is that the -a flag doesn’t add new files to the content tracked by git. It only stages changes to files that git is already tracking.
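Here is a small sketch of that subtlety. It assumes git is installed and runs in a throwaway repository. The file names tracked.txt and untracked.txt, and the identity values, are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Track one file"

echo "v2" > tracked.txt       # modify a file git already tracks
echo "new" > untracked.txt    # create a brand new file
git commit -q -am "Commit with -a"

git status --porcelain                            # ?? untracked.txt  (-a left it alone)
git diff-tree --no-commit-id --name-only -r HEAD  # tracked.txt
```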
These states exist so that people can work independently and offline, syncing later. This was the driving force behind the development of git.
From this comes another key point: all git repositories are created equal. My clone of your repository is not dependent on yours for its existence. Each repository stands on its own, and is only related to others if you configure it so. This is another key difference between git and more traditional (nay, obsolete) client/server models of content history management.
This results in a far more flexible (and potentially more complicated) workflow, in which any repository can pull changes from, or push changes to, any other repository it has access to.
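To see that a clone stands on its own, you can clone a repository, delete the original and keep working. The sketch below does this under a few assumptions: git is installed, everything happens in a temporary directory, and the directory names yours and mine are purely illustrative.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/yours"
cd "$work/yours"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"
echo "first" > file.txt
git add file.txt
git commit -q -m "First commit"

# Clone the repository, then delete the original entirely.
git clone -q "$work/yours" "$work/mine"
cd "$work"
rm -rf "$work/yours"

# The clone still has the full history and can carry on by itself.
cd "$work/mine"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "second" >> file.txt
git commit -q -am "Second commit, made after the original was deleted"
git log --oneline
```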
2) What is a Reference?
Git docs and blogs keep talking about references, but what is a reference?
A reference is just this: a pointer to a commit. And a commit is a uniquely identified state of the content.
Once this is understood, a few other concepts make more sense.
HEAD is a reference to ‘where you are’ in the content history. It’s the content you’re currently looking at in your git repo.
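You can inspect these pointers directly. The sketch below assumes git is installed and uses a throwaway repository with a placeholder file and identity. It shows that master is just a name for a commit hash, and that HEAD normally points at a branch rather than at a commit.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

git rev-parse master     # the hash of the commit that master points at
git symbolic-ref HEAD    # refs/heads/master: HEAD points at the branch
git show-ref             # every reference, with the commit hash it points at
```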
When you run git commit, HEAD moves to the new commit.
A git tag is a reference that can have an arbitrary name, and it does not move when new commits are made.
A git branch is also a reference, but it moves along with HEAD whenever you commit a new change while that branch is checked out.
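The difference in how these references move can be checked in a throwaway repository. This is a minimal sketch assuming git is installed. The tag name v1.0 and the branch name experimental are illustrative, and empty commits keep the example short.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

git commit -q --allow-empty -m "First commit"
git tag v1.0                 # tag the current commit
git branch experimental      # a second branch, not checked out
git commit -q --allow-empty -m "Second commit"

# master is checked out, so it moved with HEAD.
# The tag and the experimental branch stayed on the first commit.
git log --oneline --decorate --all
```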
A couple of other confusing things then become clearer. For example, a detached HEAD is nothing to panic about despite its scary name. It just means that your HEAD is pointing at a commit rather than at a branch.
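The sketch below detaches HEAD and then reattaches it. It assumes git is installed and runs in a throwaway repository. It relies on git symbolic-ref -q, which fails quietly when HEAD is not pointing at a branch. Nothing is lost in the process.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"
git commit -q --allow-empty -m "First commit"
git commit -q --allow-empty -m "Second commit"

git checkout -q HEAD~1       # check out a commit rather than a branch
state=$(git symbolic-ref -q HEAD || echo "detached")
echo "$state"                # detached

git checkout -q master       # reattach: HEAD points at the branch again
git symbolic-ref HEAD        # refs/heads/master
```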
To help cement the above, picture a series of commits. A is the first commit, followed by B, and so on up to the latest commit, H. Confusingly, in git diagrams the arrows go backwards in time, because each commit points back at its parent.
There are three references in this picture: master, which points at an earlier commit in the line; experimental, which points at H; and HEAD, which also points at H.
HEAD, remember, is ‘where we are’.
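You can build a repository shaped like this picture and look at it with git log. This is a sketch with several assumptions. Git must be installed, the commits are empty and simply named A to H, and master is left at D purely for illustration, since the exact commit master points at is not essential here.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

# A to D on master (leaving master at D is an illustrative assumption).
for c in A B C D; do
  git commit -q --allow-empty -m "$c"
done
git checkout -q -b experimental    # new branch; HEAD now points at experimental
for c in E F G H; do
  git commit -q --allow-empty -m "$c"
done
git log --oneline --graph --decorate --all
```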
3) What’s a Fast-Forward?
Now that you understand what a reference is, understanding a fast-forward is pretty simple.
Usually, when you merge two branches together, you get a new commit. Picture two branches that have both moved on from a common ancestor, D: one ends at commit G and the other at commit H.
Merging them creates I, a commit that represents the merging of G and H from their common ancestor (D). The changes made on both branches are applied together from D, and the resulting state of the content is stored in a new commit (I).
But consider the earlier picture of a single line of commits, where master points at an earlier commit and experimental points at the latest commit, H. There we have two branches, but no changes were made on one of them. Let’s say we want to merge the changes on experimental into master: we’ve experimented, and the experiment was successful.
In this case, merging experimental into master requires no changes to be combined, since master has no commits of its own (like G in the merge above) that need to be merged together with H. They are all in one line of changes.
Such a merge only requires that the master reference is picked up and moved along to H. This is a ‘fast-forward’: the reference just needed moving along, and no content needed to be reconciled.
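A fast-forward can be requested explicitly with git merge --ff-only, which refuses to merge unless the branch can simply be moved along. The sketch below rebuilds a line of empty commits A to H like the one described in section 2. It assumes git is installed, and leaving master at D is again only an illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"
for c in A B C D; do git commit -q --allow-empty -m "$c"; done
git checkout -q -b experimental
for c in E F G H; do git commit -q --allow-empty -m "$c"; done

git checkout -q master
before=$(git rev-parse master)             # master is at D
git merge -q --ff-only experimental        # only succeeds if a fast-forward is possible
# No new commit: master's pointer has just moved from D to H.
git log --oneline --graph --decorate --all
```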
4) What’s a Rebase?
My manual page for git rebase says: ‘Reapply commits on top of another base tip’. This is much more comprehensible than previous versions of this man page, but will still confuse many people.
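A visual example makes it much clearer.
Consider a master branch and a feature1 branch that was created from master at commit C. Since then, feature1 has gained a commit of its own (D), and master has moved on with new commits too.
You could merge feature1 into the master branch, and you’d end up with a new merge commit (G).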
This retains the chronology: both branches keep their history and order of commits.
git rebase takes a different approach. It ‘picks up’ the changes on our branch (commit D on feature1 in this case) and reapplies them on top of the latest commit on master.
It’s as though we had just checked out master and then made a change (D) on a new branch (feature1), rather than having branched off from master some time ago at C and done our feature1 work there.
The history is now a single straight line, which looks a lot neater, doesn’t it?
master can now be ‘fast-forwarded’ to where feature1 is, simply by moving master’s pointer along to the rebased copy of D, which is a new commit with its own ID.
The downside is that we’ve lost something from the history by doing this: it no longer reflects the chronological order in which things happened. Do you care about this?
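The rebase described above can be reproduced in a throwaway repository. This sketch assumes git is installed. The helper function commit_file is hypothetical and just commits a file named after its commit message, so that no commits are empty. The output shows that D gets a new ID when it is replayed, and that master can then fast-forward.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

commit_file() {                  # hypothetical helper: commit a file named after $1
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A; commit_file B; commit_file C
git checkout -q -b feature1      # feature1 branches off at C
commit_file D
old_d=$(git rev-parse HEAD)
git checkout -q master           # meanwhile, master moves on
commit_file E; commit_file F

git checkout -q feature1
git rebase -q master             # replay D on top of F
new_d=$(git rev-parse HEAD)
echo "$old_d -> $new_d"          # different hashes: D was rewritten

git checkout -q master
git merge -q --ff-only feature1  # master fast-forwards to the rebased D
git log --oneline --graph
```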
5) The power of git log
The above concepts are all very well, but how do you grasp these in the course of your day-to-day work?
For this I highly recommend getting to grips with git’s native log command. While there are many GUIs that can display history, they all have their own opinions on how things should be displayed, and moreover are not available everywhere. As a source of truth, git log is unimpeachable and transparent.
I wrote about this in more depth here, but to give yourself a flavour, try these two commands on a repo of your choice. They cover 90% of my day-to-day git log usage:
$ git log --oneline --graph
$ git log --oneline --graph --simplify-by-decoration --all
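If you don’t have a suitable repo to hand, this sketch builds a small one to try the two commands on. It assumes git is installed. The tag v0.1 and the branch experimental are illustrative, and the commits are empty. The second command keeps only commits that a branch or tag points at, so it lists fewer commits than the full history.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

for c in A B C D; do git commit -q --allow-empty -m "$c"; done
git tag v0.1 HEAD~2                        # tag commit B (illustrative)
git checkout -q -b experimental
for c in E F; do git commit -q --allow-empty -m "$c"; done
git checkout -q master

git log --oneline --graph
git log --oneline --graph --simplify-by-decoration --all
```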
Concepts explained here are taught in my book Learn Git the Hard Way.
6 thoughts on “Five Things I Wish I’d Known About Git”
“The changes made on both branches are applied together from D and the resulting state of the”
….the ….. the …. ? I’m on the edge of my seat!
What you are actually rewriting when rebasing is not the order of commits, but the commits/states themselves.
This causes quite a few potential problems. First and foremost you lose any guarantee that the changes induce the same effect as originally observed and intended. Earlier commits could potentially influence the effects of the changes introduced by the commit. In a merge, the state is conserved, so this is not an issue.
Merge works a lot better with things like submodules etc. as commits and commit IDs stay the same.
As a matter of personal preference, merging also structures commits into topics (topic branches).