Five Things I Wish I’d Known About Git

Git can be utterly bewildering to someone who uses it casually, or is not interested in things like directed acyclic graphs.

For such users, the best thing you can do is buy my book (free sample available), which guides you through using git in a practical way, embedding the concepts so they’re ready for daily use.

The second best thing you can do is read on. Here I briefly go through five things I wish someone had explained to me before I started using git.

1) The Four Stages

Having come from CVS (an older Version Control System, or VCS), I found git’s different approach to the state of content one of the most baffling things about it.

CVS had two states of data:

  • uncommitted
  • committed

and this results in these kinds of workflows:

[Diagram: a traditional (client/server) VCS workflow]

Whereas git has four states:

  • Local changes
  • Staged/added changes
  • Committed
  • Pushed to remote

Here’s a diagram that illustrates the four stages:

[Diagram: the four stages of content in git]
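
To watch a single file pass through these four states, here is a minimal shell sketch. It assumes git is installed. It works in a throwaway temporary directory and uses a local bare repository (remote.git) as a stand-in for a real remote. The file name and the ‘Demo’ identity are hypothetical. git status --porcelain reports each local state in a compact form.

```shell
#!/bin/sh
# Sketch: walk one file through git's four states in a throwaway repository.
# demo.txt, remote.git and the Demo identity are hypothetical.
set -eu
tmp=$(mktemp -d)
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1  # ignore personal config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$tmp/remote.git"        # stands in for the shared remote
git init -q "$tmp/work" && cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master     # call the first branch master, as in this post
git remote add origin "$tmp/remote.git"

echo hello > demo.txt
local_state=$(git status --porcelain)       # "?? demo.txt": a local change
git add demo.txt
staged_state=$(git status --porcelain)      # "A  demo.txt": staged/added
git commit -q -m "Add demo.txt"
committed_state=$(git status --porcelain)   # empty: committed, but only in this repository
git push -q origin master                   # pushed: the remote now has the commit too
remote_tip=$(git --git-dir="$tmp/remote.git" rev-parse master)

echo "local:     $local_state"
echo "staged:    $staged_state"
echo "committed: ${committed_state:-clean}"
echo "pushed:    $remote_tip"
```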

If, like me, you use git commit -am "checkin message" to commit your work, then the second ‘adding/staging’ state is more or less invisible to you, since the -a flag does the staging for you. It’s for this reason that I encourage new users to drop the -a flag and run git add by hand, so that they understand these distinctions.

One subtlety is that the -a flag doesn’t add new files to the content tracked by git: it only stages changes to files that git already tracks.
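
Here is a minimal sketch of that subtlety in a throwaway repository. The file names and identity are hypothetical. An edit to a tracked file is picked up by -a, but a brand-new file stays untracked until you git add it.

```shell
#!/bin/sh
# Sketch: `git commit -a` stages edits to tracked files but skips new files.
# tracked.txt, brand_new.txt and the Demo identity are hypothetical.
set -eu
tmp=$(mktemp -d)
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1  # ignore personal config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"

echo v1 > tracked.txt
git add tracked.txt
git commit -q -m "Track one file"

echo v2 > tracked.txt                 # edit a file git already tracks
echo new > brand_new.txt              # create a file git has never seen
git commit -q -am "checkin message"

in_commit=$(git diff-tree --no-commit-id --name-only -r HEAD)  # files changed by that commit
left_behind=$(git status --porcelain)                          # what -a did not pick up
echo "committed:   $in_commit"
echo "left behind: $left_behind"
```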

These states exist so that people can work independently and offline, syncing later. This was the driving force behind the development of git.

From this comes another key point: all git repositories are created equal. My clone of your repository is not dependent on yours for its existence. Each repository stands on its own, and is only related to others if you configure it so. This is another key difference between git and more traditional (nay, obsolete) client/server models of content history management.

This results in a workflow that looks more like this:

[Diagram: a distributed VCS workflow]

which is a far more flexible (and potentially more complicated) workflow.
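
You can check directly that a clone stands on its own. This sketch clones a repository, deletes the original, and then counts the commits still present in the clone. The directory names and identity are hypothetical.

```shell
#!/bin/sh
# Sketch: a clone carries the full history, so it survives the original being deleted.
# The directories yours/ and mine/ and the Demo identity are hypothetical.
set -eu
tmp=$(mktemp -d)
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1  # ignore personal config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$tmp/yours" && cd "$tmp/yours"
echo one > f.txt && git add f.txt && git commit -q -m "first"
echo two > f.txt && git commit -q -am "second"

git clone -q "$tmp/yours" "$tmp/mine"
cd "$tmp" && rm -rf "$tmp/yours"            # the original repository is gone...
cd "$tmp/mine"
commit_count=$(git rev-list --count HEAD)   # ...but the clone still has every commit
git log --oneline
```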

2) What is a Reference?

Git docs and blogs keep talking about references, but what is a reference?

A reference is just this: a pointer to a commit. And a commit, identified by its unique ID, records a new state of the content.

Once this is understood, a few other concepts make more sense.

HEAD is a reference to ‘where you are’ in the content history. It’s the content you’re currently looking at in your git repo.

When you git commit, HEAD moves to the new commit.

A git tag is a reference that can carry arbitrary text, and it does not move when new commits are made.

A git branch is a reference that moves along with HEAD whenever you commit a new change while that branch is checked out.

A couple of other confusing things then become clearer. For example, a detached HEAD is nothing to panic about despite its scary name: it just means that HEAD points directly at a commit rather than at a branch.
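
You can see this behaviour of HEAD, branches, tags and a detached HEAD in a few commands. In this minimal sketch, the tag name v1, the file name and the identity are hypothetical. git symbolic-ref shows which branch HEAD names, and git rev-parse shows which commit a reference points at.

```shell
#!/bin/sh
# Sketch: branches move, tags don't, and a detached HEAD points straight at a commit.
# The tag v1, demo.txt and the Demo identity are hypothetical.
set -eu
tmp=$(mktemp -d)
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1  # ignore personal config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master      # call the first branch master, as in this post

echo one > demo.txt && git add demo.txt && git commit -q -m "first"
first=$(git rev-parse HEAD)
git tag -a v1 -m "Any text you like"         # annotated tag: a fixed pointer plus a message

echo two > demo.txt && git commit -q -am "second"
second=$(git rev-parse HEAD)

head_ref=$(git symbolic-ref HEAD)            # HEAD names a branch: refs/heads/master
master_at=$(git rev-parse master)            # the branch moved to the new commit...
tag_at=$(git rev-parse 'v1^{commit}')        # ...but the tag still points at the first one

git checkout -q --detach "$first"            # detached HEAD: HEAD now holds a commit ID directly
detached_at=$(git rev-parse HEAD)
git symbolic-ref -q HEAD || echo "HEAD is detached at $detached_at"
```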

To help cement the above, look at this diagram:

[Diagram: a series of commits, A to H, with the references master, experimental and HEAD]

It represents a series of commits.

Confusingly, in git diagrams the arrows go backwards in time. A is the first commit, then B, and so on to the latest commit (H).

There are three references: master, which points at C; experimental, which points at H; and HEAD, which also points at H. HEAD, remember, is ‘where we are’.

3) What’s a Fast-Forward?

Now that you understand what a HEAD reference is, a fast-forward is pretty simple to understand.

Usually, when you merge two branches together, you get a new commit:

[Diagram: two branches, ending at H and G, merged into a new commit I]

In the above diagram, I is a commit that represents the merging of H and G from their common ancestor (D). The changes made on both branches are applied together from D, and the resulting state of the content is stored in a new commit (I).
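
Here is a minimal sketch that produces this kind of merge. It assumes two hypothetical files are changed independently on master and experimental. git merge-base finds the common ancestor, and the resulting merge commit records two parents.

```shell
#!/bin/sh
# Sketch: when both branches have new commits, merging creates a commit with two parents.
# Branch names follow this post; file names and the Demo identity are hypothetical.
set -eu
tmp=$(mktemp -d)
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1  # ignore personal config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master

echo base > base.txt && git add base.txt && git commit -q -m "common ancestor"
ancestor=$(git rev-parse HEAD)

git checkout -q -b experimental              # one line of work...
echo exp > exp.txt && git add exp.txt && git commit -q -m "experimental work"

git checkout -q master                       # ...and another, made independently
echo more > master.txt && git add master.txt && git commit -q -m "master work"

merge_base=$(git merge-base master experimental)        # where the two lines diverged
git merge -q --no-edit experimental                     # combine both sets of changes
parents=$(git cat-file -p HEAD | grep -c '^parent ')    # a merge commit records two parents
git log --oneline --graph --all
```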

But consider the diagram we saw above:

[Diagram: a series of commits, A to H, with the references master, experimental and HEAD]

There we have two branches, but no changes were made on one of them. Let’s say we want to merge the changes on experimental (E and H) into master – we’ve experimented, and the experiment was successful.

In this case, merging E and H into master requires no new content beyond what is already in H, since there are no other changes on master (like F and G in the previous diagram) to combine with E and H. They are all in one line of changes.

Such a merge only requires that the master reference is picked up and moved from C to H. This is a ‘fast-forward’ – the reference just needed moving along, and no content needed to be reconciled.
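
The sketch below rebuilds that situation, with commit messages A, B, C, E and H standing in for the diagram’s letters. The files and identity are hypothetical. It uses git merge --ff-only because that option refuses anything other than a fast-forward, which makes the behaviour explicit.

```shell
#!/bin/sh
# Sketch: master hasn't moved since experimental branched off, so merging
# experimental only slides master's pointer along. history.txt and the Demo
# identity are hypothetical; the commit messages mirror the diagram's letters.
set -eu
tmp=$(mktemp -d)
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1  # ignore personal config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master

for c in A B C; do
  echo "$c" > history.txt && git add history.txt && git commit -q -m "$c"
done
git checkout -q -b experimental
for c in E H; do
  echo "$c" > history.txt && git commit -q -am "$c"
done

git checkout -q master
before=$(git rev-parse master)               # master is at C
git merge -q --ff-only experimental          # refuses anything that isn't a fast-forward
after=$(git rev-parse master)                # master has moved along to H
total=$(git rev-list --count master)         # still 5 commits: no merge commit was created
echo "master moved from $before to $after; $total commits"
```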

4) What’s a Rebase?

My manual page for git rebase says:

Reapply commits on top of another base tip

This is much more comprehensible than previous versions of this man page, but will still confuse many people.

A visual example makes it much clearer.

Consider this example:

[Diagram: feature1 branched off master at C, with commit D on feature1]

You could merge feature1 into the master branch, and you’d end up with a new commit (G), which makes the tree look like this:

[Diagram: master and feature1 joined by a merge commit G]

You can see that you’ve retained the chronology, as both branches keep their history and order of commits.

A git rebase takes a different approach. It ‘picks up’ the changes on our branch (commit D on feature1 in this case) and reapplies them on top of the tip of master.

[Diagram: commit D reapplied on top of master]

It’s as though we had just checked out master and then made a change (D) on a new branch (feature1), rather than having branched off from master some time ago at C and done our feature1 work there.

This looks a lot neater, doesn’t it? master can now be ‘fast-forwarded’ to where feature1 is by moving master‘s pointer along to D.
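
Here is a minimal sketch of the whole sequence. The layout is an assumption based on the text: A, B and C are shared, D is on feature1, and E and F are added to master afterwards. The file names are hypothetical. After the rebase, D has a new commit ID, because a commit’s ID covers its parent.

```shell
#!/bin/sh
# Sketch: rebase feature1 onto master, then fast-forward master.
# Assumed layout: A, B, C shared; D on feature1; E, F added to master later.
# shared.txt, feature.txt and the Demo identity are hypothetical.
set -eu
tmp=$(mktemp -d)
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1  # ignore personal config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master

for c in A B C; do
  echo "$c" > shared.txt && git add shared.txt && git commit -q -m "$c"
done
git checkout -q -b feature1                  # branch off at C
echo D > feature.txt && git add feature.txt && git commit -q -m "D"
old_d=$(git rev-parse HEAD)

git checkout -q master                       # meanwhile, master moves on
for c in E F; do
  echo "$c" > shared.txt && git commit -q -am "$c"
done
master_tip=$(git rev-parse master)

git checkout -q feature1
git rebase -q master                         # replay D on top of master's tip (F)
new_d=$(git rev-parse HEAD)                  # same change, new parent, so a new ID

git checkout -q master
git merge -q --ff-only feature1              # master can now simply be fast-forwarded
merges=$(git rev-list --merges --count master)   # 0: the history is a straight line
git log --oneline --graph
```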

The downside is that we’ve lost something from the history by doing this: it no longer reflects the chronological order in which things happened. Do you care about this?

5) The power of git log

The above concepts are all very well, but how do you grasp these in the course of your day-to-day work?

For this I highly recommend getting to grips with git’s native log command. While there are many GUIs that can display history, they all have their own opinions on how things should be displayed, and moreover are not available everywhere. As a source of truth, git log is unimpeachable and transparent.

I wrote about this in more depth here, but to give yourself a flavour, try these two commands on a repo of your choice. They cover 90% of my git log usage day-to-day:

$ git log --oneline --graph

$ git log --oneline --graph --simplify-by-decoration --all
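
If you don’t have a suitable repo to hand, here is a minimal sketch that builds a throwaway one with two branches and a tag and then runs both commands. The branch, tag and file names and the identity are hypothetical. The last two lines count how many commits each view lists, so you can see how much the second command condenses.

```shell
#!/bin/sh
# Sketch: a small repository to try the two git log commands on.
# The branch experimental, tag v1, file.txt and the Demo identity are hypothetical.
set -eu
tmp=$(mktemp -d)
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1  # ignore personal config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master

for c in A B C; do
  echo "$c" > file.txt && git add file.txt && git commit -q -m "$c"
done
git tag v1 HEAD~1                            # tag an older commit (B)
git checkout -q -b experimental
for c in E H; do
  echo "$c" > file.txt && git commit -q -am "$c"
done
git checkout -q master

git log --oneline --graph                                 # history reachable from HEAD
git log --oneline --graph --simplify-by-decoration --all  # condensed view of all branches and tags
all_commits=$(git rev-list --count --all)
decorated=$(git log --oneline --simplify-by-decoration --all | wc -l | tr -d ' ')
```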

Concepts explained here are taught in my book Learn Git the Hard Way.


6 thoughts on “Five Things I Wish I’d Known About Git”

  1. “The changes made on both branches are applied together from D and the resulting state of the”

    ….the ….. the …. ? I’m on the edge of my seat!

  2. What you are actually rewriting when rebasing is not the order of commits, but the commits/states themselves.
    This causes quite a few potential problems. First and foremost you lose any guarantee that the changes induce the same effect as originally observed and intended. Earlier commits could potentially influence the effects of the changes introduced by the commit. In a merge, the state is conserved, so this is not an issue.
    Merge works a lot better with things like submodules etc. as commits and commit IDs stay the same.
    As a matter of personal preference: merge also structures commits into topic branches.
