Beyond ‘Punk Rock Git’ in Eleven Steps

Punk Rock Git?

I’ve spent the last couple of years teaching Git, mostly to users that I like to call ‘punk rock’ Git users.

They have three commands, and they can get by with them.

They’re not terribly interested in Merkle trees or SHA-1 hashes, but are interested in what a ‘detached HEAD’ is, or understanding what a rebase is so they don’t feel intimidated when someone drops it into a conversation.

In fact, a stated primary aim of the course I wrote and run is to get you to fully understand what a rebase is. At that point, you’re a punk rocker no longer.

These git students often say they are bewildered by where to start with expanding their git knowledge. So here, I briefly outline a set of Git concepts and commands, and – crucially – the order you should grasp them in.

I also noted in bold most of the ‘a-ha’ moments I’ve observed experienced but casual users getting when they follow the course.

My book ‘Learn Git the Hard Way’ (which is the basis of my course, has a similar structure, and covers more optional items) is available here.


 

1) Core Git Concepts

Here’s the baseline of what you want to know before you set off:

  • Understand that there are four distinct phases to git content
    • Local changes
    • Staging area
    • Committed
    • Pushed
  • Don’t memorise the above, just be aware of it
  • Branching is cheap compared to other source control tools
  • All repositories are equivalent in status – there’s no predefined client/server relationship
  • GitHub is treated as a server by convention by many users – but there’s nothing about Git that forces anything to be a ‘server’. GitHub could be used as a backup

 

2) Creating a Repository

It’s important to see what a git repository created from scratch looks like:

  • Creating a git repository is as simple as running git init in a folder
  • Use git add to add content
  • Use git status to see what’s going on
  • Use git commit to commit some content
  • Use git log to see what the history now says
  • Have a look at the .git folder and understand what the HEAD file is doing
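
A minimal sketch of the steps above, run in a throwaway directory. The repository name demo, the branch name main, the file name and the identity are placeholders chosen for illustration, not part of the course material.

```shell
set -eu
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main   # fix the branch name whatever the local default is
git config user.name "Demo User"        # placeholder identity, local to this repo
git config user.email "demo@example.com"

echo "hello" > file1
git status --short          # file1 is untracked
git add file1
git status --short          # file1 is now staged
git commit -q -m "First commit"
git log --oneline           # the history now holds one commit
cat .git/HEAD               # HEAD is a plain file naming the current branch
```

The last command shows that HEAD is nothing more mysterious than a one-line text file.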

 

3) Cloning Repositories

You’ve created a repository, now see what happens when you clone it:

  • Use git clone to clone the repository you created above
  • Look at the .git folder and figure out its relationship to the cloned repository
  • Delete stuff ‘accidentally’ from your clone and restore it using git reset --hard
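
A hedged sketch of the clone-and-restore exercise. The names original, cloned and file1 are made up for the example, and the sketch assumes the --hard form of git reset, which discards all uncommitted changes in the working tree, so only try it somewhere disposable.

```shell
set -eu
cd "$(mktemp -d)"

# Build a small repository to clone from.
git init -q original
(
  cd original
  git config user.name "Demo User"
  git config user.email "demo@example.com"
  echo "precious" > file1
  git add file1
  git commit -q -m "First commit"
)

git clone -q original cloned
cd cloned
git config user.name "Demo User"
git config user.email "demo@example.com"
ls -a .git                  # the clone carries its own full .git folder
rm file1                    # the 'accident'
git status --short          # file1 is reported as deleted
git reset -q --hard HEAD    # throw away uncommitted changes, restoring file1
cat file1
```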

 

4) Branching and Tagging

Now create a branch:

  • Create a branch in a repo and make sure you’ve moved to it
  • Understand that a branch is ‘just’ a pointer to a commit that moves with each commit
  • Understand what HEAD is doing
  • Create a tag
  • Understand that a tag is a pointer to a commit
  • Understand that HEAD, branches and tags are all references
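
The points above can be sketched as follows. The branch name feature and the tag name v0.1 are arbitrary choices for the example.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file1
git add file1
git commit -q -m "one"

git checkout -q -b feature     # create a branch and move to it
cat .git/HEAD                  # HEAD now names the feature branch
git tag v0.1                   # a lightweight tag on the current commit

echo two >> file1
git commit -q -am "two"        # the branch pointer moves; the tag stays put

git rev-parse HEAD feature     # the same commit id twice
git rev-parse main v0.1        # both still point at the first commit
git show-ref                   # branches and tags listed side by side as references
```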

 

5) Merging

You’ve got branches, so learn how to merge them:

  • Create two conflicting changes on two branches
  • Try to git merge one with another
  • Resolve the conflict and git commit
  • Understand the diff format
    • There’s a great exposition here
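
One way to set up the conflict described above, as a sketch. Branch and file names are placeholders, and the resolution shown (keeping both lines of text) is just one possible choice.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "original line" > file1
git add file1
git commit -q -m "base"

git checkout -q -b feature
echo "feature line" > file1
git commit -q -am "feature change"

git checkout -q main
echo "main line" > file1
git commit -q -am "main change"

# Both branches changed the same line, so the merge is expected to stop.
if git merge feature >/dev/null 2>&1; then
  echo "unexpected: merged without conflict"
else
  cat file1                  # shows the conflict markers git wrote into the file
fi

# Resolve by writing the content you actually want, then add and commit.
echo "main line and feature line" > file1
git add file1
git commit -q -m "Merge feature into main"
git log --oneline --graph
```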

 

6) Stashing

If you’re going to branch, you’ll probably want to stash:

  • Know why stashing is important/why it’s used
  • Use git stash and git stash pop
  • Understand that it’s in the git log --all output, and that it’s ‘just’ a reference (refs/stash) pointing at a commit, much like a branch, albeit a special one
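
A small sketch of stashing and popping, assuming a throwaway repository with placeholder names.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "committed" > file1
git add file1
git commit -q -m "First commit"

echo "half-finished work" >> file1
git stash -q                       # shelve the change; the working tree is clean again
git status --short                 # prints nothing
git log --oneline --all --graph    # the stash shows up as extra commits in the history
git stash pop -q                   # bring the change back
git status --short                 # file1 is modified again
```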

 

7) The Reflog

Now you know what references are, and have seen the stash, the reflog should make sense quickly:

  • Use the git reflog command
  • Understand that the reflog (reference log) logs each change to a reference (such as HEAD or a branch) in your local repository
  • Understand that this comes in handy when things go wrong in git, and references have been moved around
  • Use git reset to revert a change in the reflog using the reference id
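
A sketch of recovering from a mistake with the reflog. It assumes the 'mistake' is a hard reset that drops the latest commit; the names are placeholders.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file1
git add file1
git commit -q -m "one"
echo two >> file1
git commit -q -am "two"

git reset -q --hard HEAD~1       # 'lose' the second commit
git log --oneline                # only the first commit is listed
git reflog                       # but the reflog still records where HEAD has been
git reset -q --hard 'HEAD@{1}'   # go back to where HEAD pointed before the reset
git log --oneline                # both commits are back
```

Here HEAD@{1} is the reflog's name for the previous position of HEAD; a commit id copied from the git reflog output works just as well.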

8) Cherry Picking

Cherry picking is a nice precursor to rebasing, as they’re (in principle) similar ideas:

  • Make a change on one of the branches you’ve created
  • git checkout a change in another branch on the same repository
  • Use git log to get the commit ID of the change (which looks similar to 05bef161ec563f5a0b3886f2f35a6cab37b06389)

  • Use git cherry-pick and the ID you just found to port the same change to the other branch
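
A sketch of the cherry-pick exercise. Branch names, file names and commit messages are invented for the example, and main is given an extra commit of its own so that the two histories have clearly diverged.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "A useful fix"
commit_id=$(git log -1 --format=%H)    # the full commit ID of the change

git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "main moves on"

git cherry-pick "$commit_id" >/dev/null   # port the same change onto main
git log --oneline --all --graph
```

The change arrives on main as a new commit with its own ID, while the original stays untouched on feature.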

9) Rebasing

Now you’re ready for rebasing!

All through the below you might want to use git log --all --graph --oneline to see the state of the repository and what’s changing:

  • Create a new branch (B) from an existing branch (A)
  • Move to that new branch (B)
  • Create a series of changes on B
  • Move to the old branch (A)
  • Create a series of changes on A
  • Go back to branch B
  • Use git rebase to update your branch so that its changes now come from the updated end of the A branch

See how all the changes are in a line now?

  • Go back to the A branch and rebase it to B. What happened?
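
The whole exercise can be sketched like this, using the branch names A and B from the steps above. File names and commit messages are placeholders.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/A      # call the starting branch A
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b B
echo b1 > b1.txt; git add b1.txt; git commit -q -m "B1"
echo b2 > b2.txt; git add b2.txt; git commit -q -m "B2"

git checkout -q A
echo a1 > a1.txt; git add a1.txt; git commit -q -m "A1"

git checkout -q B
git log --oneline --all --graph   # the history forks after 'base'
git rebase -q A                   # replay B's commits on top of the end of A
git log --oneline --all --graph   # one straight line

git checkout -q A
git rebase -q B                   # then compare where A and B point
git log --oneline --all --graph
```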

10) Remotes

  • In your cloned repo, look at the output of git remote -v
  • Run git branch -a and see how the remote repository is actually stored in your current repository with a remote reference.
  • Look at the .git/config file and see how the cloned repository is referenced within this repository
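
A sketch for poking at remotes in a fresh clone. The repository names original and cloned are placeholders.

```shell
set -eu
cd "$(mktemp -d)"
git init -q original
(
  cd original
  git symbolic-ref HEAD refs/heads/main
  git config user.name "Demo User"
  git config user.email "demo@example.com"
  echo one > file1
  git add file1
  git commit -q -m "one"
)

git clone -q original cloned
cd cloned
git remote -v                        # 'origin' points back at the repo you cloned
git branch -a                        # remote branches are stored locally as remotes/origin/...
git config --get remote.origin.url   # the same URL that appears in .git/config
cat .git/config
```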

11) Pulling

Now you’ve understood remotes, you can deconstruct git pull and understand what it does:

  • Stop using git pull!
  • Read about, use and learn git fetch and git merge
  • Understand that when you fetch, you’re talking to a different repo, and then by merging it in you’ve done a ‘pull’
  • For bonus points, grok that fast-forwards are just the branch (and HEAD with it) moving its pointer along to the end of a series of changes without a merge commit
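
A sketch of doing a ‘pull’ by hand in two steps, assuming a local repository standing in for the remote. All names are placeholders.

```shell
set -eu
cd "$(mktemp -d)"
git init -q original
(
  cd original
  git symbolic-ref HEAD refs/heads/main
  git config user.name "Demo User"
  git config user.email "demo@example.com"
  echo one > file1
  git add file1
  git commit -q -m "one"
)
git clone -q original cloned

# A new commit appears in the other repository after the clone was taken.
(
  cd original
  echo two >> file1
  git commit -q -am "two"
)

cd cloned
git config user.name "Demo User"
git config user.email "demo@example.com"
git fetch -q origin               # talk to the other repo; only origin/main moves
git rev-parse main origin/main    # two different ids
git merge -q origin/main          # a fast-forward: main moves along to origin/main
git rev-parse main origin/main    # the same id twice
```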

 

‘You Missed X!’

There’s plenty more that others might consider essential, eg:

  • Submodules
  • Bare repos
  • Advanced git logging

and so on. But the above is what I’ve found is required to get beyond punk rock git.

 

You might also be interested in my book Learn Git the Hard Way, which goes into these concepts and others in a more guided way:

You might also be interested in Learn Bash the Hard Way or Docker in Practice

Get 39% off Docker in Practice with the code: 39miell2

 


Sandboxing Docker with Google’s gVisor

 


Someone pointed me at this press release from Google announcing a Docker / container sandbox for Linux.

[Figure: Layers]

I was intrigued enough to write a ‘quick look’ article on it here

What Does That Mean (tl;dr)?

It’s a way of achieving:

  • VM-like isolation while
  • using containers for app deployment and achieving
  • multi-tenancy, and
  • SELinux/AppArmor/Seccomp security control

What Does That Mean (Longer)?

There are quite a few ways to limit the access a container has to the OS API. They’re listed and discussed on the gVisor GitHub page.

It explains what it does to the container:

gVisor intercepts all system calls made by the application, and does the necessary work to service them.

At first I thought gVisor was ‘just’ a syscall intermediary that could filter Linux API calls, but the sandboxing goes further than that:

Importantly, gVisor does not simply redirect application system calls through to the host kernel. Instead, gVisor implements most kernel primitives (signals, file systems, futexes, pipes, mm, etc.) and has complete system call handlers built on top of these primitives.

From a security perspective, this is a key quote as well:

Since gVisor is itself a user-space application, it will make some host system calls to support its operation, but much like a VMM, it will not allow the application to directly control the system calls it makes.

What really made my jaw drop was this:

[gVisor’s] Sentry [process] implements its own network stack (also written in Go) called netstack.

So it even goes to the length of not touching the host’s network stack. This reduces the attack surface of a malign container significantly.

From reading this it seems to implement most of an OS in userspace, only going to the host OS when necessary and allowed.

This is in contrast to tools like SELinux and AppArmor, which rely on host kernel features and a bunch of root-defined constraining rules to ensure nothing bad happens.

[Figure: Rule-Based Execution]

SELinux is a fantastic technology that should be more used, but the reality is that it’s very hard for people to write and understand policies and feel comfortable with it since it’s so embedded in the kernel.

An alternative to achieve a similar thing might be to just use a VM:

[Figure: Machine Virtualization]

But as the number of boxes above indicates, that’s a relatively heavy overhead to achieve isolation.

gVisor gives you the lightweight benefits of containers and the control of VMM and host-based kernel filters.

[Figure: Layers]

A Closer Look

I wrote a ShutIt script to create a re-usable Ubuntu VM in which I could build gVisor and run up sandboxed containers.

The build part of the script is here and the instructions to reproduce are here. If you get stuck contact me.

Here’s a video of the whole thing setting up in a VM using the above script:

Architecture

gVisor is a Go binary that creates a runtime environment for the container instead of runc. It consists of two processes:

In order to provide defense-in-depth and limit the host system surface, the gVisor container runtime is normally split into two separate processes. First, the Sentry process includes the kernel and is responsible for executing user code and handling system calls. Second, file system operations that extend beyond the sandbox (not internal proc or tmp files, pipes, etc.) are sent to a proxy, called a Gofer, via a 9P connection.

I didn’t know what a 9P connection is. I assume it’s something to do with the Plan9 OS, but that’s just a guess.


If you set the Docker daemon up according to the docs, you get a set of debug files in /tmp/runsc:

-rw-r--r-- 1 root root 6435 May 5 07:53 runsc.log.20180505-075350.302600.create
-rw-r--r-- 1 root root 1862 May 5 07:53 runsc.log.20180505-075350.337120.state
-rw-r--r-- 1 root root 3180 May 5 07:53 runsc.log.20180505-075350.346384.start
-rw-r--r-- 1 root root 1862 May 5 07:53 runsc.log.20180505-075350.529798.state
-rw-r--r-- 1 root root 32705613 May 5 08:22 runsc.log.20180505-075350.312537.gofer
-rw-r--r-- 1 root root 226843210 May 5 08:22 runsc.log.20180505-075350.319600.boot
-rw-r--r-- 1 root root 1639 May 5 08:22 runsc.log.20180505-082250.158154.kill
-rw-r--r-- 1 root root 1858 May 5 08:22 runsc.log.20180505-082250.210046.state
-rw-r--r-- 1 root root 1639 May 5 08:22 runsc.log.20180505-082250.221802.kill
-rw-r--r-- 1 root root 1600 May 5 08:22 runsc.log.20180505-082250.233557.delete

The interesting ones appear to be the .gofer files (which record calls made to the OS). When I noodled around, these mostly appeared to be requests to write to the docker filesystem on the host (which needs to happen when you write in the container):

D0505 07:53:50.516882 10831 x:0] Open reusing control file, mode: ReadOnly, "/var/lib/docker/overlay2/a8eadcb9a8427fa170e485f72d5aee6ee85a9c7b9176a6f01a6965f2bcd7e219/merged/bin/bash"
D0505 07:53:50.516907 10831 x:0] send [FD 3] [Tag 000001] Rlopen{QID: QID{Type: 0, Version: 0, Path: 541783}, IoUnit: 0, File: &{{38}}}

or files

D0505 07:53:50.518729 10831 x:0] send [FD 3] [Tag 000001] Rreadlink{Target: /lib/x86_64-linux-gnu/ld-2.27.so}
D0505 07:53:50.518869 10831 x:0] recv [FD 3] [Tag 000001] Twalkgetattr{FID: 1, NewFID: 15, Names: [lib]}
D0505 07:53:50.518927 10831 x:0] send [FD 3] [Tag 000001] Rwalkgetattr{Valid: AttrMask{with: Mode NLink UID GID RDev ATime MTime CTime Size Blocks}, Attr: Attr{Mode: 0o40755, UID: 0, GID: 0, NLink: 8, RDev: 0, Size: 4096, BlockSize: 4096, Blocks: 8, ATime: {Sec: 1525506464, NanoSec: 357515221}, MTime: {Sec: 1524777373, NanoSec: 0}, CTime: {Sec: 1525506464, NanoSec: 345515221}, BTime: {Sec: 0, NanoSec: 0}, Gen: 0, DataVersion: 0}, QIDs: [QID{Type: 128, Version: 0, Path: 542038}]}

The .boot file is the strace log from the container, which combined with the .gofer log can tell you what’s going on in and out of the container’s userspace.

Matching up the time of the file open above, I see this in the .boot log:

D0505 07:53:50.518797 10835 x:0] recv [FD 4] [Tag 000001] Rreadlink{Target: /lib/x86_64-linux-gnu/ld-2.27.so}
D0505 07:53:50.518824 10835 x:0] send [FD 4] [Tag 000001] Twalkgetattr{FID: 1, NewFID: 15, Names: [lib]}
D0505 07:53:50.519041 10835 x:0] recv [FD 4] [Tag 000001] Rwalkgetattr{Valid: AttrMask{with: Mode NLink UID GID RDev ATime MTime CTime Size Blocks}, Attr: Attr{Mode: 0o40755, UID: 0, GID: 0, NLink: 8, RDev: 0, Size: 4096, BlockSize: 4096, Blocks: 8, ATime: {Sec: 1525506464, NanoSec: 357515221}, MTime: {Sec: 1524777373, NanoSec: 0}, CTime: {Sec: 1525506464, NanoSec: 345515221}, BTime: {Sec: 0, NanoSec: 0}, Gen: 0, DataVersion: 0}, QIDs: [QID{Type: 128, Version: 0, Path: 542038}]}

Boot also has intriguing stuff like this in it:

D0505 07:53:50.514800 10835 x:0] urpc: unmarshal success.
W0505 07:53:50.514848 10835 x:0] *** SECCOMP WARNING: console is enabled: syscall filters less restrictive!
I0505 07:53:50.514881 10835 x:0] Installing seccomp filters for 63 syscalls (kill=false)
I0505 07:53:50.514890 10835 x:0] syscall filter: 0
I0505 07:53:50.514901 10835 x:0] syscall filter: 1
I0505 07:53:50.514916 10835 x:0] syscall filter: 3
I0505 07:53:50.514928 10835 x:0] syscall filter: 5
I0505 07:53:50.514952 10835 x:0] syscall filter: 7

Still Very Beta

I had lots of problems doing basic things in this sandbox, so believe them when they say this is a work in progress.

For example, I ran an apt install and got this error:

E: Can not write log (Is /dev/pts mounted?) - posix_openpt (2: No such file or directory)

which I’d never seen before.

Also, when I pinged:

root@a2e899f2e8af:/# ping google.com
root@a2e899f2e8af:/# ping bbc.co.uk
root@a2e899f2e8af:/# 

It returned immediately but I got no output at all.

I also saw errors with simple commands when apt installing. Running these commands by hand, I got some kind of race condition that couldn’t be escaped:

root@59dc5700406d:/# /usr/sbin/groupadd -g 101 systemd-journal
groupadd: /etc/group.604: lock file already used
groupadd: cannot lock /etc/group; try again later.

 


 

 

Unprivileged Docker Builds – A Proof of Concept

I work at a very ‘locked-down’ enterprise, where direct access to Docker is effectively verboten.

This, fundamentally, is because access to Docker is effectively giving users root. From Docker’s own pages:

First of all, only trusted users should be allowed to control your Docker daemon.

Most home users get permissions in their account (at least in Linux) by adding themselves to the docker group, which may as well be root. On a Mac, installing Docker also gives you root-like power if you know what you’re doing.

Platform Proxies

Many Docker platforms (like OpenShift) work around this by putting an API between the user and the Docker socket.

However, for untrusted users this creates a potentially painful dev experience that contrasts badly with their experience at home:

  • Push change to git repo
  • Wait for OpenShift to detect the change
  • Wait for OpenShift to pull the repo
  • Wait for OpenShift to build the image
    • The last step can take a long time if the build is not cached – which can easily happen when you have lots of build nodes and you miss the cache

vs ‘at home’

  • Hit build on your local machine
  • See if the build works

What we really want is the capability to build images when you are not root, or privileged in any way.

Thinking about it, you don’t need privileges to create a Docker image. It’s just a bunch of files in a tar file conforming to a spec. But constructing one that conforms to spec is harder than simply building a tar, as you need root-like privileges to do most useful things, like installing rpms or apt packages.

Kaniko?

I was excited when I saw Google announced kaniko, which claimed:

…we’re excited to introduce kaniko, an open-source tool for building container images from a Dockerfile even without privileged root access.

and

Since it doesn’t require any special privileges or permissions, you can run kaniko in a standard Kubernetes cluster, Google Kubernetes Engine, or in any environment that can’t have access to privileges or a Docker daemon.

I took that to mean it could take a Dockerfile and produce a tar file with the image as a non-privileged user on a standard VM. Don’t you?

I also got lots of pings from people across my org asking when they could have it, which meant I had to take time out to look at it.

Kanikant

Pretty quickly I discovered that kaniko does nothing of the sort. You still need access to Docker to build it (which was a WTF moment). Even if you --force it not to (which you are told not to), you still can’t do anything useful without root.

I’m still not sure why Kaniko exists as a ‘new’ technology when OpenShift already allows users to build images in a controlled way.

Rootless Containers

After complaining about it on GitHub I got some good advice.

There’s a really useful page here that outlines the state of the art in rootless containers for building, shipping and running.

It’s not for the faint hearted as the range of the technology required is somewhat bewildering, but there’s an enthusiastic mini community of people all trying to make this happen.

A Proof of Concept Build

I managed to get a build of a simple yum install on a centos:7 base image built as a completely unprivileged user using Vagrant and ShutIt to automate the build.

It shows the build of this Dockerfile as an unprivileged user (person) who does not have access to the docker socket (Docker does not even need to be installed for the run – it’s only used to build the proot binary, which could be done elsewhere):

FROM centos:7
RUN yum install -y httpd
CMD echo Hello host && sleep infinity

A video of this is available here and the code for this reproducible build is here. The interesting stuff is here.

 

Technologies Involved

I can’t claim deep knowledge of the technologies here (so please correct me where I’m wrong/incomplete), but here’s a quick run-down of the technologies used to achieve a useful rootless build.

  • runC

This allows us to run containers that conform to the OCI spec. Docker (for example) is a superset of the OCI spec of containers.

Although we use ‘runrootless’ to run the containers, runc is still needed to back it (I think – at least I had to install it to get this to work).

  • skopeo

Skopeo gives us the ability to take an OCI image and turn it into a Docker one later. It’s also used by orca-build (below) to copy images around while it’s building from Dockerfiles.

  • umoci

umoci modifies container images. It’s also used by orca-build to unpack and repack the image created at each stage. orca-build will

  • orca-build

orca-build is a wrapper around runC, and has some support for rootless builds. It uses these technologies:

It also takes Dockerfiles as input (with a subset of Docker commands).

  • runrootless

runrootless allows us to run OCI containers as part of each stage of the build, and in turn uses proot to allow root-like commands.

  • proot

proot allows you to create a root filesystem (such as is in a Docker container) without having root privileges. I suspected that proot was root by the back door using a setuid flag, but it appears not to be the case (which is good news).

This is required to do standard build tasks like yum or apt commands.

  • User namespaces

These must be switched on in the kernel, so that the build can believe it’s root while actually being an unprivileged user from the point of view of the kernel.

This is currently switched off by default in CentOS, but is easily switched on.

Try it

A reproducible VM is available here.

The kaniko work is available here.

See also:

Jessie Frazelle’s post on building securely on K8s

Special thanks to @lordcyphar for his great work generally in this area, and specifically helping me get this working.



Learn Git Rebase Interactively

If you’ve ever wondered what a rebase is, then you could try and read up on it and understand it from the man pages. Many people struggle with this, and I was among them. The man pages have been improved lately, but are still daunting to the casual user.

A more effective way to learn is to be thrown into it at the deep end by being forced to do it at work. Unfortunately this can be a terrible way to learn as you are panicked about breaking something in some way you don’t understand.

So I came up with an interactive tutorial that you can run to learn what a rebase is.

Run The Tutorial

Running the tutorial has only two dependencies: pip and docker.

If you have those, then run:

pip install shutit # You might need sudo for this, depending on your pip setup
git clone https://github.com/ianmiell/git-rebase-tutorial
cd git-rebase-tutorial
./run.sh

Video

Here’s a demo of it in action:

Features

At any point, you can hit these button combinations (hit both at the same time, CTRL key first):


CTRL-]

This submits your current state for checking.


CTRL-h

Gives you a hint.


CTRL-g

Restores the state of the tutorial to a pristine state at this stage.


CTRL-s

Skips the current stage of the tutorial.


CTRL-q

Quits the ShutIt session.


Problems?

Raise an issue here.

If you can reproduce with ./run.sh -l debug and post the output, then that’s ideal.




 

Terminal Perf Graphs in one Command

Sysstat and Graphs

If you have the sysstat package set up on your server, then you likely already know you can get historical CPU performance information with sar like this:

$ sar | head
Linux 4.15.0-10-generic (cage) 27/03/18 _x86_64_ (4 CPU)
00:00:01 CPU %user %nice %system %iowait %steal %idle
00:01:01 all 1.79  0.00  21.70   0.03    0.00   76.47
00:02:01 all 1.28  0.01  10.09   0.01    0.00   88.62
00:03:01 all 1.39  0.00  6.09    0.01    0.00   92.51
00:04:01 all 1.20  0.00  5.23    0.02    0.00   93.55
00:05:01 all 1.26  0.00  5.74    0.01    0.00   92.98
00:06:01 all 1.30  0.00  20.46   0.83    0.00   77.40
00:07:01 all 0.82  0.01  9.28    0.02    0.00   89.87

and you also probably know you can get the history of various metrics, not just CPU, eg disk IO, run queue size, and so on.

This info is great, but sometimes you just want a quick view of what’s going on.

You can faff around with some platform to try and get the information graphed in a sophisticated way if you have the time, skills and inclination. But mostly I just want a quick view with the least fuss.

So I used this script and bundled it into a container image to produce ASCII graphs with one command. They look like this:

[Figure: sar_report output]

At first they’re hard to parse, but you quickly get used to them.

They’re great for quickly seeing when things went south, and what else went on at the time.

Running

To run this on your host, do:

$ docker run \
  -e SAR_REPORT_DAY="${SAR_REPORT_DAY:-$(date +%d)}" \
  -e LINES=${LINES:-$(stty size | awk '{print $1}')} \
  -e COLUMNS=${COLUMNS:-$(stty size | awk '{print $NF}')} \
  -e TERM=${TERM:-xterm} \
  -v /var/log/sysstat:/var/log/sysstat:ro \
  imiell/sar_report

The -e arguments set the ‘day’ to report on (defaults to today’s day), and pass the terminal settings to the container.

The -v flag mounts the /var/log/sysstat folder into the container (it is mounted read-only, to reduce any risk/fear of the Docker container messing up your host’s filesystem).

To change the day, set (eg) SAR_REPORT_DAY=01 in the terminal before running the command.
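
The defaulting in the docker command relies on the shell’s ${VAR:-default} expansion, which your own shell evaluates before docker ever runs. A small sketch of that behaviour on its own (the variable names default_day and chosen_day are just for illustration):

```shell
set -eu
unset SAR_REPORT_DAY
# With the variable unset, the expansion falls back to today's day of the month.
default_day="${SAR_REPORT_DAY:-$(date +%d)}"
echo "$default_day"

# Once the variable is set in the terminal, the same expansion uses it instead.
SAR_REPORT_DAY=01
chosen_day="${SAR_REPORT_DAY:-$(date +%d)}"
echo "$chosen_day"
```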

You’ll need Docker (and sysstat, of course) installed and running on your host for this to work out of the box.

Or you can run the command:

$ ./sar_report

from the repo’s folder.

Suggestions? Problems?

The code is here and is a work in progress – please suggest changes/raise bugs etc. on GitHub.

 


git log – the Good Parts

If you’re managing a complex git codebase with multiple developers, then you may well be using a tool like GitHub or BitBucket to delve into the history and figure out branch and merge issues.

These GUIs are great for providing a nice user interface for managing pull requests and simple histories and the like, but when the workflow SHTF there’s no substitute for using git log and its relatively little-known flags to really dig into the situation.

You’re going to run through this with me so that I know you’ve got it. Type the commands in bold to follow.

This is based on material from my book Learn Git the Hard Way, a free sample of which is available here.

An Example Git Repository

Run this to download a fairly typical git repository that I work on:

$ git clone https://github.com/ianmiell/cookbook-openshift3-frozen
$ cd cookbook-openshift3-frozen

NB this is a copy of the original repo, ‘frozen’ here to provide stable output. 

 

git log

git log is the vanilla log command you are probably already familiar with:

$ git log

commit f40f8813d7fb1ab9f47aa19a27099c9e1836ed4f 
Author: Ian Miell <ian.miell@gmail.com>
Date: Sat Mar 24 12:00:23 2018 +0000

pip

commit 14df2f39d40c43f9b9915226bc8455c8b27e841b
Author: Ian Miell <ian.miell@gmail.com>
Date: Sat Mar 24 11:55:18 2018 +0000

ignore

commit 5d42c78c30e9caff953b42362de29748c1a2a350
Author: Ian Miell <ian.miell@gmail.com>
Date: Sat Mar 24 09:43:45 2018 +0000

latest

It outputs five or more lines per commit, with the date, author, commit message and id. It goes in reverse time order, which makes sense for most cases, as you are mostly interested in what happened recently.

NOTE: output can vary depending on version, aliases,
and whether you are outputting to a terminal!
My version here was 2.7.4.

--oneline

Most of the time I don’t care about the author or the date, so in order that I can see more per screen, I use --oneline to only show the commit id and comment per-commit.

$ git log --oneline
ecab26a JENKINSFILE: Upgrade from 1.3 only
886111a JENKINSFILE: default is master if not a multi-branch Jenkins build
9816651 Merge branch 'master' of github.com:IshentRas/cookbook-openshift3
bf36cf5 Merge branch 'master' of github.com:IshentRas/cookbook-openshift3

--decorate

You might want more information than that, though, like which branch was that commit on? Where are the tags?

The --decorate flag provides this.

$ git log --oneline --decorate
ecab26a (HEAD -> master, origin/master, origin/HEAD) JENKINSFILE: Upgrade from 1.3 only
886111a JENKINSFILE: default is master if not a multi-branch Jenkins build
9816651 Merge branch 'master' of github.com:IshentRas/cookbook-openshift3

More recent versions of git do this by default when git log’s output goes to the terminal instead of a file, so things are improving for my fingers.

--all

$ git log --oneline --decorate --all
ecab26a (HEAD -> master, origin/master, origin/HEAD) JENKINSFILE: Upgrade from 1.3 only
886111a JENKINSFILE: default is master if not a multi-branch Jenkins build
9816651 Merge branch 'master' of github.com:IshentRas/cookbook-openshift3
[...]
a1eceaf DOCS: Known issue added to upgrade docs
774a816 (origin/first_etcd, first_etcd) first_etcd
7bbe328 first_etcd check
654f8e1 (origin/iptables_fix, iptables_fix) retry added to iptables to prevent race conditions with iptables updates
e1ee997 Merge branch 'development'

Can you see what it does? If you can’t, compare it to --oneline above and dig around to figure it out.

That’s great, but what would be even better is a visual representation of all those branches…

--graph

--graph gives you that visual representation, but in the terminal. While it might not look as slick as some git GUIs, it does have the benefit of being consistently viewed anywhere, and much more configurable to your specific needs.

And when you’re trying to piece together what happened on a 15-team project that doesn’t rebase, it can be essential…

$ git log --oneline --decorate --all --graph
* ecab26a (HEAD -> master, origin/master, origin/HEAD) JENKINSFILE: Upgrade from 1.3 only
* 886111a JENKINSFILE: default is master if not a multi-branch Jenkins build
* 9816651 Merge branch 'master' of github.com:IshentRas/cookbook-openshift3
|\ 
| * bf36cf5 Merge branch 'master' of github.com:IshentRas/cookbook-openshift3
| |\ 
| | * 313c03a JENKINSFILE: quick mode is INFO level only
| | * 340a8f2 JENKINSFILES: divided up into separate jobs
| | * 79e82bc JENKINSFILE: upgrades-specific Jenkinsfile added
| * | dce4c71 Add logic for additional FW for master (When not a node)
* | | d21351c Update utils/atomic
|/ / 
* | 3bd51ba Fix issue with ETCD
* | b87091a Add missing FW for HTTPD
|/ 
* a29df49 Missing (s)
* 51dff3a Fix rubocop

DON’T PANIC!

The above can be hard for the newcomer to parse, and there is little out there to guide you, but a few tips here can make it much easier to read.

The * indicates that there is a commit on the line, and the details of the commit (here the commit id, and first line of the comment) are on the right hand side.

The lines and position of the * indicate the lineage (or parentage) of each change. So, to take these three lines for example:

| * bf36cf5 Merge branch 'master' of github.com:IshentRas/cookbook-openshift3
| |\ 
| | * 313c03a JENKINSFILE: quick mode is INFO level only

The green pipes indicate that while the two changes listed here were going on, another branch had a gap between its two changes (9816651 and d21351c).

The blue line takes you to one parent of the bf36cf5 merge (what’s the commit id of the blue parent?), and the pink one goes to the other parent commit (313c03a).

It’s worth taking a bit of time to figure out what’s going on here, as it will pay dividends in a crisis later…



--simplify-by-decoration

If you’re looking at the whole history of a project and want to get a feel for its shape before diving in, you may want to see only the significant points of change (ie the lines affected by --decorate above).

This flag removes any commit that isn’t tagged or at the tip of a branch (ie has no reference pointing at it). The root commit is always there too.

$ git log --oneline --decorate --all --graph --simplify-by-decoration
* ecab26a (HEAD -> master, origin/master, origin/HEAD) JENKINSFILE: Upgrade from 1.3 only
| * 774a816 (origin/first_etcd) first_etcd
|/ 
| * 654f8e1 (origin/iptables_fix) retry added to iptables to prevent race conditions with iptables updates
|/ 
* 652b1ff (origin/new-logic-upgrade) Fix issue iwith kitchen and remove sensitive output
* ed226f7 First commit

Try tagging a specific commit not listed above, and then re-run the command.
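
If you’d rather try that on something smaller first, here is a sketch in a throwaway repository. The commit messages and the tag name interesting are made up for the example.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

for n in first second third; do
  echo "$n" > file1
  git add file1
  git commit -q -m "$n"
done

# 'second' has no reference pointing at it, so it should be left out here.
git log --oneline --decorate --simplify-by-decoration
git tag interesting HEAD~1      # put a reference on the middle commit
# Re-run: the tagged commit is now listed, marked with its tag.
git log --oneline --decorate --simplify-by-decoration
```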

File Info

Using --oneline can be a bit sparse, so --stat can give you useful information about what changed.

The number indicates the number of lines that were changed, with insertions represented by a + sign, and deletions by a -. There’s no concept of a ‘change’ to a line as such: the old line is deleted, and then the new one added even if only one character changed.

$ git log --oneline --decorate --all --graph --stat
* ecab26a (HEAD -> master, origin/master, origin/HEAD) JENKINSFILE: Upgrade from 1.3 only
| Jenkinsfile.upgrades | 2 +-
| 1 file changed, 1 insertion(+), 1 deletion(-)
* 886111a JENKINSFILE: default is master if not a multi-branch Jenkins build
| Jenkinsfile.full | 2 +-
| Jenkinsfile.upgrades | 2 +-
| 2 files changed, 2 insertions(+), 2 deletions(-)

If you find --stat hard to remember, then an alternative is to use --name-only, but with that you lose the information about numbers of changes to files.
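
To see the ‘no such thing as a changed line’ point in isolation, here is a small sketch in a scratch repository. The file contents, commit messages and identity are hypothetical, and --format= is used only to hide the commit header so the --stat summary stands alone.

```shell
set -e
export LC_ALL=C   # assumption: keep git's summary line in English for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"           # hypothetical identity
git config user.email "example@example.com"

echo "hello world" > afile
git add afile
git commit -q -m "add afile"

# Change a single character on the only line of the file.
echo "hello World" > afile
git commit -q -am "capitalise one character"

# Keep only the summary line of the --stat output for the latest commit.
summary=$(git log -1 --stat --format= | grep 'file changed')
echo "$summary"
```

Even though only one character differs, the summary reports one insertion and one deletion.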

Regex on Commits

This one’s also really handy. The -G flag takes a regular expression and returns only those commits (and their files) whose changes include a match for it.

This one, for example, looks for changes that contain the text chef-client:

$ git log -G 'chef-client' --graph --oneline --stat
...
* 22c2b1b Fix script for deploying origin
| scripts/origin_deploy.sh | 65 ++++++++++++-----------------------------------------------------
| 1 file changed, 12 insertions(+), 53 deletions(-)
... 
| * | 1a112bf - Move origin_deploy.sh in scripts folder - Enable HTTPD at startup
| | | origin_deploy.sh | 148 ----------------------------------------------------------------------------------------------------------------------------------------------------
| | | scripts/origin_deploy.sh | 148 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| | | 2 files changed, 148 insertions(+), 148 deletions(-)
... 
| * | 9bb795d - Add MIT LICENCE model - Add script to auto deploy origin instance
|/ / 
| | origin_deploy.sh | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| | 1 file changed, 93 insertions(+)

If you’ve ever spent ages searching through git log --patch output looking for a specific change this is a godsend…

The eccentrically-named --pickaxe-all gives you information about all files that changed in the commit, rather than just the ones that matched the regexp in the commit.

$ git log -G 'chef-client' --graph --oneline --stat --pickaxe-all

Try it out!
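
If you’d rather try it somewhere harmless first, the difference between the two forms can be sketched in a scratch repository. Everything here (file names, contents, identity) is invented for illustration; only the -G, --name-only and --pickaxe-all flags come from git itself.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"           # hypothetical identity
git config user.email "example@example.com"

# Commit 1: nothing that matches the regexp.
echo "docs" > README
git add README
git commit -q -m "add README"

# Commit 2: one file whose change matches, one file whose change does not.
echo "run chef-client" > deploy.sh
echo "unrelated" > notes.txt
git add deploy.sh notes.txt
git commit -q -m "add deploy script and notes"

# --format= hides the commit header; --name-only lists the files reported.
matched=$(git log -G 'chef-client' --format= --name-only)
all=$(git log -G 'chef-client' --format= --name-only --pickaxe-all)

echo "without --pickaxe-all:"; echo "$matched"
echo "with --pickaxe-all:";    echo "$all"
```

Without --pickaxe-all only the matching file is listed for the matching commit; with it, every file in that commit is listed.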

 




If you liked this post, check out:

Five Key Git Concepts Explained the Hard Way

Create Your Own Git Diagrams

Ten Things I Wish I’d Known About bash

A Non-Cloud Serverless Application Pattern Using Git and Docker

Five Key Git Concepts Explained the Hard Way

If you’ve ever read a git man page, you’ll know that trying to understand git can be an intimidating experience.

There’s even a git man page generator that produces joke git pages:

If <upstream> is not specified, the upstream configured in
branch.<name>.remote and branch.<name>.merge options will be used 
(see git-config(1) for details) and the --fork-point option is 
assumed. If you are currently not on any branch or if the current
branch does not have a configured upstream, the rebase will abort.

git-land-remote lands some applied remotes over the packed applied branches, and it is in various cases a possibility that a filter-branched error must prevent staged cleaning of some named stages.

One of the above extracts is a joke, one is real…

So here are five core git concepts explained.

Hopefully after reading this the man pages will start to make more sense. If you’re confused by one I’ve missed, contact me to write it up for you (@ianmiell or LinkedIn).

This post uses the ‘hard way‘ method to teach the concepts by having you type out the commands and think through what’s going on, without having to worry about breaking anything.

I use the same method to teach git in my book Learn Git the Hard Way.    


 

1) Reference

Many will know this already, but I need to make sure you know it because it’s so fundamental.

A ‘reference’ is a string that points to a commit.

There are four main types of reference: HEAD, Tag, Branch, and Remote Reference

HEAD

HEAD is a special reference that always points to where the git repository currently is, i.e. whatever you have checked out.

If you checked out a branch, it’s pointed to the last commit in that branch. If you checked out a specific commit, it’s pointed to that commit. If you checked out a tag, it’s pointed to the commit of that tag.

Every time you commit, the HEAD reference/pointer is moved from the old commit to the new one. This happens automatically, under the hood.

Tag

A tag is a reference that points to a specific commit. Whatever else happens (and unlike the HEAD), that tag will stay pointed at the commit it was originally pointed at.

Branch

A branch is like a tag, but it moves along with the HEAD: when you commit while on a branch, the branch is moved to the new commit.

You can only be on one branch at a time.

Type out these commands and explain what’s going on. Take your time:

$ mkdir lgthw_origin
$ cd lgthw_origin
$ git init
$ echo 1 > afile
$ git add afile
$ git commit -m firstcommit
$ git log --oneline --decorate --all --graph
$ git branch otherbranch
$ git tag firstcommittag
$ git log --oneline --decorate --all --graph
$ echo 2 >> afile
$ git commit -am secondcommit
$ git checkout otherbranch
$ git log --oneline --decorate --all --graph
$ echo 3 >> afile
$ git commit -am thirdcommit
$ git log --oneline --decorate --all --graph

Now do it again and explain to someone else what’s going on.
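
One way to check your explanation is to ask git directly where each reference points. The sketch below replays the listing above in a temporary directory (an assumption, so your lgthw_origin folder is left alone) and uses git rev-parse and git symbolic-ref, which are not part of the original listing. The identity settings and the forced master branch name are also assumptions.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master       # assumption: force the branch name used in this article
git config user.name "Example User"           # hypothetical identity
git config user.email "example@example.com"

echo 1 > afile
git add afile
git commit -q -m firstcommit
first=$(git rev-parse HEAD)
git branch otherbranch
git tag firstcommittag
echo 2 >> afile
git commit -q -am secondcommit
git checkout -q otherbranch
echo 3 >> afile
git commit -q -am thirdcommit

# Resolve each reference to the commit it points at.
for ref in HEAD master otherbranch firstcommittag; do
  printf '%-15s %s\n' "$ref" "$(git rev-parse --short "$ref")"
done

head_target=$(git symbolic-ref HEAD)          # the branch HEAD is attached to
tag_commit=$(git rev-parse firstcommittag)
echo "HEAD is attached to: $head_target"
```

The tag stays on the first commit, master and otherbranch each moved with the commits made while they were checked out, and HEAD follows the branch you are on.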

Remote Reference

A remote reference is a reference to code that’s from another repository. See below for more on that…

2) ‘Detached Head’

Now that you know what HEAD is, understanding what a ‘detached head’ is will be much easier.

A ‘detached head’ is the state of a git repository that has a commit checked out but no branch associated with it.

Continuing from the above listing, type this in:

$ git checkout firstcommittag

You get that scary message:

Note: checking out 'firstcommittag'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b <new-branch-name>

HEAD is now at 1b1499c... firstcommit

but if you follow the instructions:

$ git log --oneline --decorate --all --graph
$ git checkout -b firstcommitbranch
$ git log --oneline --decorate --all --graph

you can figure out what’s going on. There was a tag, but no branch at that commit, so the HEAD was detached from a branch.
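
If the log output doesn’t make it obvious, you can also ask git whether HEAD is currently attached to a branch. This is a rough sketch in a scratch repository: the head_state helper is a hypothetical name, and git symbolic-ref is used here as one possible way of checking, not something the listing above relies on.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"           # hypothetical identity
git config user.email "example@example.com"
echo 1 > afile
git add afile
git commit -q -m firstcommit
git tag firstcommittag

# Prints the branch HEAD is attached to, or 'detached' if there is none.
head_state() {
  git symbolic-ref -q --short HEAD || echo detached
}

on_branch=$(head_state)
git checkout -q firstcommittag                # check out the tag: no branch involved
on_tag=$(head_state)
git checkout -q -b firstcommitbranch          # create a branch at the same commit
on_new_branch=$(head_state)

echo "after the first commit:   $on_branch"
echo "after checking out a tag: $on_tag"
echo "after checkout -b:        $on_new_branch"
```

The commit never changes between the last two steps; the only difference is whether a branch is attached to HEAD.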

3) Remote Reference

A remote reference is a reference to a commit on another git repository.

$ cd ..
$ git clone lgthw_origin lgthw_cloned
$ cd lgthw_cloned
$ git remote -v
$ git log --oneline --decorate --all --graph

The log graph looks different, doesn’t it?

Compare that to the git log output in the other folder and think about how they differ. What word do you see multiple times in the output that you didn’t see before?

The cloned repo has its own copy of the branch (firstcommitbranch) and tag (firstcommittag) because that’s where the repository’s HEAD was when you cloned it.

$ git branch -a

shows all the branches visible in this repository, both local and remote.

Compare that to the output of the same command in the original folder. How does it differ?

Now check out your local master:

$ git checkout master

and you get a message saying:

Branch master set up to track remote branch master from origin.
Switched to a new branch 'master'

So you’ve got a local reference master which ‘tracks’ the master in the remote repository. The local reference is master, and the remote reference is origin/master. Git assumed you meant your local master to track the remote master.

The two branches look the same, but they are linked only by the configuration of this repository.
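
That configuration can be inspected directly. The following is a simplified sketch rather than a replay of the steps above: it assumes a fresh pair of repositories in a temporary directory, with the origin sitting on master when it is cloned, and a made-up identity.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity, supplied through the environment so it applies to both repos.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q lgthw_origin
cd lgthw_origin
git symbolic-ref HEAD refs/heads/master       # assumption: force the branch name used in this article
echo 1 > afile
git add afile
git commit -q -m firstcommit
cd ..
git clone -q lgthw_origin lgthw_cloned
cd lgthw_cloned

# The 'tracking' link is just two settings in the clone's own configuration.
tracking_remote=$(git config --get branch.master.remote)
tracking_branch=$(git config --get branch.master.merge)
echo "master tracks $tracking_branch on remote $tracking_remote"
```

These are the same branch.<name>.remote and branch.<name>.merge options mentioned in the real man page extract near the top of this post. Back in the walkthrough, the next step is to make a change on the origin: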

$ cd ../lgthw_origin
$ git checkout master
$ echo origin_change >> afile
$ git commit -am 'Change on the origin'

Then go back to the cloned repository and fetch the changes from the origin:

$ cd ../lgthw_cloned
$ git fetch origin
$ git log --oneline --decorate --all --graph

Can you see what happened to your local master branch, and what happened to the origin’s? Why are they now separate?

Note that you didn’t git pull the change. git pull does a fetch and a merge, and we don’t want to confuse matters here by skipping steps and making it look like magic.

In fact, git pull is best avoided when you are learning git…
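
To convince yourself that a fetch on its own leaves your local branch alone, here is a minimal sketch using a fresh pair of scratch repositories. The directory layout and identity are assumptions, and git rev-list --count is just one way of measuring the gap between two references.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity, supplied through the environment so it applies to both repos.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q lgthw_origin
cd lgthw_origin
git symbolic-ref HEAD refs/heads/master       # assumption: force the branch name used in this article
echo 1 > afile
git add afile
git commit -q -m firstcommit
cd ..
git clone -q lgthw_origin lgthw_cloned

cd lgthw_origin
echo origin_change >> afile
git commit -q -am 'Change on the origin'

cd ../lgthw_cloned
master_before=$(git rev-parse master)
git fetch -q origin                           # updates origin/master only
master_after=$(git rev-parse master)

behind=$(git rev-list --count master..origin/master)   # commits only on origin/master
ahead=$(git rev-list --count origin/master..master)    # commits only on local master
echo "local master moved: $([ "$master_before" = "$master_after" ] && echo no || echo yes)"
echo "behind origin/master by $behind, ahead by $ahead"
```

The fetch moves origin/master forward, while the local master stays exactly where it was until you merge.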



4) Fast Forward

Your git log graph should have looked like this:

* 90694b9 (origin/master) Change on the origin
* d20fc9a (HEAD -> master) secondcommit
| * 2e7ae21 (origin/otherbranch) thirdcommit
|/ 
* 6c14f2f (tag: firstcommittag, origin/firstcommitbranch, origin/HEAD, firstcommitbranch) firstcommit

(Your ids may differ from the above – otherwise it should be the same.)

Now, do you see how the Change on the origin commit is not branched from your local HEAD/master commit secondcommit – it’s in a ‘straight line’ from the firstcommit tag?

That means that if you ‘merge’ origin/master into your local master, git can figure out that all it needs to do is move the HEAD and master reference to where the origin/master branch is and its ‘merge’ job is done.

$ git merge origin/master
$ git log --oneline --decorate --all --graph

This is all a ‘fast forward’ is: git saw that there’s no need to do any merging, it can just ‘fast forward’ the references to the point you are merging to. Or if you prefer, it just moves the pointers along rather than creating a new merge commit.

We just did the equivalent of a git pull, by the way. A git pull consists of a git fetch and a git merge. Breaking it down into these two steps helps reduce the mystery of why things can go wrong.
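
If you want git to tell you in advance whether a merge will be a fast forward, one option is sketched below. It uses a fresh pair of scratch repositories, so the paths and identity are assumptions, and it relies on git merge-base --is-ancestor and git merge --ff-only, neither of which appears in the steps above.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity, supplied through the environment so it applies to both repos.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q lgthw_origin
cd lgthw_origin
git symbolic-ref HEAD refs/heads/master       # assumption: force the branch name used in this article
echo 1 > afile
git add afile
git commit -q -m firstcommit
cd ..
git clone -q lgthw_origin lgthw_cloned

cd lgthw_origin
echo origin_change >> afile
git commit -q -am 'Change on the origin'

cd ../lgthw_cloned
git fetch -q origin

# A fast forward is possible when local master is an ancestor of origin/master.
if git merge-base --is-ancestor master origin/master; then
  echo "fast forward possible"
fi

git merge -q --ff-only origin/master          # refuses to run if a real merge would be needed
merges=$(git rev-list --merges --count master)
echo "merge commits on master: $merges"
```

After the merge, master and origin/master point at the same commit and no merge commit has been created.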

As an exercise, after finishing this article do the whole exercise again, but make a change to both origin/master and master and then do the fetch and merge to see what happens when a fast-forward is not possible.

5) Rebase

master and origin/master are now in sync, so now run these commands to see what a rebase is:

$ cd ../lgthw_origin 
$ git status
$ echo origin_change_rebase >> afile 
$ git commit -am 'origin change rebase' 
$ git log --oneline --decorate --all --graph 

OK so far? You’ve made a change on master in the origin repo. Now make a different change in the cloned repo, then fetch and rebase:

$ cd ../lgthw_cloned 
$ echo cloned_change_rebase >> anewfile 
$ git add anewfile 
$ git commit -m 'cloned change rebase in anewfile' 
$ git log --oneline --decorate --all --graph 
$ git fetch origin 
$ git log --oneline --decorate --all --graph 
$ git rebase origin/master 
$ git log --oneline --decorate --all --graph

Can you see what’s happened?

If not, have a close look at the last two git log outputs.

That’s what a rebase is – it takes a set of commits and moves (or ‘re-bases’) them to another commit.
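
A detail that is easy to miss in the log output is that the ‘moved’ commit gets a new id, because its parent has changed. The sketch below shows this with a fresh pair of scratch repositories; the paths and identity are assumptions, and the variable names are only for illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity, supplied through the environment so it applies to both repos.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q lgthw_origin
cd lgthw_origin
git symbolic-ref HEAD refs/heads/master       # assumption: force the branch name used in this article
echo 1 > afile
git add afile
git commit -q -m firstcommit
cd ..
git clone -q lgthw_origin lgthw_cloned

cd lgthw_origin
echo origin_change_rebase >> afile
git commit -q -am 'origin change rebase'

cd ../lgthw_cloned
echo cloned_change_rebase >> anewfile
git add anewfile
git commit -q -m 'cloned change rebase in anewfile'
git fetch -q origin

old_id=$(git rev-parse master)
git rebase -q origin/master                   # replay the local commit on top of origin/master
new_id=$(git rev-parse master)
new_parent=$(git rev-parse master~1)
origin_tip=$(git rev-parse origin/master)

echo "commit id before rebase: $old_id"
echo "commit id after rebase:  $new_id"
echo "new parent:              $new_parent"
```

The commit message and content are the same, but the commit now sits on top of origin/master and has a different id.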

 




If you liked this post, you might also like these:

Create your own Git diagrams

A Git Serverless Pattern

Power Git Log Graphing

Interactive Git Rebase and Bisect Tutorials