The Git data model


If you use Git, it helps a lot to understand its data model. The model is actually simple but, in my opinion, not very well documented. I find myself describing it every now and then, so I wrote this post.

Blobs and trees

Internally in a Git repository, file contents are stored as blobs and directories are stored as trees. A blob holds only the bytes of a file; the file's name and mode live in the tree that references it.

Trees contain blobs and/or other trees (i.e. subdirectories). Trees can also contain gitlinks, which are references to a commit in another Git repository. That other repository is called a submodule within ours.

Exercise: Run git ls-tree HEAD
This lists the contents of the tree that corresponds to the current working directory, as recorded in the currently checked-out commit. ls-tree is a low-level command, uncommon in daily use, but helpful for studying the data model. HEAD, by contrast, comes up all the time; it is explained later.


Commits

Each commit at a minimum consists of

  • a reference to a tree with the top-level directory contents of the repo
  • zero or more references to parent commits which precede this commit
  • author name, email and timestamp
  • committer name, email and timestamp, equal to or different from author
  • a commit message

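Optionally, a PGP signature and a few other things can be included in commits. The backwards-pointing chain of parent references is what creates the repository history. It forms a reverse linked list as long as every commit has a single parent and, more generally, a directed acyclic graph once merge commits with several parents are involved.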

Exercise: Run git show --pretty=raw
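This shows the unformatted contents of the currently checked-out commit, followed by a diff of the changes it introduced. (Note: The commit itself holds none of the contents of any trees or blobs, only the hash of its top-level tree.)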

Exercise: Run git log --pretty=raw
This shows the unformatted history, up to and including the last commit, last commit first. (Try this on a simple repository, otherwise the output can be messy.)
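
To make the commit fields listed above concrete, here is a minimal sketch in Python that assembles the raw text of two commits, the second naming the first as its parent. The helper names (commit_bytes, commit_id), the identity "Jane Doe" and the timestamp are made up for illustration. The sketch assumes a repository that uses SHA-1 object ids; how ids are derived is the topic of the next section.

```python
import hashlib

# Well-known id of the empty tree in SHA-1 repositories.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical identity: name, email, Unix timestamp, time zone.
WHO = "Jane Doe <jane@example.com> 1700000000 +0000"


def commit_bytes(tree, parents, author, committer, message):
    """Assemble the raw commit text: header lines, a blank line, the message."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]  # zero or more parents
    lines.append("author " + author)
    lines.append("committer " + committer)
    return ("\n".join(lines) + "\n\n" + message).encode()


def commit_id(raw):
    """Hash the raw commit the way Git does: 'commit <size>', a NUL byte, the text."""
    header = ("commit %d" % len(raw)).encode() + b"\0"
    return hashlib.sha1(header + raw).hexdigest()


raw_root = commit_bytes(EMPTY_TREE, [], WHO, WHO, "Initial commit\n")
root_id = commit_id(raw_root)

# The second commit points backwards at the first one.
raw_child = commit_bytes(EMPTY_TREE, [root_id], WHO, WHO, "Second commit\n")
child_id = commit_id(raw_child)

print(raw_child.decode())
print("id:", child_id)
```

The printed text has the same shape as what git show --pretty=raw displays for a real commit, apart from the leading "commit <hash>" line that Git adds for display.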

Object identifiers

Blobs, trees and commits are objects, which are identified by a hash (SHA-1 by default) over their contents. If the contents of an object change then the hash changes too, which means that it is actually a different object.
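
As a rough illustration of "a hash over their contents", the following Python sketch computes a blob id by hand. It assumes a SHA-1 repository, where the id is the SHA-1 of the object type, the content length, a NUL byte and the content. The function names are my own, not Git's.

```python
import hashlib


def git_object_id(obj_type, content):
    """Return the SHA-1 id Git would assign to an object of the given type."""
    header = ("%s %d" % (obj_type, len(content))).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def blob_id(content):
    """Id of a blob, i.e. of a file's contents without its name."""
    return git_object_id("blob", content)


print(blob_id(b"hello world\n"))   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(blob_id(b"hello world!\n"))  # one extra byte gives an unrelated id
print(blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

You can compare the first value with the output of echo 'hello world' | git hash-object --stdin. Note that the file name plays no role: two files with identical contents share one blob.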

Blob hashes change when the file contents change. Tree hashes change when the directory contents change. Note that trees reference blobs both by name and by hash, so if a file is modified but not renamed, the blob hash still changes, and a tree that referenced the old blob must also change in order to refer to the new blob. The same holds one level up, so a change deep in the directory hierarchy changes every tree up to the top-level tree. Commit hashes change when either the top-level tree or any metadata changes.
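
The way a change in a blob propagates to its tree can be sketched in Python as follows. This is a simplified model under stated assumptions: SHA-1 ids, a flat directory, and regular non-executable files only (mode 100644). The function names and file names are invented for the example.

```python
import hashlib


def object_id(obj_type, content):
    """SHA-1 over '<type> <size>', a NUL byte and the content."""
    header = ("%s %d" % (obj_type, len(content))).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def tree_id(files):
    """Id of a flat directory given as {file name: file contents}."""
    body = b""
    for name in sorted(files, key=str.encode):  # entries are sorted by name
        blob = object_id("blob", files[name])
        # One entry: "<mode> <name>", a NUL byte, then the 20 raw hash bytes.
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob)
    return object_id("tree", body)


before = tree_id({"README": b"version 1\n", "notes.txt": b"todo\n"})
edited = tree_id({"README": b"version 2\n", "notes.txt": b"todo\n"})
renamed = tree_id({"README.txt": b"version 1\n", "notes.txt": b"todo\n"})

print(before)
print(edited)   # differs: the README blob changed, so the tree changed
print(renamed)  # differs too: same blobs, but a name in the tree changed
print(tree_id({}))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the empty tree
```

In the renamed case no blob changes at all; only the tree does, because names are stored in trees rather than in blobs.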

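Hashes are useful identifiers because they are derived from the actual contents of objects: the same contents always get the same identifier, and an identifier always refers to exactly the same contents.

git ls-tree HEAD lists the entries of a tree with their mode (file type and permissions), object type, identifier (hash) and name. Adding the option -l also lists the size of each blob.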


Branches

A branch is a name that references any one commit in the repository.

When a new commit is created (e.g. with a git commit command), the currently checked-out commit is recorded as its parent. The currently checked-out branch, if any, is then automatically changed to reference the new commit.
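
Here is a toy model of that behaviour in Python. It is only a sketch: the Repo class is invented for this post, and the ids c1, c2, ... stand in for real commit hashes.

```python
class Repo:
    """Toy repository: branches are just names that point at commit ids."""

    def __init__(self):
        self.parents = {}        # commit id -> list of parent commit ids
        self.branches = {}       # branch name -> commit id
        self.current = "master"  # name of the currently checked-out branch
        self._count = 0

    def commit(self):
        self._count += 1
        new_id = "c%d" % self._count  # stand-in for a real hash
        tip = self.branches.get(self.current)
        # The checked-out commit becomes the parent (none for the first commit).
        self.parents[new_id] = [tip] if tip else []
        # The checked-out branch moves to the new commit automatically.
        self.branches[self.current] = new_id
        return new_id


repo = Repo()
repo.commit()
repo.commit()

# Create a second branch at the same commit and check it out.
repo.branches["topic"] = repo.branches["master"]
repo.current = "topic"
repo.commit()

print(repo.branches)       # {'master': 'c2', 'topic': 'c3'}
print(repo.parents["c3"])  # ['c2']
```

Only the checked-out branch moved; master still references c2. Creating a branch copied a single id, not any commits.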

Exercise: Run git branch -v
This lists local branches in verbose mode (-v), which includes an abbreviated commit hash and the commit message summary. The leading * indicates the currently checked-out branch. master has traditionally been the default branch name in a Git repository; newer setups often use main instead.

If you cloned this repository from another repository, also try:

Exercise: Run git branch -a -v
This lists all branches (-a) in verbose mode (-v). You should notice that this lists extra branches, so-called "remote-tracking branches" (often just called "remote branches"), which are prefixed with remotes/ and the remote name, origin by default. These branches mirror the branches of the remote repository as of the last time you fetched from it. They exist for reference only and you can't commit on them directly; checking one out leaves you with a detached HEAD (explained later).

Exercise: Run git branch mynewbranch origin/master
This creates a new local branch called mynewbranch which references the same commit that the remote branch origin/master referenced at the time of running the command.
Verify by running git branch -v
This shows mynewbranch along with the previously existing master.

Read the git branch --help documentation to find out how to delete, rename and otherwise modify branches.


Tags

A tag is also a name that references any one commit in the repository, but unlike a branch it is not meant to ever change and it is never changed automatically.

Exercise: Run git tag nicetag
This creates a new tag called nicetag without an associated message and without a GPG signature (a so-called lightweight tag), referencing the currently checked-out commit.

Exercise: Run git tag -l
This lists all tags in the repository.

Exercise: Run git show --decorate nicetag
This shows the commit referenced by nicetag and decorates the commit id with all branch and tag names that reference this commit.


HEAD

HEAD is a special reference (ref) that refers to the currently checked-out commit, regardless of how it came to be the currently checked-out commit. Usually HEAD points at a branch, which in turn points at the commit.

Because neither tags nor individual commits should be changed (indeed, commits cannot be changed, only replaced), Git reminds you that you are in a "detached HEAD" state if you check out a commit or a tag, as opposed to a branch, even though a branch and a tag may reference that same commit. Git remembers what you've checked out. There's nothing wrong with a detached HEAD if you only want to look around, but if you want to make changes then you should create a branch. You can create a branch at any time, also with a detached HEAD, for example with git checkout -b followed by the new branch name. Once that branch is checked out, HEAD isn't detached anymore.
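
The difference between an attached and a detached HEAD is visible in the file .git/HEAD. In a typical repository it contains either a line like "ref: refs/heads/master" or a bare commit hash. The Python sketch below classifies such text. The function name describe_head is made up for this example, and the function works on a string, so you would have to read the file yourself.

```python
import re


def describe_head(head_text):
    """Classify the text of .git/HEAD as ('branch', name) or ('detached', hash)."""
    text = head_text.strip()
    if text.startswith("ref: "):
        ref = text[len("ref: "):]
        prefix = "refs/heads/"
        # Strip the branch namespace if present, otherwise keep the full ref.
        name = ref[len(prefix):] if ref.startswith(prefix) else ref
        return ("branch", name)
    # 40 hex digits for SHA-1 repositories, 64 for SHA-256 ones.
    if re.fullmatch(r"[0-9a-f]{40}|[0-9a-f]{64}", text):
        return ("detached", text)
    raise ValueError("unrecognised HEAD contents: %r" % head_text)


print(describe_head("ref: refs/heads/master\n"))
# ('branch', 'master')
print(describe_head("3b18e512dba79e4c8300dd08aeb37f8e728b8dad\n"))
# ('detached', '3b18e512dba79e4c8300dd08aeb37f8e728b8dad')
```

In the attached case a new commit moves the named branch and HEAD follows along. In the detached case there is no branch to move, which is why Git warns you.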

That's all there is to it! Have fun with Git.