Git in a nutshell

For a client, I created a workshop introduction to git. Because the team was not familiar with git, I started with the very basics, just enough to get a grasp of the common scenarios you will encounter when using git.
Below is an excerpt from the document I handed over to the team. I thought it might be valuable to share it with the world. 🙂

Happy reading!

// Ryan

Introduction

Git is a distributed version control system. This means that every team member has a local store that contains all the source code of a project, including the complete history of all changes to all files. This store is called a ‘repository’ and it is located in a hidden folder ‘.git’ on the local file system.

To share your work with your team members, you need to connect to a repository located on a server on the network; this is called the remote (repository). The remote repository also has the complete history of all files. To distribute your changes, you will typically use this workflow:

  • commit
    add your changes to your local repository; this can be one or more commits,

  • push
    your local commits are pushed to the remote repository,

  • fetch
    other users can download your changes from the remote repository,

  • merge
    after a fetch, you can integrate the changes from others with your local sources,

  • pull
    usually the fetch and merge steps are performed together; we call this a pull.

Note that these are separate actions: you can make multiple commits locally without pushing them to the remote repository, or just fetch from the remote to see what others are doing. This workflow is very different from TFS, where all changes must be checked in to the central server. In other words, you always need to be connected to a TFS server to have your changes committed (checked in).

This document describes the basic situations you will encounter on a daily basis, but on a conceptual level; that is, no actual git commands will be discussed.

What is a commit?

The main building block of git is the commit. Conceptually, a commit is a list of all changes since its parent commit, plus a few properties that identify the commit uniquely within the repository. (Internally git stores a full snapshot of the files, but thinking in changes works well here.) Every commit is identified by a unique id. This id is a hash over, among other things, the commit date, the id of its parent commit and the author’s name. So, committing the exact same changes again, at a later time or on top of a different parent, will produce a different commit id.
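To make the hashing idea concrete, here is a minimal sketch in Python. It is a toy model, not git's real object format: the function name commit_id, the field layout and the example dates are assumptions made up for illustration. It only shows that identical changes committed at a different time end up with a different id.

```python
import hashlib

def commit_id(changes: str, parent_id: str, author: str, date: str) -> str:
    """Toy commit id: a SHA-1 hash over the changes and the commit's properties."""
    payload = "\n".join([parent_id, author, date, changes])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

# Hypothetical example values; only the date differs between the two commits.
first = commit_id("add README", parent_id="", author="Ryan", date="2020-01-01T10:00:00")
again = commit_id("add README", parent_id="", author="Ryan", date="2020-01-01T10:01:00")

print(first != again)  # True: same changes, different date, different id
print(len(first))      # 40: a SHA-1 digest written as hexadecimal characters
```

Because the id of the parent commit is part of the hash, a commit id also indirectly depends on the whole history before it.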

What is a branch?

Another main building block is the branch. A branch can be seen as a stream of consecutive commits. Branches are used to keep your work (and others’ work) apart from the code that is ‘accepted’ as part of the application or project you are working on. Technically, a branch is nothing more than a pointer to a certain commit; the branch thus ‘contains’ everything that commit points to, all the way down to the first commit. A new branch can be created from any commit, starting a separate line of work from that point forward.

There is typically one root branch, called the main (master) branch. Other branches are created from the master branch and contain work in progress. Once the new functionality is completed, tested and reviewed, it can be merged back into the main branch. This way the main branch always contains working, tested, reviewed and production-deployable source code.

In the example above, a branch is created at the commit the master branch currently points to. Next, we commit some work to the feature branch.

The feature1337 branch ‘contains’ all commits that its last commit points to, all the way down. The master branch, on the other hand, is unchanged, as it is still pointing at the same commit.

One can create as many branches as one likes, at any given commit. Branches are very lightweight, as they are just pointers to commits. The commits are what hold the actual changes.

In the example above, you also see ‘HEAD’. This is how git keeps track of where to add the next commit; usually it points to the branch you are working on. Your local directory always reflects the changes of the commits up to where ‘HEAD’ is located. If the developer switches to (checks out) the master branch, the local directory will no longer contain the changes of branch feature1337. Checking out branch feature1337 will change the local directory again. With multiple branches, one can easily move back and forth between pieces of work, without needing to be connected to a central server.
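The pointer idea can be sketched with a small toy model in Python. The class ToyRepo, its methods and the commit ids c1, c2 and c3 are hypothetical names for illustration only; this is not how git is implemented. The sketch shows that creating a branch copies a pointer, and that a new commit only moves the branch HEAD refers to.

```python
class ToyRepo:
    """A toy model: commits know their parent, branches are just named pointers."""

    def __init__(self):
        self.parent = {}                  # commit id -> parent id (None for the first commit)
        self.branches = {"master": None}  # branch name -> commit id it points to
        self.head = "master"              # HEAD names the branch that gets the next commit

    def commit(self, commit_id):
        # The new commit points back at the current tip; only the HEAD branch moves.
        self.parent[commit_id] = self.branches[self.head]
        self.branches[self.head] = commit_id

    def branch(self, name):
        # Creating a branch copies a pointer; no commits are copied.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        self.head = name

    def history(self, name):
        # Follow parent links from the branch tip down to the first commit.
        result, current = [], self.branches[name]
        while current is not None:
            result.append(current)
            current = self.parent[current]
        return result

repo = ToyRepo()
repo.commit("c1")
repo.commit("c2")
repo.branch("feature1337")    # starts at the commit master points to
repo.checkout("feature1337")  # HEAD now refers to the feature branch
repo.commit("c3")

print(repo.history("feature1337"))  # ['c3', 'c2', 'c1']
print(repo.history("master"))       # ['c2', 'c1']
```

Note that master did not move when c3 was committed, which is exactly why switching back to master hides the feature work.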

What is merging?

There will come a time when you have finished your work and want it to be ‘accepted’ into the master branch. Git uses merging for this: the feature branch gets merged into the master branch. However, teams usually do not merge immediately into the master branch. Usually there is another branch that is used for testing your work, integrated with work from others, before it gets merged into the master branch. Let’s call that branch develop.

The develop branch should start where the master branch is currently located.

We check out the develop branch (thus moving HEAD, which determines where the next commit will arrive).

Then we merge the feature1337 branch into the develop branch. Note that the master branch and the feature1337 branch have not changed. The develop branch ‘contains’ both the master branch and the feature1337 branch, as the merge commit points to the last commit of each.

The develop branch can now be deployed and tested. If a bug is found, fix it in the feature branch and merge the feature branch into develop again. Rinse and repeat until you are happy with your changes.
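What ‘contains’ means can be sketched as a walk over parent links. In the toy model below (Python, with hypothetical commit ids c1, c2, f1, f2 and m1), a merge commit is simply a commit with two parents, and a branch contains every commit reachable from the commit it points to.

```python
# commit id -> list of parent ids (hypothetical ids, for illustration only)
parents = {
    "c1": [],
    "c2": ["c1"],        # master points here
    "f1": ["c2"],
    "f2": ["f1"],        # feature1337 points here
    "m1": ["c2", "f2"],  # merge commit on develop: it has two parents
}
branches = {"master": "c2", "feature1337": "f2", "develop": "m1"}

def contains(branch):
    """All commits reachable from the branch's commit by following parent links."""
    seen, todo = set(), [branches[branch]]
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(parents[commit])
    return seen

print(contains("master") <= contains("develop"))       # True
print(contains("feature1337") <= contains("develop"))  # True
print("m1" in contains("master"))                      # False: master did not move
```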


When all is well, the feature branch, now hardened and tested, can be merged into the master branch.

Now that the feature branch is merged into master, we can throw it away, as it has served its purpose. There is no need to keep it around anymore. The commits will not be deleted, as the master branch is pointing to them now.

As you can see, the develop branch is never merged into the master branch. The develop branch might contain work from other branches that are not ready to be merged into master!

The develop branch is typically not worked on; it should contain only merge commits with other branches.

And after a sprint, the develop branch is typically ‘reset’ to where the master branch currently is: its pointer simply points to the same commit as the master branch. The merge commits that were in the develop branch will eventually be deleted, as there is no branch pointing to them anymore.
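Why those merge commits can disappear is a matter of reachability, sketched below in Python. The commit ids are hypothetical, and the sketch assumes that master simply moved forward to the last feature commit f2 when the feature was merged. Any commit that no branch can reach is a candidate for clean-up.

```python
# commit id -> list of parent ids (hypothetical ids, for illustration only)
parents = {
    "c1": [],
    "c2": ["c1"],
    "f1": ["c2"],
    "f2": ["f1"],
    "m1": ["c2", "f2"],  # merge commit that only develop pointed to
}
# State after the sprint: the feature branch is deleted, develop is reset to master.
branches = {"master": "f2", "develop": "f2"}

def reachable(tips):
    """All commits that can be reached from the given commits via parent links."""
    seen, todo = set(), list(tips)
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(parents[commit])
    return seen

unreachable = set(parents) - reachable(branches.values())
print(unreachable)  # {'m1'}
```

The feature commits f1 and f2 survive the deletion of the feature branch, because master still reaches them.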

What about remote?

Up until now we have not included a central repository, usually called remote, but at some point in time you will want to ‘push’ your changes to the central repo, so your co-workers can take advantage of your superb work.

When you get a copy of the repository from the remote on your local computer, there will be a special companion of the master branch (the origin/master branch) in your local repository that reflects what is currently known about the remote server. This special branch is needed because your local master branch can diverge from the remote master branch: as we saw earlier, you can merge your feature branches into the master branch locally, but that does not change the remote master branch automatically. To get our changes on the remote server, we need to push them.

Let’s see how that works. For brevity’s sake we will leave out the develop branch, as the same principle applies to all branches.

Locally we have created a feature branch and added one commit.

When we push our feature branch to remote, we get a special branch origin/feature for it too. Note that the remote repository has a feature branch with the exact same commit. It is an exact copy.

Let’s fast forward in time: we have added another commit and, as we are done with it, merged the feature branch into master. Note that the special branches are still where they were; they reflect the currently known state of the remote repository.

Now we have pushed the master branch to remote. Note that our origin/master branch has moved. It reflects the currently known state of the remote.
The origin/feature branch did not change, as we have not explicitly pushed the feature branch. The remote repository does not know that the local feature branch has an extra commit.
It does have the extra commit itself, because the master branch points to it; it just doesn’t know that the local feature branch points to it too.
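The bookkeeping of these special branches can be sketched as three small tables in Python. The names local, tracking and remote, the push function and the commit ids c1, f1 and f2 are hypothetical, and the sketch ignores the transfer of the commits themselves. It only shows that a push updates the one branch that was pushed, plus its origin/ counterpart.

```python
# branch name -> commit id (hypothetical ids, for illustration only)
local = {"master": "c1", "feature": "f1"}
tracking = {"origin/master": "c1", "origin/feature": "f1"}  # last known remote state
remote = {"master": "c1", "feature": "f1"}

def push(branch):
    """Toy push: copy one local branch pointer to the remote and note the new state."""
    remote[branch] = local[branch]
    tracking["origin/" + branch] = local[branch]

# We add another commit to feature and merge it into master, all locally.
local["feature"] = "f2"
local["master"] = "f2"

push("master")  # only master is pushed

print(tracking)  # {'origin/master': 'f2', 'origin/feature': 'f1'}
print(remote)    # {'master': 'f2', 'feature': 'f1'}
```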

Side note: the graph on the remote is the same as the local one; the tool we are using here annoyingly has a bug that prevents it from displaying properly. When you look closely, you’ll see the dotted line from commit fccd7e7 to commit dda162f. (http://git-school.github.io/visualizing-git/)

Meanwhile, Bob has seen our feature branch and has added a commit to it and pushed it to the remote. Naughty Bob!

Unaware of Bob’s changes, we want to be a good citizen and push the current version of our local feature branch to remote.

But git will not allow it! Eh, what? … git sees that the remote feature branch has evolved, and we need to get the changed origin/feature branch first.

To get the commits from the remote, we need to fetch the origin/feature branch. Then we see Bob’s commit; that’s why we could not push our branch! The local and the remote version of the feature branch have diverged.

To restore the order in the world, we need to merge the origin/feature branch into our local feature branch. Note how our local feature branch now points to a merge commit that combines Bob’s commit and our local commit.

Now we are allowed to push our local feature branch to remote.
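The rule behind the rejected and the accepted push can be sketched as a single check: a push is fine when the commit the remote branch points to is already part of our local history. The Python below is a toy model with hypothetical commit ids (f2 is our local commit, b1 is Bob’s commit, m1 is our merge commit) and hypothetical function names; it is not git’s actual implementation.

```python
# commit id -> list of parent ids (hypothetical ids, for illustration only)
parents = {
    "c1": [],
    "f1": ["c1"],  # the commit we pushed earlier
    "f2": ["f1"],  # our extra local commit on the feature branch
    "b1": ["f1"],  # Bob's commit, already on the remote feature branch
}

def ancestors(commit):
    """The commit itself plus everything reachable through parent links."""
    seen, todo = set(), [commit]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(parents[current])
    return seen

def can_push(local_tip, remote_tip):
    # Accept the push only if no commit that the remote already has would be lost.
    return remote_tip in ancestors(local_tip)

print(can_push("f2", "b1"))  # False: the branches have diverged

# After fetching, we merge Bob's commit with ours; the merge commit has two parents.
parents["m1"] = ["f2", "b1"]
print(can_push("m1", "b1"))  # True: Bob's commit is now part of our history
```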

Pardon the remote display …

To finish the story, we merge the feature branch into the master branch locally.
Note again how the origin branches stay where they are.

Finally, we push the local master branch to remote. This causes no problem, as our local origin/master was the same as the master branch on remote.

What is next?

This document has barely touched on the very basics of what git can do. To learn more about git you can read the following sites (and you’ll probably still need to read others)