Git

Introduction

If your software development goes any further than a simple hello world program, you need a version control system. Without one, things will sooner or later go hopelessly wrong: changes get lost or overwritten, and you can no longer tell which version of the code works. This is especially true if you work with multiple people on a code base.

Git is such a version control system. It has become the de facto standard, although there are other popular alternatives such as Subversion and Mercurial.

Despite being the most widely used system, Git takes some time to get used to. Its conceptual model is robust and rigorous, so it handles large and complex projects with confidence, while still scaling down elegantly to simple ones. But Git also has a noticeable downside: its user interface. The primary interface is the command line, which is not very well designed; as a user, you need to know too much about Git's complex internal state. And although there are many Git GUIs, most of them are not worth trying, and even the good ones become confusing in complicated scenarios.

Getting started

That being said, working with Git is usually quite enjoyable, and most of the time you only need a handful of commands.

The first thing you have to do when working on an existing project is to clone it:

git clone git@gitlab.com:some/subdir/project.git

If you already cloned the project some time ago, you typically want to update it to the latest remote version with:

cd project
git pull

Now you can start working in the project. Once you are happy with your changes, you typically run the following in the project directory to see what has changed, stage all changes, record them as a commit, and upload that commit to the server:

git status
git add .
git commit -m "my commit message"
git push
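If you want to try this workflow without a GitLab account, the sketch below uses a local bare repository as a stand-in for the server. The sandbox paths, file names and the identity "Demo User" are made up for illustration. It assumes Git 2.32 or newer, which honours GIT_CONFIG_GLOBAL and so keeps the demo out of your real configuration.

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main

# A bare repository plays the role of the server (gitlab.com in the text).
git init --bare "$demo/server.git"

# Seed the "server" with a first commit, as if a colleague had started the project.
git clone "$demo/server.git" "$demo/seed"
echo "# project" > "$demo/seed/README.md"
git -C "$demo/seed" add README.md
git -C "$demo/seed" commit -m "initial commit"
git -C "$demo/seed" push origin main

# The everyday workflow from this section.
git clone "$demo/server.git" "$demo/project"
cd "$demo/project"
git pull                                    # nothing new on the server yet
echo "hello" > hello.txt
git status                                  # hello.txt is listed as untracked
git add .
git commit -m "my commit message"
git push                                    # main already tracks origin/main after the clone
```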

A bit more background

Now let's elaborate a bit on the previous section.

The git clone example in the previous section assumes that SSH is set up on your computer and that your public key is known to the Git server, in this example gitlab.com. A later section will discuss this in detail. Until then, you can also clone via HTTPS:

git clone https://gitlab.com/some/subdir/project.git

The git clone command does exactly what its name suggests: it copies the repository from the remote server onto your computer. You receive the complete history of all branches. The default branch is checked out, and the other branches are available as remote-tracking branches such as origin/develop. So even without a network connection you can keep working normally, for example by creating commits, inspecting history or switching branches. When the connection is restored, you can simply push your work to the remote server. This independence from the network is a core part of Git's robustness.
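A quick way to convince yourself of this network independence is to make the "server" disappear after cloning. This is a sketch that assumes Git 2.32 or newer and uses a local bare repository as a stand-in for the remote. Moving the bare repository away plays the role of losing the connection. The paths and commit messages are illustrative.

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main

# Stand-in server with two commits of history.
git init --bare "$demo/server.git"
git clone "$demo/server.git" "$demo/seed"
git -C "$demo/seed" commit --allow-empty -m "first"
git -C "$demo/seed" commit --allow-empty -m "second"
git -C "$demo/seed" push origin main

git clone "$demo/server.git" "$demo/project"
mv "$demo/server.git" "$demo/unreachable.git"   # simulate losing the connection

cd "$demo/project"
git log --oneline                               # full history, read from .git only
git commit --allow-empty -m "made while offline"

mv "$demo/unreachable.git" "$demo/server.git"   # connection restored
git push                                        # the offline commit reaches the server
```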

So, where does Git store all of this? In the .git folder. In principle, you never have to bother with what's inside the .git folder, which is why Git made it hidden. But when you are experimenting with a small toy project to learn Git, looking inside the .git folder gives you valuable insight into the internal workings of Git. And, to be honest, you need this insight. Git cannot be mastered by memorizing its commands; you (unfortunately) need to know quite a bit about its internal workings to make good decisions when things get more complex.
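For example, in a throwaway toy repository you can look at a few of the files inside .git and then ask Git to print the objects they refer to. This sketch assumes Git 2.32 or newer and the default (file-based) ref storage. The directory and file names are illustrative, and the exact listing of .git varies a little between Git versions.

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main

git init "$demo/toy"
cd "$demo/toy"
echo "hello" > hello.txt
git add hello.txt
git commit -m "add hello"

ls .git                          # HEAD, config, objects/, refs/ and more
cat .git/HEAD                    # ref: refs/heads/main (the branch you are on)
cat .git/refs/heads/main         # the commit hash that main points to
git cat-file -p HEAD             # the commit object: tree, author, committer, message
git cat-file -p HEAD:hello.txt   # the stored content of hello.txt: hello
```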

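Since the whole history of your project is in the .git folder, you can delete all files and folders (except the .git folder) and restore them with the command

git restore .

Strictly speaking, git restore . takes the files from the staging area (the index). As long as you have not staged anything since your last commit, that is exactly your latest commit. Files that were never added to Git cannot be restored this way.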
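The sketch below tries this on a throwaway toy repository. It deletes everything except .git and then brings the committed files back. The file names are illustrative, and the sketch assumes Git 2.32 or newer (git restore itself needs 2.23).

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main

git init "$demo/toy"
cd "$demo/toy"
mkdir src
echo "int main(void) { return 0; }" > src/main.c
echo "# toy" > README.md
git add .
git commit -m "initial commit"

# Delete everything in the working tree except the .git folder.
find . -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +
ls -A                            # only .git is left
git restore .                    # recreate the files from the index
git status --short               # prints nothing: the working tree matches the commit
```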

Creating a new Git project

If you have a folder that you want to turn into a Git project, go to this folder on the command line and enter:

git init

This creates the hidden .git folder. You can now add and commit files as explained in the Getting started section. But you cannot push your changes yet, because you first have to tell Git where to push them, for example with:

git remote add origin git@gitlab.com:some/subdir/project.git

Because we created a new project rather than cloning an existing one, the URL should point to a project that does not exist on the server yet (or is still empty), so you have to choose a project name that is not already taken.

Note that origin is merely the conventional name for the remote. You can choose a different name, which is useful when you have multiple remotes.

You can see which remotes are configured for your project with:

git remote -v
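The following sketch runs these steps end to end. A local bare repository stands in for the new, empty project on gitlab.com. The paths are illustrative, and the sketch assumes Git 2.32 or newer. With a real server you would use the SSH or HTTPS URL instead of a local path.

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main

# Stand-in for the new, still empty project on the server.
git init --bare "$demo/server.git"

mkdir "$demo/myproject"
cd "$demo/myproject"
git init                                     # creates the hidden .git folder
echo "hello" > hello.txt
git add .
git commit -m "first commit"
git remote add origin "$demo/server.git"     # a real setup would use the server URL
git remote -v                                # one (fetch) and one (push) line for origin
```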

From now on you can use git push, but the first time Git usually wants explicit confirmation that your local branch should be connected to the upstream branch on the remote. This is a typical example of Git being a bit difficult for newcomers, because in our current situation this is the only logical thing to do. But in the general case it makes sense that Git asks, and you can make Git do this automatically with the push.autoSetupRemote setting (Git 2.37 and later). So, if your local branch is called develop, the first time you should push with:

git push --set-upstream origin develop
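The sketch below performs the first push of a develop branch and then asks Git which upstream branch it recorded. A local bare repository stands in for the server, and the paths are illustrative. The sketch assumes Git 2.32 or newer (git init --initial-branch needs 2.28).

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"

git init --bare "$demo/server.git"          # stand-in for the server

git init --initial-branch=develop "$demo/myproject"
cd "$demo/myproject"
git commit --allow-empty -m "first commit"
git remote add origin "$demo/server.git"

git push --set-upstream origin develop      # push and remember origin/develop as upstream
git rev-parse --abbrev-ref --symbolic-full-name '@{u}'   # prints origin/develop
git push                                    # later pushes need no extra arguments
```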

Working with submodules

If your project has submodules, there is a file called .gitmodules in the root of your project. For each submodule, this file specifies the remote location Git can clone it from and the path where the submodule is placed in your project.

Initializing submodules

After you clone a project that has submodules, the submodules are not checked out automatically (unless you clone with git clone --recurse-submodules). Git only creates an empty placeholder folder in your working tree, at the location specified in the .gitmodules file.

You can populate this folder with the actual files by entering this command:

git submodule update --init --recursive
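To see the empty placeholder folder for yourself, the sketch below builds a tiny superproject with one submodule, publishes it to a local bare repository and clones it again. All paths, the mathfunctions library and its add.c file are illustrative, and the sketch assumes Git 2.32 or newer. Because the stand-in servers are local paths, it sets protocol.file.allow to always in the sandbox configuration. Git 2.38.1 and later refuse local-path submodule clones without this, and you should not need it with a real server.

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # sandbox only: allow local-path submodules

# A tiny library that will become the submodule.
git init "$demo/mathfunctions"
echo "int add(int a, int b) { return a + b; }" > "$demo/mathfunctions/add.c"
git -C "$demo/mathfunctions" add add.c
git -C "$demo/mathfunctions" commit -m "add add()"

# A superproject that uses it, published to a bare stand-in server.
git init "$demo/app"
git -C "$demo/app" submodule add "$demo/mathfunctions" extern/mathfunctions
git -C "$demo/app" commit -m "add mathfunctions submodule"
git init --bare "$demo/app.git"
git -C "$demo/app" push "$demo/app.git" main

# A fresh clone: the submodule folder exists but is empty.
git clone "$demo/app.git" "$demo/clone"
cd "$demo/clone"
ls -A extern/mathfunctions                 # prints nothing
git submodule update --init --recursive
ls extern/mathfunctions                    # add.c
```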

An important detail, which can confuse new users, is that after a fresh clone the submodule is usually not checked out at its latest version, but at the specific commit that is recorded in the main project. This makes a checkout of the main project reproducible and independent of future changes to the submodule. If you want the latest version of the submodule, you have to ask for it explicitly, as described in the next section.

Updating a submodule

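Git pins each submodule to a specific commit hash, which is committed as part of the superproject. The submodule stays at this pinned version until you update it explicitly with the following command. This command fetches the submodule's remote and checks out the latest commit of its remote branch (the remote's default branch, unless another branch is configured):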

git submodule update --remote path/to/submodule

Next, you have to stage this change with

git add path/to/submodule

after which you can commit and push it.
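The following sketch pins a submodule, lets the library gain a new commit, and then performs the update, stage and commit steps. The repositories are local stand-ins with illustrative paths, and the sketch assumes Git 2.32 or newer. protocol.file.allow is relaxed only inside the sandbox configuration, because Git 2.38.1 and later block local-path submodules by default.

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # sandbox only: allow local-path submodules

git init "$demo/mathfunctions"
git -C "$demo/mathfunctions" commit --allow-empty -m "v1"

git init "$demo/app"
cd "$demo/app"
git submodule add "$demo/mathfunctions" extern/mathfunctions
git commit -m "pin mathfunctions at v1"

# Meanwhile, the library moves on.
git -C "$demo/mathfunctions" commit --allow-empty -m "v2"

git submodule status                        # still shows the pinned v1 commit
git submodule update --remote extern/mathfunctions
git add extern/mathfunctions                # stage the new pin
git commit -m "update mathfunctions to latest"
```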

Adding a new submodule

The following command adds the project mathfunctions as a submodule of your project and puts its files in the folder extern/mathfunctions:

git submodule add https://gitlab.com/somedir/mathfunctions.git extern/mathfunctions

If you now run git status, you see that two changes are staged to be committed:

new file: .gitmodules
new file: extern/mathfunctions

.gitmodules is a human-readable file that contains the path and the remote location of the submodule. extern/mathfunctions is not actually a file, but a special entry (called a gitlink) that records the exact commit of the submodule project to which your project is pinned.
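You can look at both entries yourself. The sketch below uses local stand-in repositories with illustrative paths and assumes Git 2.32 or newer. protocol.file.allow is relaxed only in the sandbox configuration, because Git 2.38.1 and later block local-path submodules by default. The command git ls-files --stage shows the special mode 160000 that marks a gitlink, followed by the pinned commit hash.

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # sandbox only: allow local-path submodules

git init "$demo/mathfunctions"
git -C "$demo/mathfunctions" commit --allow-empty -m "first"

git init "$demo/app"
cd "$demo/app"
git submodule add "$demo/mathfunctions" extern/mathfunctions
git status --short                          # A  .gitmodules  and  A  extern/mathfunctions
cat .gitmodules                             # [submodule "extern/mathfunctions"] with path and url
git ls-files --stage extern/mathfunctions   # 160000 <commit hash> 0  extern/mathfunctions
```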

Developing directly in the submodule

The standard behaviour of Git is to treat submodules as read-only dependencies. Although nothing prevents you from changing the files in the submodule directory, you normally should not do so. The correct way is to go to the original Git project of the submodule, make your changes there, and commit and push them. Next, you update the submodule in your dependent project. This way of working prevents the submodule project from being changed haphazardly from many different contexts.

But there is one case in which changing the submodule directly is appropriate: when the main project serves as a test rig for the submodule. In this case, you change the submodule together with its test rig, so doing the commits in different projects would be very inconvenient.

But if you try to do this after a normal clone, things still go wrong. That is because the submodule is not on any branch: git submodule update checks it out in the so-called detached HEAD state. You can still commit, but no branch points to the new commit, so it is easy to lose track of it afterwards. Pushing fails, because Git does not know which branch to push to.

You can fix this by explicitly switching the submodule to an existing branch of the submodule project, e.g. the develop branch:

cd path/to/submodule
git switch develop

Now you can commit and push from inside the submodule directory.
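The sketch below walks through the whole test-rig scenario. A normal clone leaves the submodule in detached HEAD state, git switch develop attaches it to a branch, and the commit can then be pushed from inside the submodule. Finally, the test rig records the new submodule commit, because the pin in the main project does not move by itself. All repositories are local stand-ins with illustrative paths, and the sketch assumes Git 2.32 or newer. protocol.file.allow is relaxed in the sandbox configuration only, since Git 2.38.1 and later block local-path submodules by default.

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # sandbox only: allow local-path submodules

# Stand-in server for the library, with main and develop branches.
git init --bare "$demo/mathfunctions.git"
git clone "$demo/mathfunctions.git" "$demo/lib-seed"
git -C "$demo/lib-seed" commit --allow-empty -m "first"
git -C "$demo/lib-seed" push origin main main:develop

# Stand-in server for the test rig that uses the library as a submodule.
git init "$demo/rig-seed"
git -C "$demo/rig-seed" submodule add "$demo/mathfunctions.git" extern/mathfunctions
git -C "$demo/rig-seed" commit -m "add mathfunctions"
git init --bare "$demo/rig.git"
git -C "$demo/rig-seed" push "$demo/rig.git" main

# A normal clone: the submodule ends up in detached HEAD state.
git clone "$demo/rig.git" "$demo/rig"
cd "$demo/rig"
git submodule update --init --recursive
git -C extern/mathfunctions branch --show-current   # prints an empty line: no branch

# Attach the submodule to develop, then commit and push from inside it.
cd extern/mathfunctions
git switch develop                          # creates a local develop tracking origin/develop
echo "int add(int a, int b) { return a + b; }" > add.c
git add add.c
git commit -m "add add()"
git push

# Back in the test rig, record the new submodule commit.
cd "$demo/rig"
git add extern/mathfunctions
git commit -m "use latest mathfunctions"
```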

By default, git push in the superproject does not push submodules. If you want a single git push to also push submodule commits when needed, set:

git config --global push.recurseSubmodules on-demand
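The sketch below shows the effect of on-demand pushing. A commit is made inside the submodule but not pushed there, and a single git push in the superproject then publishes both. To leave your global configuration alone, it sets push.recurseSubmodules for this repository only; git push --recurse-submodules=on-demand is the one-off equivalent. The repositories are local stand-ins with illustrative paths, and the sketch assumes Git 2.32 or newer. protocol.file.allow is relaxed in the sandbox configuration only, since Git 2.38.1 and later block local-path submodules by default.

```shell
set -e
demo=$(mktemp -d)                           # throwaway sandbox directory
export GIT_CONFIG_GLOBAL="$demo/gitconfig"  # keep the demo out of your real ~/.gitconfig
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # sandbox only: allow local-path submodules

# Stand-in servers for the library and the superproject.
git init --bare "$demo/lib.git"
git init --bare "$demo/app.git"
git clone "$demo/lib.git" "$demo/lib-seed"
git -C "$demo/lib-seed" commit --allow-empty -m "lib v1"
git -C "$demo/lib-seed" push origin main

# Superproject with the library as a submodule, pushed once.
git clone "$demo/app.git" "$demo/app"
cd "$demo/app"
git submodule add "$demo/lib.git" extern/lib
git commit -m "add lib"
git push --set-upstream origin main

# Commit inside the submodule, but do not push it there.
git -C extern/lib commit --allow-empty -m "lib v2"
git add extern/lib
git commit -m "use lib v2"

# Repository-local variant of the setting above: a plain push now
# pushes the unpushed submodule commit first, then the superproject.
git config push.recurseSubmodules on-demand
git push
```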