Learning the basics of the distributed version control system Git
Using a distributed version control system is so natural nowadays that if you are reading this book, you are probably already using one. However, if you aren't, read this recipe carefully. You should always use a version control system for your code.
Getting ready
Notable distributed version control systems include Git, Mercurial, and Bazaar. In this chapter, we chose the popular Git system. You can download the Git program and Git GUI clients; options include msysGit and TortoiseGit (https://code.google.com/p/tortoisegit/).
Note
Distributed systems tend to be more popular than centralized systems such as SVN or CVS. Distributed systems allow local (offline) changes and offer more flexible collaboration workflows.
Online providers supporting Git include GitHub, Bitbucket, Google Code, and Gitorious. GitHub offers special features and discounts to academics (https://github.com/edu). Synchronizing your Git repositories on such a website is particularly convenient when you work on multiple computers.
You need to install Git (and possibly a GUI) for this recipe (for Mac OS X, see https://mac.github.com). Most Python libraries we will be using in this book are developed on GitHub.
How to do it…
We will show two methods to initialize a repository.
Creating a local repository
This method is best when starting to work locally. This can be done using the following steps:
- The very first thing to do when starting a new project or computing experiment is to create a new folder locally:
$ mkdir myproject
$ cd myproject
- We initialize a Git repository:
$ git init
- Let's set our name and e-mail address:
$ git config --global user.name "My Name"
$ git config --global user.email "me@home"
- We create a new file, and tell Git to track it:
$ touch __init__.py
$ git add __init__.py
- Finally, let's create our first commit:
$ git commit -m "Initial commit."
Cloning a remote repository
This method is best when the repository is to be synchronized with an online provider such as GitHub. Let's perform the following steps:
- We create a new repository on the web interface of our online provider.
- On the main webpage of the newly created project, we click on the Clone button to get the repository URL, and we type the following in a terminal:
$ git clone /path/to/myproject.git
- We set our name and e-mail address:
$ git config --global user.name "My Name"
$ git config --global user.email "me@home"
- Let's create a new file and tell Git to track it:
$ touch __init__.py
$ git add __init__.py
- We create our first commit:
$ git commit -m "Initial commit."
- We push our local changes to the remote server:
$ git push origin
When we have a local repository (created with the first method), we can synchronize it with a remote server using a git remote add command.
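As a minimal sketch of this last point, the following script links a repository created with the first method to a remote and pushes to it. Everything specific here is an assumption made for illustration: a bare repository in a temporary folder stands in for the online provider, the name and e-mail are set locally with git config (without --global) so that your own settings are left untouched, and the remote is given the conventional name origin.

```shell
#!/bin/sh
set -e

# Scratch folder so that the sketch does not touch any real project (assumption).
work=$(mktemp -d)

# Stand-in for the remote server: a bare repository on the local disk.
git init --quiet --bare "$work/remote.git"

# A local repository created as in the first method.
mkdir "$work/myproject"
cd "$work/myproject"
git init --quiet
git config user.name "My Name"
git config user.email "me@home"
touch __init__.py
git add __init__.py
git commit --quiet -m "Initial commit."

# Register the remote under the conventional name "origin"...
git remote add origin "$work/remote.git"
# ...and push the current branch, recording it as the upstream (-u).
git push --quiet -u origin HEAD
```

With a real provider, the path passed to git remote add would be replaced by the repository URL shown on the project's web page.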
How it works…
When you start a new project or a new computing experiment, create a new folder on your computer. You will eventually add code, text files, datasets, and other resources in this folder. The distributed version control system keeps track of the changes you make to your files as your project evolves. It is more than a simple backup, as every change you make on any file can be saved along with the corresponding timestamp. You can even revert to a previous state at any time; never be afraid of breaking your code anymore!
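To make the idea of reverting to a previous state concrete, here is a small sketch. The file name analysis.txt, the temporary folder, and the locally configured name and e-mail are hypothetical choices made for this illustration. It records two snapshots of a file and then restores the file as it was in the earlier one with git checkout.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temporary folder (assumption for this sketch).
demo=$(mktemp -d)
cd "$demo"
git init --quiet
git config user.name "My Name"
git config user.email "me@home"

# First snapshot: a version that works.
echo "working version" > analysis.txt
git add analysis.txt
git commit --quiet -m "Working version."

# Second snapshot: a change we later regret.
echo "broken version" > analysis.txt
git add analysis.txt
git commit --quiet -m "Risky change."

# The history lists both snapshots, most recent first.
git --no-pager log --format="%s"

# Restore the file as it was one commit before the latest one.
git checkout --quiet HEAD~1 -- analysis.txt
restored=$(cat analysis.txt)
echo "$restored"
```

The restored content is also staged, so it can be committed as a new snapshot; the "Risky change." commit stays in the history.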
Specifically, you can take a snapshot of your project at any time by doing a commit. The snapshot records the changes you have staged, so you are in total control of which files and changes will be tracked. With Git, you stage a file for your next commit with git add, before committing your changes with git commit. The git commit -a command commits all changes in the files that are already being tracked, without a separate git add step; new, untracked files are not included.
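The difference between git add followed by git commit and git commit -a can be sketched as follows. The file names notes.txt and untracked.txt and the temporary folder are hypothetical, and the name and e-mail are set locally only for this illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temporary folder (assumption for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "My Name"
git config user.email "me@home"

# Stage a new file explicitly, then commit it.
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes."

# Modify the tracked file and create a file that was never added.
echo "second draft" >> notes.txt
echo "scratch" > untracked.txt

# -a picks up the change to notes.txt, but not the untracked file.
git commit --quiet -a -m "Update notes."

# Only the untracked file is still reported.
git status --short   # prints: ?? untracked.txt
```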
When committing, you need to provide a message describing the changes you made. This makes the repository's history considerably more informative.
Note
How often should you commit?
The answer is very often. Git only takes responsibility for your work when you commit changes. What happens between two commits may be lost, so you'd better commit very regularly. Besides, commits are quick and cheap as they are local; that is, they do not involve any remote communication with an external server.
Git is a distributed version control system; your local repository does not need to synchronize with an external server. However, you should synchronize if you need to work on several computers, or if you prefer to have a remote backup. Synchronization with a remote repository can be done with git push (send your local commits to the remote server), git fetch (download remote branches and objects without changing your working files), or git pull (fetch the remote changes and integrate them into your current branch).
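The following sketch walks through these three commands with two working copies of the same project. All names are assumptions made for illustration: server.git is a bare repository standing in for the remote server, and the laptop and desktop folders play the role of two computers. The --ff-only option is an extra precaution chosen for this sketch; it makes git pull refuse anything other than a simple fast-forward.

```shell
#!/bin/sh
set -e

# Scratch folder holding the fake server and both "computers" (assumption).
sync=$(mktemp -d)
git init --quiet --bare "$sync/server.git"

# First computer: create a commit and push it to the server.
git init --quiet "$sync/laptop"
cd "$sync/laptop"
git config user.name "My Name"
git config user.email "me@home"
git remote add origin "$sync/server.git"
echo "v1" > data.txt
git add data.txt
git commit --quiet -m "Add data."
git push --quiet -u origin HEAD

# Second computer: cloning downloads the history pushed so far.
git clone --quiet "$sync/server.git" "$sync/desktop"

# First computer again: a new commit, sent to the server with git push.
echo "v2" >> data.txt
git commit --quiet -a -m "Update data."
git push --quiet

# Second computer: git fetch downloads the new commit,
# but leaves the working files as they were...
cd "$sync/desktop"
git fetch --quiet
after_fetch=$(cat data.txt)

# ...whereas git pull also integrates it into the current branch.
git pull --quiet --ff-only
after_pull=$(cat data.txt)
```

After the fetch, data.txt on the second computer still contains only the first line; the second line appears after the pull.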
There's more…
The simplistic workflow shown in this recipe is linear. In practice though, workflows with Git are typically nonlinear; this is the concept of branching. We will describe this idea in the next recipe, A typical workflow with Git branching.
Here are some excellent references on Git:
- Hands-on tutorial, available at https://try.github.io
- Git Guided Tour, at http://gitimmersion.com
- Atlassian Git tutorial, available at www.atlassian.com/git
- Online course, available at www.codeschool.com/courses/try-git
- Git tutorial by Lars Vogel, available at www.vogella.com/tutorials/Git/article.html
- GitHub Git tutorial, available at http://git-lectures.github.io
- Git tutorial for scientists, available at http://nyuccl.org/pages/GitTutorial/
- GitHub help, available at https://help.github.com
- Pro Git by Scott Chacon, available at http://git-scm.com
See also
- The A typical workflow with Git branching recipe