Getting started with git

Why git?

There are a lot of reasons to use git. Many of the advantages are things you can get from any distributed version control system: you can work offline and still have everything you need available to you, including the full project history, commits, etc.

Git is clean: there is only one .git directory in the root, rather than one .svn in every directory of the repository. This makes it much easier to deal with when you pull out your recursive grep.

Moving, renaming, and deleting directories is really easy and clean.

Once you understand the concepts, branching is super easy. You might say, “I never do branching, so I don’t really care about this!” That’s probably because in other systems branching is prohibitively difficult, so you don’t know what awesome benefits you have available to you when it’s super easy. You can easily try things out by creating new branches, switch between branches with ease, test things in separate branches, track multiple remote repositories in different branches and, best of all, merge between the different branches. It all sounds fun, but until you see how easy it is, this is just a bunch of words.

Tracking multiple remote repositories is easy: you can watch remote changes and cherry-pick them into your own branch with ease.

Why not git?

Git’s interface and UI are terrible: there are a mind-boggling number of individual binaries and options and wacky ideas involved. If you can turn on your filters to weed out the things that you don’t need, and start slow and easy, you’ll soon be picking up more as you go along. You don’t need to understand everything from the beginning, and you should consider picking up git to be more like learning a programming language than something you are going to fully understand in an hour.

Git’s main problem is its complexity. Part of that is because it is actually more powerful than the other systems, but you should ignore those features for now. Part of git’s complexity is that it uses nonstandard names for its most common operations. The rest of the revision control world has settled on basic commands such as ‘checkout’ and ‘revert’, but not git. The final reason git is complicated is the index.

Many are confused by git. This is because git is conceptually different from other revision control systems. You need to have a correct understanding of the theory of git or your experiences with other systems will lead you astray. But! Let’s pretend that doesn’t exist right now, and you can learn about it later. I recommend you spend the first 6 months with git pretending it is subversion, and then think about expanding your understanding.

With that said, do not think of git as a linear evolution from subversion. Subversion came from CVS, which came from RCS, but that progression does not extend into git. Don’t try to map your understanding of subversion onto git and expect it to make sense; it will confuse you!

Let’s get started!

The best way to move to git from svn is to use git-svn. This way the upstream central svn repository stays where it is and you just interface with it through git. Nobody will know the difference, and you can always step back into your cozy svn environment if you need to, without having to buy the whole farm all at once. You will start to get familiar with git and soon will be super comfortable with it. I’ll walk you through some git basics below, and then get you started using git-svn.

One quick note… you might see git commands written as ‘git-command’ and also as ‘git command’. You should get used to the latter form, as the former is being phased out. All git commands have man pages which you can access with ‘git command --help’.

Install git

In Debian/Ubuntu, there is already a package called ‘git’, but the one you want is ‘git-core’. Make sure that you are getting version 1.5 or later; believe me, this is a major improvement in UI. You will also want the git-svn package, and gitk:

$ sudo aptitude install git-core git-svn gitk
$ git version
git version 1.5.3.2

I’d like to introduce you to git…

…but why don’t you do it yourself! To introduce yourself to git, tell it your name and email address so that when you commit things, they will be properly noted. This configuration is stored per user (the --global flag writes it to ~/.gitconfig in your home directory) and only has to be done once on each machine that you work from:

$ git config --global user.name "Blue Footed Nighthawk"
$ git config --global user.email "bfn@riseup.net"

You can see what this did by doing one of the following:

$ git config --list
$ cat ~/.gitconfig

The ~/.gitconfig file is not stored in a repository, so you will need to set this on other computers if you are going to use git from them.
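If you want to try the configuration commands without touching your real settings, here is a minimal sketch. It assumes a throwaway HOME made with ‘mktemp’ (my assumption for the demo, not something you need in normal use) and reads the values back one at a time with ‘--get’:

```shell
# Point HOME at a scratch directory so the real ~/.gitconfig is left alone
# (an assumption made for this demo only).
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.name "Blue Footed Nighthawk"
git config --global user.email "bfn@riseup.net"

# Read single values back instead of listing everything
git config --global --get user.name
git config --global --get user.email

# The settings live in a plain text file in your home directory
cat "$HOME/.gitconfig"
```

Since the file is plain text, copying it to another computer is one way to carry your identity over.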

A couple of other good suggestions for your ~/.gitconfig:

[color]
	ui = auto
[alias]
	changes = diff --name-status -r
	rlog = log --pretty=format:\"%h %Cblue%cr%Creset %cn %Cgreen%s%Creset\"
	wdiff = diff --color-words
	wshow = show --color-words
[diff]
	renames = true

[push]
	default = current

This will set up some nice colors when you look at diffs, a few handy aliases, and some branch pushing defaults.

Now that you are set up, let’s really get started!

Ok, now we are going to walk through some fundamentals of git. Understanding these will be very useful for what you want to do. Once you are set with a few basics, then we can talk about using git with crabgrass’ subversion server, tracking other remote repositories and merging them.

I know, I know… that’s what you really want to do, not this stuff, but you need to lay the ground floor first, so just walk through this with me. Trust me, you are stuck in a subversion mind-set, and you will be confused if you don’t get the basics of git down first, because there are some fundamental differences. You may be tempted to try different things as we go along; try to resist that urge and instead step along with me here. The reason I ask you to resist this urge is that I’ve planned this out very carefully to step through the concepts in an order that makes sense and doesn’t require you to do something complicated before the fundamentals have been explained. This all fits together nicely, but gets complicated quickly if you decide to try things that you think of along the way before I’ve gotten to them. I don’t want to discourage you, but I think it’s best that you note the things you want to try somewhere so you can do them later, once you have gone through this.

Our first git repository

Let’s take a directory with a couple of files in it and make it a git repository that we can play with. I highly recommend you follow along by doing these commands yourself in a directory with some files in it. Try to follow along and not get clever and anticipate what is coming, because you might end up in a weird state and then be confused. Just stick with me for a little bit longer; you will have plenty of time to screw around later!

$ cd ~/src/munin-node
$ ls
munin-node.c munin-node.c~ secret_password server2.c  server.c  uh.c  

Initialize the git repository

Now let’s use git to initialize a local repository.

$ git init
Initialized empty Git repository in .git/

This created a .git directory and it’s good to go:

$ ls -al
drwxr-xr-x 3 micah micah  4096 2008-06-19 16:28 ./
drwx------ 5 micah micah  4096 2008-06-10 19:55 ../
drwxr-xr-x 7 micah micah  4096 2008-06-19 16:28 .git/
-rw-r--r-- 1 micah micah 10859 2008-06-11 18:14 munin-node.c
-rw-r--r-- 1 micah micah 10859 2008-06-11 18:14 munin-node.c~
-rw-r--r-- 1 micah micah   108 2008-06-11 18:14 secret_password
-rw-r--r-- 1 micah micah  5342 2006-05-12 12:53 server2.c
-rw-r--r-- 1 micah micah  1363 2006-05-12 12:53 server.c
-rw-r--r-- 1 micah micah   357 2008-06-11 17:38 uh.c

If you want, poke around in the .git directory. It’s just flat files, and you shouldn’t be afraid of them. There is a repository config file in there, hooks, and other stuff.

Ignore some things

There are a few files in this directory that we don’t ever want to include in the repository, so we set up git to ignore them:

$ cat >> .gitignore <<EOF
secret_password
*~
*.log
db/schema.rb
tmp/*/*
log/*
config/database.yaml
config/environments/production.rb
db/schema.sql
*.[oa]
EOF

The patterns in this file are applied whenever git adds or tracks content; files that match are ignored. The .gitignore file stays with the project and is checked into it (for a personal exclusion list that is not shared, use the info/exclude file in your .git directory). You can also have a .gitignore in any subdirectory, and it will apply to that directory and the subdirectories below it. The lines I added above are some common ones that you might find in a .gitignore file (many of those are related to a rails environment).
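To see the ignore rules at work, here is a small sketch in a scratch repository (the temporary directory is an assumption for the demo). It uses ‘git check-ignore’, which needs a reasonably recent git, to ask which pattern matches a given file:

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q

# Three files: one ordinary, one editor backup, one secret
touch munin-node.c munin-node.c~ secret_password

# Two of the patterns from the .gitignore above
printf '%s\n' 'secret_password' '*~' > .gitignore

# Ask git which rule, if any, ignores each file
git check-ignore -v munin-node.c~ secret_password

# Untracked files that git still cares about (ignored ones are left out)
git status --porcelain
```

The status listing should mention only munin-node.c and the .gitignore itself.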

Add things to the repository

$ git add .
$

This adds all the files in the current directory to git’s staging area (except for the things that are listed in the .gitignore), but not to your local repository.

Wait, what?

When using git, you are actually working with three different things: your working directory, your index, and finally your repository. When you run git add, what git does is take changes in your working directory and ‘stage’ them in the index. It does not actually add them to your repository; they are just staged. Because the term ‘index’ is somewhat unclear, people will often refer to the index as the ‘staging area’.

We can see the files that are queued up in the staging area, ready to be committed to the repository, by issuing ‘git status’:

$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#	new file: munin-node.c
#	new file: server.c
#	new file: server2.c
#	new file: uh.c
#

This shows that these four files are ‘staged’ in the index (your new .gitignore gets staged along with them, as you will see in the commit below). They have been added, but not committed to the local repository.

Now commit to the repository

You can only commit to the repository things that you have first added. When you do a commit, git will take everything you have staged with ‘git add’ and actually add it into the repository.

$ git commit -m "initial checkin"
Created initial commit d410a9c: initial checkin
 5 files changed, 648 insertions(+), 0 deletions(-)
 create mode 100644 munin-node.c
 create mode 100644 server.c
 create mode 100644 server2.c
 create mode 100644 uh.c
 create mode 100644 .gitignore

Now these files are committed to your repository! You will notice that the .gitignore is also included.

Content not files

So you may have noticed that you had to first add files, and then commit them. You actually have to do this every time you make a change in git. That’s because git works with content, not files or changes. When you change a file git will notice, but you need to add the changed file to what is called the ‘staging area’ or ‘index’ before you can commit it. The staging area is just a layer between your working directory and the next commit. It’s where you stage changes before you commit them.

Let’s take an example. We will make a change to a file in our example repository and then see what git sees:

$ echo >> uh.c
$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#	modified:   uh.c
#
no changes added to commit (use "git add" and/or "git commit -a")

See how git notices that the file was changed, but says it needs to be added before it can be committed? The file has been changed in our working directory, but we have not staged it yet. So let’s do that by running ‘git add’, which will stage it so the next commit will include this file:

$ git add uh.c
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	modified:   uh.c
#

Now the file shows up under ‘Changes to be committed’. If you run ‘git commit’ at this point, it will commit this change into the local repository.

This difference between the staging area/index and the actual commit is good to know, but since you are getting started, you can ignore all of this and just use ‘git commit -a’ for a while. This will take every change to files that git already tracks in the entire project, stage them, and then do a commit (brand new files still need a ‘git add’ first). If you do ‘git commit -av’ you get the same thing, plus a commit editor showing a diff of all the content that has changed, so you can look through what you did and write a better commit message.
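Here is a rough sketch of the working directory / index / commit distinction, run in a scratch repository (the temporary directory and the notes.txt file are made up for the demo). ‘git diff’ compares the working directory with the index, and ‘git diff --cached’ compares the index with the last commit:

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "int main(void) { return 0; }" > uh.c
git add uh.c
git commit -q -m "initial checkin"

# Change a tracked file and create a brand new, untracked one
echo "/* tweak */" >> uh.c
echo "some notes" > notes.txt

git diff --name-only            # working directory vs. index: uh.c
git add uh.c
git diff --name-only            # now empty, the change has been staged
git diff --cached --name-only   # index vs. last commit: uh.c

# Another edit, committed with -a this time (no separate 'git add')
echo "/* another tweak */" >> uh.c
git commit -q -a -m "tweak uh.c"

# -a only picks up files git already tracks, so notes.txt is still untracked
git status --porcelain
```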

What has happened?

If you want to see a log of all the commits in a repository, you can do ‘git log’. The ‘git log’ command will give you the commit history of your repository:

$ git log
commit d410a9c2e9769bdb2771a8f404c87f46d11aa663
Author: Micah Anderson <micah@riseup.net>
Date:   Thu Jun 19 16:35:56 2008 -0400

    initial checkin

This shows us the commit message about what has changed, who changed it and a globally unique SHA1 hash that identifies each commit. This hash is useful because you can use it as a reference to do various things.

If you do ‘git log -p’, you will see the diff of the actual changes to all the files.
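As a small illustration of using the hash as a reference, the sketch below builds a scratch repository with two commits (directory and file contents are assumptions for the demo) and then names commits by full hash, by abbreviated hash, and relative to HEAD:

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "int uh;" > uh.c
git add uh.c
git commit -q -m "initial checkin"
echo "/* tweak */" >> uh.c
git commit -q -a -m "make some changes"

# One line per commit: abbreviated hash and subject
git log --oneline

full="$(git rev-parse HEAD)"            # full hash of the latest commit
short="$(git rev-parse --short HEAD)"   # abbreviated form of the same hash

# Either form names the same commit
git show --stat "$short"

# HEAD~1 means "the commit before the latest one"
git log -1 --pretty=format:'%h %an: %s%n' HEAD~1
```

An abbreviated hash works anywhere a full one does, as long as it is unambiguous within the repository.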

Branching

Branching is one of the more powerful and easy things that git can do. Why do you want to branch at all?

Sometimes you want to work on a new experimental feature, but you want to keep the main master branch pristine while you do. Git enables you to split off easily. Sometimes you want to track remote branches and pull changes from them. Maybe two other people are working on similar code and you want to do a 3-way merge between your repository and theirs: create a new branch and then merge the two other branches in. That way, if there are complications, they are isolated in your working merge branch.

The master branch

In everything we’ve done so far, we have been working in the master branch. You can see what branches you have set up and which one you are currently in by doing the following:

$ git branch
* master
$

This shows you that you have one branch, “master”, and the asterisk next to it shows that you are currently in that branch (if you want fancy: git branch --color).

Visualizing branches

Let’s fire up a useful tool for visualizing branches. We don’t have anything complicated yet, but this tool will enable us to visually understand what happens with branching in git. You will need to launch this inside your git repository to see anything.

$ gitk

This will fire up a relatively ugly Tk application. If this looks really horrible and the fonts are unreadable, edit your ~/.gitk and change the mainfont to Monaco 12 point, or some other fixed-width font. I’ve attached my ~/.gitk to this document if you want to try to use it.

So have a look around in gitk. The top left is a tree view of your current repository’s commits over time, on the right side are the individual people who made the commits, and below is the unique SHA1 hash associated with the commit. Then there are some search boxes, the actual changes represented in diff format, and the files on the right side.

You fired up ‘gitk’ in the master branch, so it will show you only the changesets in this particular branch. If you want to see changes in all your branches, fire up gitk with ‘gitk --all’, as we will do below when we get into it.

Ok, quit out of gitk for now. We are going to do various branch operations and then fire up gitk again after each one so you can visualize what is going on.

If you are interested in trying a nicer looking gitk, there is a program called ‘giggle’ which is like gitk but uses gtk+. It’s much more visually appealing, but it’s pretty new and might not represent things as accurately as gitk. Still, it’s worth looking at because it’s prettier.

Creating and switching to other branches

Let’s create a new branch!

$ git branch bird

That was easy! Let’s see what branches we have available:

$ git branch
  bird
* master
$

So you see now that we have two branches, and we are in the master branch. Super simple! Have a look in ‘gitk --all’ to see how the new branch looks. See how the ‘bird’ branch and the ‘master’ branch are green blocks on the same level at the top? Good, that’s going to start changing. Quit out of gitk.

You are in the master branch until you check out the new branch, so let’s go ahead and do that:

$ git checkout bird
$ git branch
* bird
  master

See how the asterisk is now on the ‘bird’ branch? If you look around, you will see all the files look the same as before. That’s because you branched off of your master, and nothing has changed.
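If you want to convince yourself that a new branch is nothing more than a second name for the same commit, here is a minimal sketch in a scratch repository (the temporary directory is an assumption for the demo):

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "int uh;" > uh.c
git add uh.c
git commit -q -m "initial checkin"

git branch bird

# Both names resolve to the same commit hash, nothing was copied
git rev-parse master bird

git checkout -q bird

# Print the name of the branch we are on now
git rev-parse --abbrev-ref HEAD
```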

Lets make some branch changes!

I thought you might want to. Make some changes in this new branch; let’s first add a new file:

$ echo "some ruby code" > whatever.rb
$ git add whatever.rb
$ git commit -m 'added some whatever'

Now if you fire up ‘gitk --all’ you will see something interesting… the ‘bird’ branch is one step above the master branch… iiiiinnnnteresting. If you add more files to the ‘bird’ branch, it will become even more clearly divergent.

So you added this whatever.rb file in the ‘bird’ branch. I want to illustrate something to you, so please switch back to the master branch now:

$ git checkout master
$ ls
munin-node.c  server2.c  server.c  uh.c

Notice that the whatever.rb file is gone? That’s because this file was part of the ‘bird’ branch, and not part of the master branch. When you switch between branches, git updates your current working directory with all the files, changes, etc. of the branch. Sweet!

So go ahead and make some changes to some files in the master branch, and run ‘git commit -av’ to commit them to this branch. Once you have done this, fire up ‘gitk --all’ again (you can also keep gitk running and instead pull down the File menu and choose Update to get the latest changes). Whoaaaa interesting, the ‘bird’ branch went off one way, and the master continued on. That’s funky! Just you wait, it gets wild.
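If you don’t have gitk handy, ‘git log --graph’ draws the same picture as text. The sketch below recreates the diverged state in a scratch repository (directory and file contents are assumptions for the demo) and also asks git where the two branches split:

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "int uh;" > uh.c
git add uh.c
git commit -q -m "initial checkin"

# One commit that only exists on bird
git checkout -q -b bird
echo "some ruby code" > whatever.rb
git add whatever.rb
git commit -q -m "added some whatever"

# One commit that only exists on master
git checkout -q master
echo "/* tweak */" >> uh.c
git commit -q -a -m "make some changes"

# Text version of the 'gitk --all' picture: two lines forking from one commit
git log --graph --oneline --decorate --all

# The commit where the two branches split
git merge-base master bird
```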

Keeping branches in sync

Now there are two separate code-bases here, and eventually we are going to want to merge these back together, but hold your horsies, let’s do something else first. What if you wanted to keep the ‘bird’ branch in sync with any changes that you make in the master branch, but continue with the new changes you are making in the ‘bird’ branch? A real-world example: you create a rails2.0 branch where you work on porting the code to rails2.0, while you continue doing development in the master branch on bug fixes, features etc., and you want to make sure the rails2.0 branch keeps up with these changes.

So let’s do this… if you have been following along, in the previous step you made some changes in master, but not in the ‘bird’ branch. What you want to do is step over to the ‘bird’ branch and then do an operation called a ‘rebase’ against the master branch:

$ git checkout bird
$ git rebase master
First, rewinding head to replay your work on top of it...
HEAD is now at d6fdcd7 make some changes
Applying added some whatever

What was that!? First, this extracted all the changes that you made in the ‘bird’ branch and put them aside. Then it brought the ‘bird’ branch up to date with the ‘master’ branch, and finally it took the changes that were previously set aside and replayed them on top. Look in ‘gitk --all’ now and see how the branches are no longer diverged, but instead bird is just continuing on (this is because both branches are the same now, except ‘bird’ has a couple of additional changes added on top).
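You can also check the result of a rebase from the command line. This sketch recreates the diverged branches in a scratch repository (directory and file contents are assumptions for the demo), rebases, and then asks git whether master is now an ancestor of bird:

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "int uh;" > uh.c
git add uh.c
git commit -q -m "initial checkin"

git checkout -q -b bird
echo "some ruby code" > whatever.rb
git add whatever.rb
git commit -q -m "added some whatever"

git checkout -q master
echo "/* tweak */" >> uh.c
git commit -q -a -m "make some changes"

# Replay bird's work on top of the current master
git checkout -q bird
git rebase -q master

# Succeeds only if every commit on master is also part of bird's history
git merge-base --is-ancestor master bird && echo "master is an ancestor of bird"

# A single straight line now, with bird one commit ahead of master
git log --graph --oneline --decorate --all
```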

Alright, that was neat!

But wait, don’t get trigger happy with ‘git rebase’. You should repeat after me, “never use git rebase on repositories with more than one user, or repositories that I have published to the world”. Why? Well, the “git rebase” manpage says why: “When you rebase a branch, you are changing its history in a way that will cause problems for anyone who already has a copy of the branch in their repository and tries to pull updates from you.” So “git rebase” should only, and I mean only, be used in situations where you maintain a private branch of a project that you never share in any way (except to submit patches to upstream). If you are working with a team on maintaining a branch, or want to post this branch online for others to pull, you do not want to use git rebase!

So you should only use git rebase when you are working on your private rails2.0 migration branch that you want to keep up-to-date with your ‘master’ branch changes, as long as you aren’t publishing this rails2.0 branch. When you are ready to publish that branch, you would merge it into your master branch, and then publish that.
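To make “changing its history” concrete, this sketch records the hash at the tip of ‘bird’ before and after a rebase in a scratch repository (directory and file contents are assumptions for the demo). The commit message stays the same, but the commit itself is a new one with a new hash, which is exactly what trips up anyone who already pulled the old one:

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "int uh;" > uh.c
git add uh.c
git commit -q -m "initial checkin"

git checkout -q -b bird
echo "some ruby code" > whatever.rb
git add whatever.rb
git commit -q -m "added some whatever"

git checkout -q master
echo "/* tweak */" >> uh.c
git commit -q -a -m "make some changes"

git checkout -q bird
before="$(git rev-parse bird)"   # tip of bird before the rebase
git rebase -q master
after="$(git rev-parse bird)"    # tip of bird after the rebase

echo "bird before rebase: $before"
echo "bird after rebase:  $after"
```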

Merging branches

Now let’s say all the changes that you were doing in the ‘bird’ branch are done, and you want to merge these back into the ‘master’ branch (say you finished the rails2.0 upgrade in ‘bird’). All you need to do is a branch merge. Let’s step through this. First go back to your master branch:

$ git checkout master
$ git branch
  bird
* master

Then check out what changes you are about to merge back into the ‘master’ branch by doing a branch diff:

$ git diff master bird

This shows the changes in bird that would be added to master in a diff format. Now you can look and see what changes will happen and see what you are about to merge before doing it.

Now do the merge (you need to be in the master branch at this point):

$ git merge bird

This merges the bird branch into the master branch and shows you a nice diff stat graph of what was merged. Fire up ‘gitk --all’ and see how the branches joined back together.
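Here is the merge as a sketch in a scratch repository (directory and file contents are assumptions for the demo). It assumes ‘bird’ is strictly ahead of ‘master’, as it is right after a rebase, in which case git normally just moves master forward (a “fast-forward”) instead of creating a separate merge commit:

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "int uh;" > uh.c
git add uh.c
git commit -q -m "initial checkin"

# bird gets one commit; master does not move in the meantime
git checkout -q -b bird
echo "some ruby code" > whatever.rb
git add whatever.rb
git commit -q -m "added some whatever"

git checkout -q master

# Preview what the merge would bring in
git diff --stat master bird

# Do the merge and print the diff stat
git merge bird

# The file from bird is now present on master
ls
```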

Conflicts

What if there was a conflict with this merge? How do you resolve it? If the branch merge above hit a conflict, you will have seen some nasty conflict messages on the screen. If you open the files that have conflicts, you will see conflict markers in them. You resolve a conflict by editing the file until it is how it should be, saving it, adding it to the staging area, and then committing.

If you didn’t have a conflict, you can simulate one now to practice this. I did this by editing the top line in munin-node.c while in the ‘bird’ branch, I changed the comment so that the word ‘simple’ was changed to ‘complicated’. I then ran ‘git commit -a’ to commit it to my ‘bird’ branch, then I switched to the master branch, I edited the same file and added the words ‘not so’ in front of the word ‘simple’ in the comment. Then I committed this change. Then I attempted to do a branch merge:

$ git merge bird
Auto-merged munin-node.c
CONFLICT (content): Merge conflict in munin-node.c
Automatic merge failed; fix conflicts and then commit the result.

If I open the munin-node.c file in an editor, I will see:

<<<<<<< HEAD:munin-node.c
/* A not so simple munin re-write in C */
=======
/* A complicated munin re-write in C */
>>>>>>> bird:munin-node.c

I like the second one more, so I deleted the top conflict marker, the first comment line, and the conflict separator, left the version I preferred, and then removed the bottom conflict marker. Then I saved the file, told git about it, and committed the merge:

$ git add munin-node.c
$ git commit
Merge branch 'bird'

Conflicts:

        munin-node.c
#
# It looks like you may be committing a MERGE.
# If this is not correct, please remove the file
#       /home/micah/src/munin-node/.git/MERGE_HEAD
# and try again.
#

It’s interesting to note here that when I did the commit, I committed the merge itself; this is called a merge commit. If there were no conflicts, this merge commit would have been done automatically and you would see it in the git log. In this case, I had to resolve the conflict, add the file, and then do the merge commit by hand; git filled out the commit message for me here.
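The whole conflict exercise can be scripted if you want to replay it. This is only a sketch in a scratch repository (the directory and the one-line file are assumptions for the demo), and instead of editing the markers by hand it takes the shortcut ‘git checkout --theirs’, which keeps the version from the branch being merged in:

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "/* A simple munin re-write in C */" > munin-node.c
git add munin-node.c
git commit -q -m "initial checkin"
git branch bird

# master and bird both change the same line, in different ways
echo "/* A not so simple munin re-write in C */" > munin-node.c
git commit -q -a -m "master: not so simple"

git checkout -q bird
echo "/* A complicated munin re-write in C */" > munin-node.c
git commit -q -a -m "bird: complicated"

git checkout -q master
git merge bird || echo "merge stopped on a conflict, as expected"

# Files that still have unresolved conflicts
git diff --name-only --diff-filter=U

# Keep bird's side of the file, stage it, and finish the merge commit
git checkout --theirs -- munin-node.c
git add munin-node.c
git commit -q --no-edit

cat munin-node.c
```

When neither side is right as-is, edit the file by hand as described above; ‘--theirs’ (and its sibling ‘--ours’) only help when you want one side wholesale.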

Branch changes travel

Something worth knowing is that uncommitted changes get carried along when you create new branches or switch to existing branches. To illustrate this, make a change in the master branch, but don’t commit it to the repository, then switch to the ‘bird’ branch:

$ echo >> munin-node.c
$ git checkout bird
M	munin-node.c
Switched to branch "bird"

See how the change was moved to the ‘bird’ branch? So changes that you make in a repository but have not committed will go to the new branch. Why do this? Let’s say you are halfway through some changes and realize that these changes are larger than you thought, so maybe you should have created a branch and done these changes there, but you’ve already made the changes. This enables you to do that easily; in fact you can create a new branch and switch to it, all in one go:

$ git checkout -b 'tweet'
M	munin-node.c
Switched to a new branch "tweet"
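The same thing as a sketch in a scratch repository (directory and file contents are assumptions for the demo): an uncommitted edit made on master is still sitting in the working directory after switching to a freshly created branch.

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "/* A simple munin re-write in C */" > munin-node.c
git add munin-node.c
git commit -q -m "initial checkin"

# Edit a tracked file on master, but do not commit
echo >> munin-node.c

# Create the new branch and switch to it in one go
git checkout -b tweet

# We are on tweet now, and the edit came along with us
git rev-parse --abbrev-ref HEAD
git status --porcelain
```

Note that git refuses to switch if your uncommitted change would collide with a different version of the same file on the branch you are switching to.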

Deleting branches

You made all these branches, but you don’t need them anymore! In fact, you merged the ‘bird’ branch with the ‘master’ branch before, because you were done with the rails2.0 migration, so you don’t need it around anymore:

$ git branch -d bird

If you look at gitk, you will see that the branch doesn’t exist anymore.

If you try to delete an unmerged branch, it will complain, so you will need to use ‘git branch -D bird’ to force it.
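A quick sketch of the difference between ‘-d’ and ‘-D’, in a scratch repository. The branch names ‘merged-work’ and ‘experiment’ are made up for the demo:

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "int uh;" > uh.c
git add uh.c
git commit -q -m "initial checkin"

# A branch with nothing that master lacks
git branch merged-work

# A branch with a commit that was never merged
git checkout -q -b experiment
echo "wild idea" > idea.txt
git add idea.txt
git commit -q -m "try a wild idea"
git checkout -q master

# Fine: everything on merged-work is already in master
git branch -d merged-work

# Refused: deleting would lose the unmerged commit
git branch -d experiment || echo "git refused to delete the unmerged branch"

# Forced: the branch name goes away regardless
git branch -D experiment

git branch
```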

Playing well with others

Git can use a number of protocols for interacting remotely, both for sending your repository commits out to the public and for receiving other people’s. Git works with ssh://, http(s)://, git://, file:/// and rsync://. Typically when you push your repository to make it public, you will use ssh://. When you obtain other people’s repositories and changesets, you will typically use the git protocol or the http protocol. The git protocol is the most efficient mechanism to use, and you should prefer it when obtaining remote repositories.

Publishing your repository

There are a lot of ways to publish your git repository. For now let’s pick the easiest way to start with. We will use repo.or.cz, a public git hosting site where you can create projects and publish them for free. We are going to use this as an example to get you familiar, and you can decide if you want to keep using it in the future, publish your own repository manually, or use github (free until a certain point) or some other service.

First register as a user, and then register your project, select “Push mode” and be sure to put a valid URL for the project homepage (even if you don’t have one, you need a valid URL here).

Once you have created your project at repo.or.cz, we just need to add ourselves as a user that can push to the project. Click the ‘assign users’ link that you will see after you’ve created the repository and fill it out.

We can now go back to our munin-node project and add this repository as a remote and then push to it:

$ git remote add public git+ssh://repo.or.cz/srv/git/munin-test.git
$ git push public master
Counting objects: 26, done.
Compressing objects: 100% (25/25), done.
Writing objects: 100% (26/26), 6.61 KiB, done.
Total 26 (delta 13), reused 0 (delta 0)
To git+ssh://repo.or.cz/srv/git/munin-test.git
 * [new branch]      master -> master

This has pushed our ‘master’ branch to the ‘public’ remote. In that process, the remote master branch was created out of our master branch (that’s what the ‘* [new branch]’ line is telling you).

If you run ‘git remote’ you will see now that there is one remote called ‘public’, which is the name that we gave it. We can show more information about that remote by showing it:

$ git remote
public
$ git remote show public
* remote public
  URL: git+ssh://repo.or.cz/srv/git/munin-test.git
  Tracked remote branch
    master

If you look at repo.or.cz you can see the files were pushed in to the project and all the information is available, including the history, etc. You will also see that there is a clone URL available that you can give to people to clone this repository.
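You can rehearse the push without an account anywhere by using a bare repository on your own disk as the remote. This is only a sketch: the local path stands in for the repo.or.cz URL, and the temporary directories are assumptions for the demo.

```shell
work="$(mktemp -d)"                    # our working repository (throwaway)
public="$(mktemp -d)/munin-test.git"   # stand-in for the hosted repository

# A bare repository has no working directory, just the git data
git init -q --bare "$public"

cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"
echo "/* A simple munin re-write in C */" > munin-node.c
git add munin-node.c
git commit -q -m "initial checkin"

# Same two steps as above, only with a local path instead of a URL
git remote add public "$public"
git push -q public master

# Where does 'public' point, and what branches does it have now?
git remote -v
git ls-remote public
```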

Obtaining other people’s repositories

There are two different situations where you might want someone else’s repository. The first is you just want to check out some code that the upstream developers publish over git. You might want to just get it so you can install the latest snapshot, poke around in the source, or get started developing on that code. The other situation is where you and other people are working on a similar code-base, and you want to collaborate with each other.

In the first case, a typical retrieval of a remote git repository is done using a clone command.

For example, this will retrieve the munin-node repository that I pushed to repo.or.cz:

$ cd /tmp
$ git clone git://repo.or.cz/munin-test.git
Initialized empty Git repository in /tmp/munin-test/.git/
remote: Counting objects: 26, done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 26remote:  (delta 13), reused 26 (delta 13)
Receiving objects: 100% (26/26), 6.38 KiB, done.
Resolving deltas: 100% (13/13), done.

This initializes a local copy of the remote repository in the directory munin-test. You now have a git repository that is populated with all the commits, history etc. You can go in there and start branching the code and working from there as your local repository, making whatever changes you want.
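To see that a clone really carries the whole history, here is a sketch that clones from a local path instead of over the network (the directories and the two commits are assumptions for the demo):

```shell
base="$(mktemp -d)"     # throwaway directory, assumption for this demo

# The repository we are going to clone, with two commits in it
git init -q "$base/munin-test"
cd "$base/munin-test"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"
echo "int uh;" > uh.c
git add uh.c
git commit -q -m "initial checkin"
echo "/* tweak */" >> uh.c
git commit -q -a -m "make some changes"

# Clone it into a separate directory
git clone -q "$base/munin-test" "$base/copy"
cd "$base/copy"

# Both commits are here, not just the latest files
git log --oneline

# The clone remembers where it came from as the remote 'origin'
git remote -v
```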

The second situation is where you already have your own git repository of some code, and someone else has also published their own git repository and you want to share things between your two repositories (merging etc.). In this case, you are going to want to use the ‘git remote’ commands to encapsulate their repository within yours, rather than cloning a separate one into a separate directory as above.

To do that you would already be in your local git repository and then you would add a remote git repository, and then update its references:

$ cd my-local-git-repo
$ git remote add micah git://repo.or.cz/munin-test.git
$ git remote update                    
Updating micah
 * [new branch]      master     -> micah/master

Now the remote repository is available within your own repository under the name of this remote. You can check out its branches, merge between branches, etc.
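Here is a sketch of that second situation using only local directories. It assumes both people started from the same repository (called ‘upstream’ here, a made-up stand-in), and that the other developer has since committed something you want:

```shell
base="$(mktemp -d)"     # throwaway directory, assumption for this demo

# A shared starting point that both people cloned from
git init -q "$base/upstream"
cd "$base/upstream"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"
echo "/* A simple munin re-write in C */" > munin-node.c
git add munin-node.c
git commit -q -m "initial checkin"

git clone -q "$base/upstream" "$base/theirs"
git clone -q "$base/upstream" "$base/mine"

# The other developer commits something in their copy
cd "$base/theirs"
git config user.name "Micah Anderson"
git config user.email "micah@riseup.net"
echo "int uh;" > uh.c
git add uh.c
git commit -q -m "add uh.c"

# Back in my repository: track theirs as a remote named 'micah'
cd "$base/mine"
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"
git remote add micah "$base/theirs"
git remote update

git branch -r                            # remote branches, including micah/master
git log --oneline master..micah/master   # what they have that I don't
git merge -q micah/master                # bring their work into my master
```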

Working with other people’s repositories

If you have added other people’s remote repositories to your local repository through ‘git remote add’, you can now do things such as:

Workflows

Let’s first talk about individual workflow, and then talk about different models for working with others.

Individual workflow

You can do whatever you like with your local git repository, but this is one way that a lot of people work, which might be considered somewhat of a convention. It’s all up to you how you want to do things.

First you have your ‘master’ branch, which is where you maintain the stable version of things. This is what you will make public to the world (consider this your ‘production’ branch). Then you have a development branch (consider this to be your ‘trunk’) where you test out stuff and integrate various branches; it’s where you hack. When this development branch stabilizes with the work you have been doing on it, you merge it into your master branch. You will also want to create short-lived branches to work on specific ideas, bugs, new features etc. (these are known as topic branches). As soon as a particular topic branch is stabilized, you merge it back into your development branch.

This enables you to go off and make forward progress on different types of work, but if you suddenly realize that there is a fix that needs to be done on the production install, you can switch back to your master, make that fix and then publish it, without having to carry along all the baggage of the development work you are doing.
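One round of that workflow might look like the sketch below, in a scratch repository. The branch names ‘development’ and ‘topic/fix-typo’ and the files are made up for the demo; use whatever naming suits you.

```shell
repo="$(mktemp -d)"     # throwaway directory, assumption for this demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Blue Footed Nighthawk"
git config user.email "bfn@riseup.net"

echo "stable" > app.txt
git add app.txt
git commit -q -m "initial checkin"

# Long-lived development branch, plus a short-lived topic branch off of it
git branch development
git checkout -q -b topic/fix-typo development
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "fix a typo"

# Topic is done: fold it into development and drop the topic branch
git checkout -q development
git merge -q topic/fix-typo
git branch -d topic/fix-typo

# Urgent production fix, made directly on master
git checkout -q master
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "urgent production fix"

# Later, once development has stabilized, merge it into master
git merge -q --no-edit development

git log --graph --oneline --decorate --all
```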

Working with others

There are a number of different workflow models that people use when sharing repositories with other developers, here are a few of those:

Resources

I recommend the following git information as next steps to broaden your understanding. Some of these are really, really good at explaining the fundamental concepts of git, which will help you better understand what is going on.