Sunday, June 19, 2011

SVN Tutorial for Unix


This tutorial is meant to be read linearly so that it introduces the important notions gently.

Local repository

We first consider the situation where the repository is on the machine you are working on, that is, it is accessible through the filesystem.

Creating the repository

Run the command :
svnadmin create --fs-type fsfs /home/user/svn
This creates a bunch of files and directories in /home/user/svn. You can go there and run ls, but the listing won't tell you much. Actually, you should not look at /home/user/svn as a regular directory but rather as a virtual one whose content is far different from the "real" one displayed by ls. To see this virtual directory, type :
svn ls file:///home/user/svn
This will return nothing because, well, we have not put anything into it yet! We will see later how to add things. For the moment, look at the command we have typed. We have used the standard ls command but we have used it as an svn command. This is an important aspect of svn : it allows you to manipulate files and directories (i.e. create, delete, move) with commands similar to the standard ones. So you really have the feeling you are just working with files, but this is just the way svn presents them to you. Internally those files are managed with a database that is stored in the files you saw when listing the /home/user/svn/ directory. This virtual directory structure can be anywhere : on the local filesystem, on a remote machine, or even on a web server. For that reason, instead of talking about a "file path" for these virtual directories, svn uses the term URL. The URL must be prefixed to indicate how the repository should be accessed. That's why we have the file:// prefix. We will see other examples later. For the moment let's play with the virtual directory structure.
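To make the role of the prefix concrete, here is a minimal sketch that builds the URL from an access method and a path. The helper name repo_url and its "local"/"ssh" keywords are made up for this illustration; the two URL forms are the ones used in this tutorial (file:// here, svn+ssh:// in the remote section).

```shell
# Hypothetical helper: build a repository URL from an access method and a path.
#   $1 = "local" or "ssh"
#   $2 = absolute path of the repository on the machine that hosts it
#   $3 = host name (only used for "ssh")
repo_url() {
    case "$1" in
        local) printf 'file://%s\n' "$2" ;;
        ssh)   printf 'svn+ssh://%s%s\n' "$3" "$2" ;;
        *)     echo "unknown access method: $1" >&2; return 1 ;;
    esac
}

repo_url local /home/user/svn                  # prints file:///home/user/svn
repo_url ssh   /home/user/svn url.of.desktop   # prints svn+ssh://url.of.desktop/home/user/svn
```

The three slashes in file:/// are simply the file:// prefix followed by the leading slash of the absolute path.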

The virtual directory structure

Just so you can get a feeling of how comfortable the svn approach is, try the following lines.
svn mkdir file:///home/user/svn/foo -m "Created dumb directory"

Committed revision 1.
svn ls file:///home/user/svn
foo
svn rm file:///home/user/svn/foo -m "Removed dumb directory"

Committed revision 2.
svn ls file:///home/user/svn
Isn't that great? But there is more. Even if the repository now looks empty after we rm-ed the foo subdirectory, this directory is not "lost". Indeed, svn keeps track of all changes in an intelligent manner (for example, it makes cheap copies by saving only deltas), as you can see with
svn log file:///home/user/svn
------------------------------------------------------------------------
r2 | user | 2005-09-29 13:34:03 +0200 (Thu, 29 Sep 2005) | 1 line

Removed dumb directory
------------------------------------------------------------------------
r1 | user | 2005-09-29 13:33:33 +0200 (Thu, 29 Sep 2005) | 1 line

Created dumb directory
We will come back to this later when discussing versioning. Now that we are familiar with the virtual directory, let's see how to concretely put a project under version control.

Importing an existing project

Let's suppose you have an existing project with a bunch of files in /path/to/project/. For example :
cd /path/to/project
tree -a
.
|-- bin
|   `-- main
|-- doc
|   `-- index.html
`-- src
    |-- Makefile
    `-- main.cpp

3 directories, 4 files
To place it under version control with svn (i.e. to import the project), the first thing to do is to remove all files that are not meant to be version-controlled, such as compiled code, binary executables, etc.
rm -f bin/main
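For a larger project, the same cleaning can be scripted. This is only a sketch under assumptions: the helper name clean_build_products is hypothetical, and the patterns it deletes (object files, editor backups, and everything inside bin/) are guesses that fit the example layout above, so adjust them before running it on real files.

```shell
# Hypothetical helper: delete build products under a project directory.
# The patterns (*.o, *~ and every file in bin/) are assumptions for this layout.
clean_build_products() {
    dir=$1
    # Object files and editor backup files, anywhere in the tree.
    find "$dir" -type f \( -name '*.o' -o -name '*~' \) -exec rm -f {} +
    # bin/ only holds compiled executables here; empty it but keep the directory.
    if [ -d "$dir/bin" ]; then
        find "$dir/bin" -type f -exec rm -f {} +
    fi
}

# Usage: clean_build_products /path/to/project
```

The bin directory itself is kept on purpose, so that it is still imported as an empty directory.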
Then we do the import with the following command :
svn import /path/to/project/ file:///home/user/svn/project/trunk -m 'Initial import'

Adding         /path/to/project/trunk
Adding         /path/to/project/trunk/doc
Adding         /path/to/project/trunk/doc/index.html
Adding         /path/to/project/trunk/src
Adding         /path/to/project/trunk/src/main.cpp
Adding         /path/to/project/trunk/src/Makefile
Adding         /path/to/project/trunk/bin

Committed revision 3.
A couple of remarks. The -m flag is used to give a log message for the action. Log messages are enforced by svn. If you do not want to use the -m flag on the command line, set the SVN_EDITOR environment variable, for example to vi, and svn will open that editor any time it needs a log message. Besides the log message, the structure of the import command is quite simple. You give the path to the unversioned files and then the path under which they will be known to svn. The first path refers to the local filesystem and can be an absolute path. Note that this means you do not have to be in the directory you want to import to run that command (which was the case for cvs). The second path indicates where (or "under which name" if you prefer) the project will be held in the repository. Note that we have appended trunk/ to it. We will explain why in a moment. Let's first see how we now work with our project. Indeed, even after the import, the directory we "copied" is still not under version control.
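The editor setting mentioned above is an ordinary environment variable. A minimal sketch for a Bourne-style shell, with vi only as the example value from the text:

```shell
# Editor that svn opens whenever a command needs a log message and -m is not given.
SVN_EDITOR=vi
export SVN_EDITOR
```

In tcsh the equivalent line is setenv SVN_EDITOR vi. Put it in your shell startup file to make it permanent.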

Checking out a project

To check out the project, type :
cd /path/to/project
cd ..
rm -rf project
svn checkout file:///home/user/svn/project
A  project/trunk
A  project/trunk/doc
A  project/trunk/doc/index.html
A  project/trunk/src
A  project/trunk/src/main.cpp
A  project/trunk/src/Makefile
A  project/trunk/bin
Checked out revision 3.
After that command, a version-controlled copy of the project is now available in /path/to/project. This can be verified by the presence of a .svn directory in there.
cd project
ls -a
./ ../ .svn/ trunk/
This can also be checked by the command svn info
svn info
Path: .
URL: file:///home/user/svn/project
Repository UUID: 506be1b8-fe01-0410-aa7c-9527c26032d2
Revision: 3
Node Kind: directory
Schedule: normal
Last Changed Author: user
Last Changed Rev: 3
Last Changed Date: 2005-09-30 17:17:42 +0200 (Fri, 30 Sep 2005)

Working with the project (part 1: editing and adding files)

Now that our project is safely under version control, we can start making it evolve. First, we add two more files defining a class. Then we change main.cpp to use that class and Makefile to compile them. Use your favorite editor (no flame here). After that, we can check the status of our project :
cd /path/to/project/trunk/
svn status
?      src/class.cpp
?      src/class.h
M      src/main.cpp
M      src/Makefile
?      bin/main

Note first that the output of svn status is very human readable (CVS users know what I mean). We can clearly see that there are three unknown files (?) and two modified files (M). Let's add the two files that we do want to put under version control :
cd /path/to/project/trunk/
svn add src/class.h src/class.cpp
A         src/class.h
A         src/class.cpp

You can run svn status again if you want but I am sure you already know what the output will look like. For the two files we added, the question mark will now be an A. Now that you have made these modifications and checked that everything compiles, you decide to commit these changes :
svn commit -m 'Use a class to print hello world'
Sending        trunk/src/Makefile
Adding         trunk/src/class.cpp
Adding         trunk/src/class.h
Sending        trunk/src/main.cpp
Transmitting file data ....
Committed revision 4.

Once again, note the clarity and conciseness of the output. New files have been added, modified files have been sent. This latter terminology is used because svn can access different types of repository, in particular through the network. Moreover, as we have seen, behind svn is a database and the transaction with that database is better described with send/receive terminology. Well, let's not digress and let's focus on the really important thing : the last line of the output.

Versioning : the svn way

When you committed your changes, svn indicated that it was revision 4. Where does that number come from? After all, it is the very first commit, so you would expect the revision to be 2, or 1.1, or something like that. Well, svn has a very different way of numbering than CVS and it is one of its most pleasant features. Every time you change something in the repository (create a directory, remove a directory, import a project, commit modified or added files), svn assigns a new revision number to the whole repository, and thus to all files. Since we have been mkdir-ing then rm-ing the foo directory, then import-ing, then commit-ting, the current number is thus 4. Everything happens as if the whole repository was copied every time you run an svn command that changes it, and svn encourages thinking that way. Of course, this is not what happens : svn saves only the relative changes in order to minimize the size of the database.
Contrary to CVS, the version number of every file is thus increased when you make changes, even for the files you did not touch. For example, doc/index.html was not changed when we did the commit. However, this file exists in revision 4. This ensures better coherence among files and really eases the retrieval of a given version of your project. Let's see an example. Let's go to the /tmp directory and type :
svn checkout -r 4 file:///home/user/svn project
A  project/project
A  project/project/trunk
A  project/project/trunk/doc
A  project/project/trunk/doc/index.html
A  project/project/trunk/src
A  project/project/trunk/src/main.cpp
A  project/project/trunk/src/class.cpp
A  project/project/trunk/src/class.h
A  project/project/trunk/src/Makefile
A  project/project/trunk/bin
Checked out revision 4.

The meaning of -r 4 should be self-explanatory. Look at the output. As you can see, this retrieved all files of your project even the doc/index.html one. In other words, the version number is just a shortcut to the date at which the project is stamped. Let's now get a copy of the previous version :
cd /tmp/
svn checkout -r 3 file:///home/user/svn project
U  project/project/trunk/src/main.cpp
U  project/project/trunk/src/Makefile
D  project/project/trunk/src/class.cpp
D  project/project/trunk/src/class.h
Checked out revision 3.

Have a closer look at the output. It shows how clever svn is. We asked for revision 3 but because of our previous command, revision 4 is already in /tmp. No problem, svn does all the necessary things and indicates them : main.cpp and Makefile are updated and the class.{h,cpp} files are deleted. At this stage, I must explain a point that puzzled me at first but made sense afterwards. Let's go back to /path/to/project and get some info on the files.
cd /path/to/project
svn info trunk/src/main.cpp
Path: trunk/src/main.cpp
Name: main.cpp
URL: file:///home/user/svn/project/trunk/src/main.cpp
Revision: 4

svn info trunk/doc/index.html
Path: trunk/doc/index.html
Name: index.html
URL: file:///home/user/svn/project/trunk/doc/index.html
Revision: 3

How come svn indicates that the revision of index.html is 3 while we just said (and checked) that the revision of all files was increased to 4? Well, the point is that index.html exists in revision 4 but it was last changed in revision 3, and the copy in your working directory has not been touched since then : a commit only bumps the files it sends, and svn update brings the others up to date. That is what svn is indicating in its output to the info command. Anyway, you might start thinking that this is confusing : if you want to get the current version of the project, you have to svn info all files and keep the maximum revision number returned! But that's because you have not yet assimilated the svn way. Indeed, you do not need to know the revision number. It is just an indication and you should not base your tracking of versions on it. This is what we will see now.
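If you still want to look at these numbers, the output of svn info can be filtered. The sketch below assumes the English field names Revision and Last Changed Rev, as they appear in the svn info output shown earlier. The helper name info_revs is hypothetical, and the sample input is made up to mirror the main.cpp case rather than copied from a real run.

```shell
# Hypothetical helper: read svn info output on stdin and print the revision
# the item sits at in the working copy and the revision it was last changed in.
info_revs() {
    awk -F': ' '
        $1 == "Revision"         { rev = $2 }
        $1 == "Last Changed Rev" { changed = $2 }
        END { printf "working=%s last-changed=%s\n", rev, changed }
    '
}

# Sample input, for illustration only.
info_revs <<'EOF'
Path: trunk/src/main.cpp
Revision: 4
Last Changed Rev: 4
EOF
# prints: working=4 last-changed=4
```

Inside a working copy you would run svn info trunk/doc/index.html | info_revs instead of feeding it canned text.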

Tags, trunk (and branches)

Remember that we have created a trunk subdirectory when importing the project. The reason will become clear now. A project under version control has a typical life cycle. You develop it, committing the changes regularly when they are validated. Then you arrive at a first "deliverable" version. It is time for the first release, for which you come up with a name (XP, longhorn, vista ?). Then you start working on it again until you arrive at the second release, and so on. For svn, the current development of the project (the up-to-date version) is called the trunk or HEAD. The releases are called tags. You can think of them as tagging the project with a (clever) name at a given state. The good thing with svn is that it allows you to handle this very easily and naturally. Let's see an example.

Our hello world example is working. However, we are currently using a custom Makefile and we would instead like to use the qmake tool that comes with TrollTech's Qt. But we would like to retain the Qt-free version. For that, we are going to create a tag. Since we are likely to create different tags (= releases) of our project in the future, we start by creating a subdirectory to "hold" the tagged versions :
cd /path/to/project
svn mkdir tags
A         tags
Then we simply make a copy of the trunk to the tags directory :
svn copy trunk/ tags/before-qt
A         tags/before-qt
Then we simply commit this copy.
svn commit -m "Tagged version before switching to Qt" tags/
Adding         tags
Adding         tags/before-qt
Adding         tags/before-qt/src/Makefile
Adding         tags/before-qt/src/class.cpp
Adding         tags/before-qt/src/class.h
Adding         tags/before-qt/src/main.cpp

Committed revision 5.
Everything looks as if we had made a copy of the trunk/ working copy under a new name. And indeed, in the tags directory, we do have a copy of all the files. But internally, in svn's database, no copy occurred and svn just remembered that before-qt refers to the project at a given time. Now, we can choose to keep the tags/before-qt directory so that at any time in the future, we can go there and compile this release. We can also delete it since it is now in the database and we can retrieve it easily. To be convinced, do :
cd /tmp/
svn checkout file:///home/user/svn/project/tags/before-qt
A  before-qt/doc
A  before-qt/doc/index.html
A  before-qt/src
A  before-qt/src/main.cpp
A  before-qt/src/class.cpp
A  before-qt/src/class.h
A  before-qt/src/Makefile
A  before-qt/bin
Checked out revision 5.
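A tag can also be created directly in the repository, without going through a working copy, because svn copy accepts two URLs and then commits immediately. A minimal sketch, assuming the tags directory already exists in the repository (as it does after revision 5). The names tag_release and SVN are hypothetical; the SVN variable is only there so that you can substitute another command, such as echo, for a dry run.

```shell
# Command used to talk to the repository; override it (e.g. SVN=echo) for a dry run.
SVN=${SVN:-svn}

# Hypothetical helper: tag the current trunk of a project under a given name.
#   $1 = project URL (the directory that contains trunk/ and tags/)
#   $2 = tag name
tag_release() {
    "$SVN" copy "$1/trunk" "$1/tags/$2" -m "Tagged version $2"
}

# Usage: tag_release file:///home/user/svn/project before-qt
```

Since the copy happens inside the database, nothing is duplicated on disk until somebody checks the tag out.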
There is another possible state for a project in its life cycle. It happens when a team starts working on a different evolution of the project while another team keeps working on the current release. For example, a team starts investigating a partial rewrite of the project while another team works to deliver a bug-fix release. In version control terminology, this is called branching. A branch occurs when the trunk is split and different versions start to live their own lives. Of course, svn provides all the tools for handling branches but I will not discuss them as it is far beyond the scope of a gentle tutorial.

Working with the project (part 2: deleting, renaming)

Now that we have a tagged version marking the state before Qt, we can work on our new version. The qmake tool works by parsing a platform-independent .pro file describing what to compile and then generating a platform-specific Makefile. Thus we must add a file src.pro. We must also remove Makefile from version control since from now on it will be generated automatically.
cd /path/to/project/trunk/src
svn rm Makefile
D         Makefile
svn add src.pro
A         src.pro
Then we decide that doc was not a good name and that we want to rename it html. Nothing could be simpler :
cd ..
svn rename doc html
A         html
D         doc/index.html
D         doc
Now we just have to commit our changes :
svn commit -m 'Switched to qmake. Renamed doc -> html'
Deleting       trunk/doc
Adding         trunk/html
Deleting       trunk/src/Makefile
Adding         trunk/src/src.pro
Transmitting file data .
Committed revision 6.

Other commands

There are other commands for svn. You can type svn help to list them and type svn help command to get help on a particular command. The ones you will certainly use are :
  • revert undoes all changes under the current directory;
  • update fetches all changes that were committed from another working directory;
  • diff displays the differences between the files under the current directory and the last version of these files that were committed.
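As a sketch of how these three commands are typically typed, here are two small wrappers. The function names and the SVN variable are hypothetical (the variable only lets you swap in echo to see what would be run). Note that revert wants an explicit target, so reverting everything below the current directory is written with -R and a dot.

```shell
# Command used to talk to the repository; override it (e.g. SVN=echo) for a dry run.
SVN=${SVN:-svn}

# Fetch what was committed elsewhere, then show what is still uncommitted here.
sync_and_review() {
    "$SVN" update || return 1
    "$SVN" status
    "$SVN" diff
}

# Throw away all local modifications below the current directory.
# -R makes revert recurse into subdirectories; "." is the target.
discard_local_changes() {
    "$SVN" revert -R .
}
```

Run both from inside a working copy, for example /path/to/project/trunk.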

Remote repository

Version control is necessary not only to avoid losing important information but also to work on a project from different places. The typical situation I encounter is working on a project either at the lab on my desktop machine or at home on my laptop (although I should not be working at home ;-)). When I am on my laptop, the svn repository is not accessible through the filesystem. It is on my account in the lab because this account is backed up every night. Well, svn can handle this very simply thanks to the svnserve program. I won't detail it as it is a powerful tool with advanced access control to the repository. Instead, I'll just give the simplest (dumbest) usage, which simply tells svn to use standard ssh authentication.
From my laptop at home, I can access the repository by just changing the URL file:///home/user/svn into svn+ssh://url.of.desktop/home/user/svn.
svn checkout svn+ssh://url.of.desktop/home/user/svn project
And it worked straight away!

Important note!

I should not tell you this but a couple of days after I wrote this tutorial (which I wrote on the very day I learned svn as a way to check my understanding), I was about to drop svn! Indeed, I was using it between my laptop and desktop, through the network, and my database was regularly getting corrupted, giving me the daunting and frightening message :
svn: Berkeley DB error while opening environment for filesystem /var/svn/db:
Invalid argument
svn: bdb: Program version 4.2 doesn't match environment version
The first time, I really thought I had completely lost the data. Fortunately, I was able to recover it using :
svnadmin recover /home/user/svn
And then svn would work a couple more times before the problem would show up again. Very annoying. I finally did what was necessary : I searched the web! Yann Rocq gives a solution there. It is in French so I'll translate. The problem seems to be that the Berkeley DB format used by svn does not like being placed on a shared disk, which was my case (the repository was on my account at work, which is NFS-accessed). Database format, huh? Did I forget to mention that you can select the type of database svn uses? Well, remember the --fs-type fsfs flag on the very first command I had you type in this tutorial. Without it, svn would have used the default format, which is Berkeley DB in older versions of svn, and you would have been stuck. Conversely, fsfs is an alternate database format proposed by svn. To get some elements of comparison, read this propaganda document.
To conclude this note, let's mention that if you have unfortunately created a repository in Berkeley DB format, you can export/re-import it as described in the Dump/load section.
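If you are not sure which format an existing repository uses, you can look inside it. This sketch assumes the repository records its backend in a small db/fs-type file, as repositories created by svnadmin normally do; the helper name repo_fs_type is hypothetical.

```shell
# Hypothetical helper: print the storage backend of a repository ("fsfs" or "bdb").
#   $1 = path of the repository on the local filesystem
repo_fs_type() {
    if [ -f "$1/db/fs-type" ]; then
        cat "$1/db/fs-type"
    else
        echo "no db/fs-type file in $1" >&2
        return 1
    fi
}

# Usage: repo_fs_type /home/user/svn
```

If it prints bdb and the repository sits on a shared disk, that is the situation described in this note.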

Various tricks

The tcsh completion assistance

Programmable completion is an under-used feature of tcsh. Search the man page of tcsh for the word complete to get more info. The idea is to tell tcsh what kind of completion a command expects for its different possible arguments. I made a simple though useful completion list for svn. Just copy the code below into a ~/.completerc file (modify the svnhosts list to match the servers you mainly use) and add source ~/.completerc to your .tcshrc (thanks to Peter Beckman for improvements and suggestions).
set svnhosts="(origan.inrialpes.fr)"
set svncmds="(checkout commit update status info add diff help revert)"
complete svn \
 p/1/"$svncmds"/ \
 c@file://@d:*@ \
 c@*svn://@"$svnhosts"@/ \
 n@checkout@"(file:/ svn:/ svn+ssh:/)"@/ \
 n/commit/"(-m)"/ \
 n/--diff-cmd/"(diff xxdiff-subversion)"/ \
 c/-/"(-diff-cmd)"/ \
 n/help/"$svncmds"/ 
Open a new shell. Now type svn ch, then press tab (or Ctrl-D), then f then tab again and be amazed!

Using xxdiff for graphical differences

By default, the command svn diff outputs something that is not really human readable (at least for me!). Fortunately, there is a wonderful tool called xxdiff that gives a better display. And good news, it can be interfaced with svn! Just add the command line option :
svn diff --diff-cmd xxdiff-subversion
or if you prefer to use it all the time, just add to your ~/.subversion/config the following lines :
diff-cmd = xxdiff-subversion
diff3-cmd = xxdiff-subversion
If you have some trouble installing the xxdiff tool (especially on Fedora Core 4), here is an rpm I made that contains the helper scripts and patches a bug in the standard rpms of xxdiff. You must first install tmake and then rebuild the package and install it. First download them :
Then run the commands (as root) :
rpm -Uvh tmake-1.8-1.noarch.rpm
rpmbuild --rebuild xxdiff-3.1-1fc4.src.rpm
rpm -Uvh /usr/src/redhat/RPMS/i386/xxdiff-3.1-1fc4.i386.rpm

The svnlook utility

As I mentioned earlier, all files increase their version number together. However, svn info afile gives you the version number at which afile was last changed. To get the current version number of the project, use :
svnlook youngest /home/user/svn

Dump/load of a database

What if one day you decide to move your database from one place to another? How can you do that? If you simply move the directory, you might run into problems. The proper solution is to dump the db :
svnadmin dump /home/user/svn > /tmp/mydumpfile.db
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
* Dumped revision 3.
* Dumped revision 4.
* Dumped revision 5.
* Dumped revision 6.
* Dumped revision 7.
* Dumped revision 8.
...
Then re-create another database somewhere else and load the dump into it :
svnadmin create --fs-type fsfs /home/user/newsvn
svnadmin load /home/user/newsvn/ < /tmp/mydumpfile.db
Then you should see svn re-creating the whole history of the old database in the new database. This may take some time if you have a long history. There is a final step that you can optionally perform. Normally, you should have committed every change in your db before you migrate it using dump/load. Thus, you can now delete any checked out copy and re-check it out from the new repository. This can be tedious, and you may also have forgotten to commit. In that case, you can use the svn switch command. See its documentation (svn help switch) for usage info.
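For the record, the form of svn switch that applies here is the --relocate one, which rewrites the repository URL stored in a working copy. A minimal sketch with the two repository locations of this section; the helper name relocate_working_copy and the SVN variable are hypothetical, and newer svn releases offer a separate svn relocate command for the same job, so check svn help switch on your version first.

```shell
# Command used to talk to the repository; override it (e.g. SVN=echo) for a dry run.
SVN=${SVN:-svn}

# Hypothetical helper: point the working copy in the current directory
# at a repository that has moved.
#   $1 = old repository URL, $2 = new repository URL
relocate_working_copy() {
    "$SVN" switch --relocate "$1" "$2"
}

# Usage, from the top of the working copy:
#   relocate_working_copy file:///home/user/svn file:///home/user/newsvn
```

Uncommitted changes in the working copy are kept, so you can commit them to the new repository afterwards.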