In this tutorial I will describe how to set up a highly available NFS server that can be used as a storage solution for other high-availability services like, for example, a cluster of web servers that are being load-balanced. If you have a web server cluster with two or more nodes that serve the same web site(s), then these nodes must access the same pool of data so that every node serves the same data, no matter if the load balancer directs the user to node 1 or node n. This can be achieved with an NFS share on an NFS server that all web server nodes (the NFS clients) can access.
As we do not want the NFS server to become another "Single Point of Failure", we have to make it highly available. In fact, in this tutorial I will create two NFS servers that mirror their data to each other in real time using DRBD and that monitor each other using heartbeat; if one NFS server fails, the other takes over silently. To the outside (e.g. the web server nodes) these two NFS servers will appear as a single NFS server.
In this setup I will use Debian Sarge (3.1) for the two NFS servers as well as for the NFS client (which represents a node of the web server cluster).
I want to say first that this is not the only way of setting up such a system. There are many ways of achieving this goal, but this is the way I have chosen. I do not issue any guarantee that this will work for you!
1 My Setup
In this document I use the following systems:
- NFS server 1: server1.example.com, IP address: 192.168.0.172; I will refer to this one as server1.
- NFS server 2: server2.example.com, IP address: 192.168.0.173; I will refer to this one as server2.
- Virtual IP address: I use 192.168.0.174 as the virtual IP address that represents the NFS cluster to the outside.
- NFS client (e.g. a node from the web server cluster): client.example.com, IP address: 192.168.0.100; I will refer to the NFS client as client.
- The /data directory will be mirrored by DRBD between server1 and server2. It will contain the NFS share /data/export.
2 Basic Installation Of server1 and server2
First we set up two basic Debian systems for server1 and server2. You can do it as outlined on the first two pages of this tutorial: http://www.howtoforge.com/
Regarding the partitioning, I use the following partition scheme:
/dev/sda1 -- 100 MB /boot (primary, ext3, Bootable flag: on)
/dev/sda5 -- 5000 MB / (logical, ext3)
/dev/sda6 -- 1000 MB swap (logical)
/dev/sda7 -- 150 MB unmounted (logical, ext3) (will contain DRBD's meta data)
/dev/sda8 -- 26 GB unmounted (logical, ext3) (will contain the /data directory)
You can vary the sizes of the partitions depending on your hard disk size, and the names of your partitions might also vary, depending on your hardware (e.g. you might have /dev/hda1 instead of /dev/sda1 and so on). However, it is important that /dev/sda7 is a little larger than 128 MB because we will use this partition for DRBD's meta data, which uses 128 MB. Also, make sure /dev/sda7 as well as /dev/sda8 are identical in size on server1 and server2, and please do not mount them (when the installer asks you:
No mount point is assigned for the ext3 file system in partition #7 of SCSI1 (0,0,0) (sda).
Do you want to return to the partitioning menu?
please answer No)! /dev/sda8 is going to be our data partition (i.e., our NFS share).
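If you want to double-check the partitioning after the installation, you can print the partition table on both servers and compare the sizes of /dev/sda7 and /dev/sda8 (just a quick sanity check; replace sda with hda if that is what your hardware uses):
server1/server2:
fdisk -l /dev/sda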
After the basic installation make sure that you give server1 and server2 static IP addresses (server1: 192.168.0.172, server2: 192.168.0.173), as described at the beginning of http://www.howtoforge.com/
Afterwards, you should check /etc/fstab on both systems. Mine looks like this on both systems:
# /etc/fstab: static file system information.
#
#
proc            /proc           proc    defaults        0       0
/dev/sda5       /               ext3    defaults,errors=remount-ro 0       1
/dev/sda1       /boot           ext3    defaults        0       2
/dev/sda6       none            swap    sw              0       0
/dev/hdc        /media/cdrom0   iso9660 ro,user,noauto  0       0
/dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0
If your hard disk is /dev/hda instead of /dev/sda, it will look like this instead:
# /etc/fstab: static file system information.
#
#
proc            /proc           proc    defaults        0       0
/dev/hda5       /               ext3    defaults,errors=remount-ro 0       1
/dev/hda1       /boot           ext3    defaults        0       2
/dev/hda6       none            swap    sw              0       0
/dev/hdc        /media/cdrom0   iso9660 ro,user,noauto  0       0
/dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0
3 Synchronize System Time
It's important that both server1 and server2 have the same system time. Therefore we install an NTP client on both:
server1/server2:
apt-get install ntp ntpdate
Afterwards you can check that both have the same time by running
server1/server2:
date
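If you want to verify that ntpd is actually synchronizing, you can also look at its list of time servers (a quick check with the ntpq tool that comes with the ntp package we just installed):
server1/server2:
ntpq -p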
4 Install NFS Server
Next we install the NFS server on both server1 and server2:
server1/server2:
apt-get install nfs-kernel-server
Then we remove the system bootup links for NFS because NFS will be started and controlled by heartbeat in our setup:
server1/server2:
update-rc.d -f nfs-kernel-server remove
update-rc.d -f nfs-common remove
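If you want to make sure the bootup links are really gone, you can check the runlevel directories (just a sanity check; runlevel 2 is the default on Debian):
server1/server2:
ls /etc/rc2.d/ | grep nfs
This should not return anything.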
We want to export the directory /data/export (i.e., this will be our NFS share that our web server cluster nodes will use to serve web content), so we edit /etc/exports on server1 and server2. It should contain only the following line:
server1/server2:
/etc/exports:
/data/export/ 192.168.0.0/255.255.255.0(rw,no_root_squash,no_all_squash,sync)
Run
man 5 exports
to learn more about this.
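As an example of what such a change could look like: if you wanted to export the share only to our single client instead of the whole subnet, the /etc/exports line could read like this (just an illustrative variant; in this tutorial we stick with the subnet-based export shown above):
/data/export/ 192.168.0.100(rw,no_root_squash,no_all_squash,sync)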
Later in this tutorial we will create /data/export on our empty (and still unmounted!) partition /dev/sda8.
5 Install DRBD
Next we install DRBD on both server1 and server2:
server1/server2:
apt-get install kernel-headers-2.6.8-2-386 drbd0.7-module-source drbd0.7-utils
cd /usr/src/
tar xvfz drbd0.7.tar.gz
cd modules/drbd/drbd
make
make install
Then edit /etc/drbd.conf on server1 and server2. It must be identical on both systems and looks like this:
server1/server2:
/etc/drbd.conf:
resource r0 {
  protocol C;
  incon-degr-cmd "halt -f";

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error   detach;
  }

  net {
  }

  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }

  on server1 {                          # ** EDIT ** the hostname of server 1 (uname -n)
    device    /dev/drbd0;               #
    disk      /dev/sda8;                # ** EDIT ** data partition on server 1
    address   192.168.0.172:7788;       # ** EDIT ** IP address on server 1
    meta-disk /dev/sda7[0];             # ** EDIT ** 128MB partition for DRBD on server 1
  }

  on server2 {                          # ** EDIT ** the hostname of server 2 (uname -n)
    device    /dev/drbd0;               #
    disk      /dev/sda8;                # ** EDIT ** data partition on server 2
    address   192.168.0.173:7788;       # ** EDIT ** IP address on server 2
    meta-disk /dev/sda7[0];             # ** EDIT ** 128MB partition for DRBD on server 2
  }
}
Make sure you use the real hostnames of your two servers in the on server1 { ... } and on server2 { ... } sections of /etc/drbd.conf. You can find out the hostname of a system by running
uname -n
If you have set server1 and server2 respectively as hostnames during the basic Debian installation, then the output of uname -n should be server1 and server2.
Also make sure you replace the IP addresses and the disks appropriately. If you use /dev/hda instead of /dev/sda, please put /dev/hda8 instead of /dev/sda8 into /etc/drbd.conf (the same goes for the meta-disk where DRBD stores its meta data). /dev/sda8 (or /dev/hda8...) will be used as our NFS share later on.
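Because the file must be identical on both systems, the easiest way is to edit it on server1 and then copy it over to server2, for example like this (assuming root SSH logins are allowed between the two servers):
server1:
scp /etc/drbd.conf 192.168.0.173:/etc/drbd.conf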
6 Configure DRBD
Now we load the DRBD kernel module on both server1 and server2. We need to do this only now because afterwards it will be loaded by the DRBD init script.
server1/server2:
modprobe drbd
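You can verify that the module has been loaded like this (the output should contain a line starting with drbd):
server1/server2:
lsmod | grep drbd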
Let's configure DRBD:
server1/server2:
drbdadm up all
cat /proc/drbd
The last command should show something like this (on both server1 and server2):
version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
 0: cs:Connected st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:1548 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured
I want to make server1 the primary NFS server and server2 the "hot-standby". If server1 fails, server2 takes over, and if server1 comes back then all data that has changed in the meantime is mirrored back from server2 to server1 so that data is always consistent.
This next step has to be done only on server1!
server1:
drbdadm -- --do-what-I-say primary all
Now we start the initial sync between server1 and server2 so that the data on both servers becomes consistent. On server1, we do this:
server1:
drbdadm -- connect all
The initial sync is going to take a few hours (depending on the size of /dev/sda8 (/dev/hda8...)) so please be patient.
You can see the progress of the initial sync like this on server1 or server2:
server1/server2:
cat /proc/drbd
The output should look like this:
version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:13441632 nr:0 dw:0 dr:13467108 al:0 bm:2369 lo:0 pe:23 ua:226 ap:0
        [==========>.........] sync'ed: 53.1% (11606/24733)M
        finish: 1:14:16 speed: 2,644 (2,204) K/sec
 1: cs:Unconfigured
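If you would rather follow the progress continuously instead of re-running the command, you can do something like this (assuming the watch utility from the procps package is available; press CTRL+C to stop it):
server1/server2:
watch -n 10 cat /proc/drbd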
When the initial sync has finished, the output of cat /proc/drbd should look like this:
version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:37139 nr:0 dw:0 dr:49035 al:0 bm:6 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured
7 Some Further NFS Configuration
NFS stores some important information (e.g. information about file locks, etc.) in /var/lib/nfs. Now what happens if server1 goes down? server2 takes over, but its information in /var/lib/nfs will be different from the information in server1's /var/lib/nfs directory. Therefore we do some tweaking so that these details will be stored on our /data partition (/dev/sda8 or /dev/hda8...), which is mirrored by DRBD between server1 and server2. So if server1 goes down, server2 can use the NFS details of server1.
server1/server2:
mkdir /data
server1:
mount -t ext3 /dev/drbd0 /data
mv /var/lib/nfs/ /data/
ln -s /data/nfs/ /var/lib/nfs
mkdir /data/export
umount /data
server2:
rm -fr /var/lib/nfs/
ln -s /data/nfs/ /var/lib/nfs
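You can quickly verify on both servers that /var/lib/nfs is now a symlink pointing to the mirrored partition:
server1/server2:
ls -l /var/lib/nfs
The output should show something like /var/lib/nfs -> /data/nfs/.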
8 Install And Configure heartbeat
heartbeat is the control instance of this whole setup. It is going to be installed on server1 and server2, and each instance monitors the other server. For example, if server1 goes down, heartbeat on server2 detects this and makes server2 take over. heartbeat also starts and stops the NFS server on both server1 and server2. It also provides NFS as a virtual service via the IP address 192.168.0.174 so that the web server cluster nodes see only one NFS server.
First we install heartbeat:
server1/server2:
apt-get install heartbeat
Now we have to create three configuration files for heartbeat. They must be identical on server1 and server2!
server1/server2:
/etc/heartbeat/ha.cf:
logfacility     local0
keepalive 2
#deadtime 30 # USE THIS!!!
deadtime 10
bcast   eth0
node server1 server2
server1/server2:
/etc/heartbeat/haresources:
server1 IPaddr::192.168.0.174/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 nfs-kernel-server
server1/server2:
/etc/heartbeat/authkeys:
auth 3
3 md5 somerandomstring
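Instead of somerandomstring you should use a secret string of your own. If you do not want to make one up yourself, you can generate a random one, for example like this (just one possible way using standard tools; any string will do as long as it is the same on both servers):
dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | awk '{print $1}'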
/etc/heartbeat/authkeys should be readable by root only, therefore we do this:
server1/server2:
chmod 600 /etc/heartbeat/authkeys
Finally we start DRBD and heartbeat on server1 and server2:
server1/server2:
/etc/init.d/drbd start
/etc/init.d/heartbeat start
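If heartbeat does not come up as expected, have a look at its log messages (we set logfacility local0 in ha.cf; on a default Debian syslog configuration these messages end up in /var/log/syslog):
server1/server2:
tail -f /var/log/syslog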
9 First Tests
Now we can do our first tests. On server1, run
server1:
ifconfig
In the output, the virtual IP address 192.168.0.174 should show up:
eth0      Link encap:Ethernet  HWaddr 00:0C:29:A1:C5:9B
          inet addr:192.168.0.172  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea1:c59b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18992 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24816 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2735887 (2.6 MiB)  TX bytes:28119087 (26.8 MiB)
          Interrupt:177 Base address:0x1400

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:A1:C5:9B
          inet addr:192.168.0.174  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:177 Base address:0x1400

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:71 errors:0 dropped:0 overruns:0 frame:0
          TX packets:71 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5178 (5.0 KiB)  TX bytes:5178 (5.0 KiB)
Then run
server1:
df -h
on server1. You should see /data listed there now:
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda5             4.6G  430M  4.0G  10% /
tmpfs                 126M     0  126M   0% /dev/shm
/dev/sda1              89M   11M   74M  13% /boot
/dev/drbd0             24G   33M   23G   1% /data
If you run
server2:
ifconfig
df -h
on server2, you shouldn't see 192.168.0.174 and /data.
Now we create a test file in /data/export on server1 and then simulate a server failure of server1 (by stopping heartbeat):
server1:
touch /data/export/test1
/etc/init.d/heartbeat stop
If you run ifconfig and df -h on server2 now, you should see the IP address 192.168.0.174 and the /data partition, and
server2:
ls -l /data/export
should list the file test1 which you created on server1 before. So it has been mirrored to server2!
Now we create another test file on server2 and see if it gets mirrored to server1 when it comes up again:
server2:
touch /data/export/test2
server1:
/etc/init.d/heartbeat start
(Wait a few seconds.)
ifconfig
df -h
ls -l /data/export
You should see 192.168.0.174 and /data again on server1, which means it has taken over again (because we defined it as primary), and you should also see the file /data/export/test2!
10 Configure The NFS Client
Now we install NFS on our client (192.168.0.100):
apt-get install nfs-common
Next we create the /data directory and mount our NFS share into it:
mkdir /data
mount 192.168.0.174:/data/export /data
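To check that the mount succeeded, you can run the following on the client; the share should show up with 192.168.0.174:/data/export as the filesystem source:
df -h /data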
192.168.0.174 is the virtual IP address we configured before. You must make sure that the forward and the reverse DNS record for client.example.com match each other, otherwise you get a "Permission denied" error on the client, and on the server you'll find this in /var/log/syslog:
Mar 2 04:19:09 localhost rpc.mountd: Fake hostname localhost for 192.168.0.100 - forward lookup doesn't match reverse
If it works you can now create further test files in /data on the client and then simulate failures of server1 and server2 (but not both at a time!) and check if the test files are replicated. On the client you shouldn't notice at all if server1 or server2 fails - the data in the /data directory should always be available (unless server1 and server2 fail at the same time...).
To unmount the /data directory, run
umount /data
If you want to automatically mount the NFS share at boot time, put the following line into /etc/fstab:
192.168.0.174:/data/export /data nfs rw 0 0
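If you want the client to cope a little more gracefully with a failover that is in progress, you could also experiment with additional NFS mount options such as hard and intr (this is optional and not part of the setup described above), e.g.:
192.168.0.174:/data/export /data nfs rw,hard,intr 0 0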