DRBD: Redundant NFS Storage on CentOS 6

A pair of CentOS NFS servers can be a great way to build an inexpensive, reliable, redundant fileserver. Here we are going to use DRBD to replicate the data between the NFS nodes and Heartbeat to provide high availability to the cluster. The servers in this example are Rackspace Cloud Servers with attached Cloud Block Storage.

Make sure that DNS resolves each server’s hostname correctly, and to be safe, also add an entry for each node to /etc/hosts. We’ll use fileserver-1 as the primary and fileserver-2 as the backup, and share the /dev/xvdb1 device under the DRBD resource name “data”. It will eventually be available to the filesystem as /dev/drbd1.

10.0.0.1 fileserver-1 fileserver-1.example.com
10.0.0.2 fileserver-2 fileserver-2.example.com
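If you manage these entries with a script, it helps to make the edit idempotent so it can be re-run on both nodes without duplicating lines. The following is a minimal sketch using a hypothetical helper, add_host_entry (not part of any package), which appends a line only when the short hostname is not already listed; the optional last argument lets you try it against a scratch file before touching /etc/hosts.

```shell
#!/bin/sh
# add_host_entry is a hypothetical helper, not part of any package.
# Usage: add_host_entry IP SHORTNAME FQDN [HOSTS_FILE]
add_host_entry() {
    ip="$1"; short="$2"; fqdn="$3"; hosts="${4:-/etc/hosts}"
    # Skip comment lines; succeed (exit 0) if SHORTNAME already appears
    # as one of the hostname fields (field 2 onward) on any line.
    if awk -v h="$short" '
        $0 !~ /^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }' "$hosts"; then
        return 0
    fi
    printf '%s %s %s\n' "$ip" "$short" "$fqdn" >> "$hosts"
}

# On each node, as root:
# add_host_entry 10.0.0.1 fileserver-1 fileserver-1.example.com
# add_host_entry 10.0.0.2 fileserver-2 fileserver-2.example.com
```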

Install EL Repository

If you don’t already have the ELRepo repository installed for yum, install it using rpm:

rpm -ivh http://elrepo.org/elrepo-release-6-5.el6.elrepo.noarch.rpm

Install & Configure DRBD

Now use yum to install the DRBD kernel module and its utilities, then load the module.

yum install -y kmod-drbd84 drbd84-utils
modprobe drbd

Next we need to create a new DRBD resource file by editing /etc/drbd.d/data.res. Make sure to use the correct IP addresses and devices for your server nodes.

resource data {
    startup {
        wfc-timeout 30;
        outdated-wfc-timeout 20;
        degr-wfc-timeout 30;
    }
    net {
        protocol C;
        cram-hmac-alg sha1;
        shared-secret "Secret Password for DRBD";
        verify-alg sha1;
    }
    disk {
        resync-rate 100M;
    }
    on fileserver-1 {
        volume 0 {
            device minor 1;
            disk /dev/xvdb1;
            meta-disk internal;
        }
        address 10.0.0.1:7789;
    }
    on fileserver-2 {
        volume 0 {
            device minor 1;
            disk /dev/xvdb1;
            meta-disk internal;
        }
        address 10.0.0.2:7789;
    }
}
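DRBD selects the `on` section whose name matches the node’s `uname -n` output, so if `uname -n` returns something else (for example the full fileserver-1.example.com), drbdadm will not find this host’s section. A minimal sketch of a pre-flight check, using a hypothetical helper named res_has_host, might look like this; `drbdadm dump data` is the real drbdadm subcommand that prints the configuration as parsed.

```shell
#!/bin/sh
# res_has_host is a hypothetical helper: it succeeds if the DRBD resource
# file contains an "on <host> {" section for the given host name.
# Usage: res_has_host RES_FILE HOST
res_has_host() {
    res_file="$1"; host="$2"
    grep -Eq "^[[:space:]]*on[[:space:]]+${host}[[:space:]]*\{" "$res_file"
}

# Example, run on each node:
# res_has_host /etc/drbd.d/data.res "$(uname -n)" \
#     || echo "no 'on $(uname -n)' section in data.res" >&2
# drbdadm dump data    # print the configuration as drbdadm parsed it
```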

Run the following commands on each server to initialize the storage metadata, start the DRBD service, and bring up the “data” resource.

drbdadm create-md data
service drbd start
drbdadm up data

You can monitor the progress by checking /proc/drbd. It should look something like the following, with a status of “Inconsistent/Inconsistent” being expected at this point.

[root@fileserver-1 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06

 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:209708764
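Each resource’s status line in /proc/drbd carries cs: (connection state), ro: (roles, local/peer) and ds: (disk states, local/peer). If you want to script against these values rather than eyeball them, a minimal sketch with a hypothetical helper, drbd_field, could extract a single field for a given minor:

```shell
#!/bin/sh
# drbd_field is a hypothetical helper: print one status field (cs, ro or ds)
# for a DRBD minor from /proc/drbd-formatted text.
# Usage: drbd_field MINOR FIELD [FILE]    (FILE defaults to /proc/drbd)
drbd_field() {
    minor="$1"; field="$2"; file="${3:-/proc/drbd}"
    awk -v m="$minor:" -v f="$field:" '
        # The status line starts with "<minor>:", e.g. " 1: cs:Connected ..."
        $1 == m {
            for (i = 2; i <= NF; i++)
                if (index($i, f) == 1) { print substr($i, length(f) + 1); exit }
        }' "$file"
}

# drbd_field 1 ds    # Inconsistent/Inconsistent right after "drbdadm up data"
```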

On the primary only, run the following command to promote it and start the initial synchronization between the two nodes.

drbdadm primary --force data

Again we can monitor the status by watching /proc/drbd – notice that the status is now “UpToDate/Inconsistent” along with a sync status (at 4.8% in my example).

[root@fileserver-1 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06

 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
    ns:9862244 nr:0 dw:0 dr:9863576 al:0 bm:601 lo:8 pe:2 ua:11 ap:0 ep:1 wo:f oos:199846748
	[>....................] sync'ed:  4.8% (195160/204792)M
	finish: 1:57:22 speed: 28,364 (22,160) K/sec

Once the DRBD device has synced between the two nodes you will see an “UpToDate/UpToDate” message and you are ready to proceed.

[root@fileserver-1 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06

 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:209823780 nr:8 dw:3425928 dr:206400390 al:1763 bm:12800 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
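Since the initial sync can take hours (almost two in the example above), you may prefer to block until it finishes instead of re-running cat. A minimal sketch, assuming a hypothetical helper named wait_for_uptodate, polls the status line for one minor until it reports ds:UpToDate/UpToDate or a timeout expires; the timeout, interval and file are parameters so it can be tested against a copy of /proc/drbd.

```shell
#!/bin/sh
# wait_for_uptodate is a hypothetical helper: poll a /proc/drbd-style file
# until the given minor reports ds:UpToDate/UpToDate, or give up.
# Usage: wait_for_uptodate MINOR [TIMEOUT_SECONDS] [INTERVAL] [FILE]
wait_for_uptodate() {
    minor="$1"; timeout="${2:-7200}"; interval="${3:-10}"; file="${4:-/proc/drbd}"
    waited=0
    while :; do
        # Check only the status line for this minor, e.g. " 1: cs:... ds:..."
        if awk -v m="$minor:" '$1 == m && / ds:UpToDate\/UpToDate / { ok = 1 }
                               END { exit !ok }' "$file"; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "DRBD minor $minor not UpToDate/UpToDate after ${waited}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# wait_for_uptodate 1 && echo "ready to format /dev/drbd1"
```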

Format & Mount

Once the device has synchronized between your nodes, you can format it on the primary node and then mount it. Note that in a standard Primary/Secondary configuration using a traditional filesystem such as ext3, the device can only be mounted on one node at a time. It is possible to create a Dual Primary configuration in which the data is accessible from both nodes at the same time, but that requires a clustered filesystem such as GFS or OCFS2 (Oracle Cluster File System v2), whose installation is shown below.

OCFS2 isn’t available from the default repositories, so for a Dual Primary setup you have to add the Oracle Open Source yum repository, import its GPG key, and install ocfs2-tools (along with the UEK kernel) so that you can set up a clustered configuration.

yum -y install yum-utils
cd /etc/yum.repos.d
wget --no-check-certificate https://public-yum.oracle.com/public-yum-ol6.repo
rpm --import http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6
yum-config-manager --disable ol6_latest
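yum -y install ocfs2-tools kernel-uek

Before rebooting, edit /boot/grub/grub.conf so that it defaults to the newly installed UEK kernel; it is very important that the installed OCFS2 driver match the running kernel version.

reboot

For the standard Primary/Secondary configuration, create an ext3 filesystem on the primary node only, then mount it: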

mkfs -t ext3 /dev/drbd1
mkdir -p /mnt/data
mount -t ext3 -o noatime,nodiratime /dev/drbd1 /mnt/data
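Because an ext3 filesystem must only ever be mounted on the node that is currently Primary, it can be worth making that check explicit in any script that mounts the device. `drbdadm role data` prints the roles as local/peer (for example Primary/Secondary); the sketch below, with hypothetical helpers is_local_primary and mount_data_if_primary, only mounts when the local role is Primary.

```shell
#!/bin/sh
# is_local_primary is a hypothetical helper: succeeds when a DRBD role string
# such as "Primary/Secondary" (local/peer, as printed by "drbdadm role data")
# shows this node as Primary.
is_local_primary() {
    case "$1" in
        Primary/*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Hypothetical wrapper: mount the ext3 filesystem only on the Primary node.
mount_data_if_primary() {
    if is_local_primary "$(drbdadm role data)"; then
        mount -t ext3 -o noatime,nodiratime /dev/drbd1 /mnt/data
    else
        echo "this node is not Primary for 'data'; not mounting" >&2
        return 1
    fi
}
```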

If you want to verify that the device is in fact replicating, try the following commands to create a test file, demote the primary server to secondary, promote the secondary to primary, and mount the device on the backup server.

[root@fileserver-1 ~]

cd ~
touch /mnt/data/test_file
umount /mnt/data
drbdadm secondary data

[root@fileserver-2 ~]

drbdadm primary data
mount /dev/drbd1 /mnt/data
cat /proc/drbd
ls -la /mnt/data

Reverse the process to change back to your primary server.

Setup NFS

Next we need to share the replicated storage over NFS so that it can be used by other systems. You’ll need these packages on both nodes of your storage cluster as well as any clients that are going to connect to them.

yum -y install nfs-utils nfs-utils-lib
service rpcbind start

Some guides will tell you to enable these services at boot using chkconfig; however, since we will be using Heartbeat to manage the cluster, we don’t want to do that.

Edit the /etc/exports file to share your directory with your clients.

/mnt/data 10.0.0.0/24(rw,async,no_root_squash,no_subtree_check)
  • 10.0.0.0/24 – Share with 10.0.0.0-10.0.0.255
  • rw – Read/Write access.
  • async – Better performance, at the risk of data corruption if the NFS server reboots before the data is committed to stable storage: the server tells the client that a write succeeded before it actually has.
  • no_root_squash – Do not map requests from root on the client to the anonymous user, so the client’s root user keeps root privileges on this share.
  • no_subtree_check – Disables subtree checking, so the server no longer verifies that each requested file lies within the exported directory tree. This can improve performance and reliability at some cost in security.
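One /etc/exports pitfall worth guarding against: whitespace between a client and its option list is significant. Writing `10.0.0.0/24 (rw)` instead of `10.0.0.0/24(rw)` applies the options to all hosts rather than to that network. A minimal sketch of a lint check, using a hypothetical helper named check_exports, flags any non-comment line where an option list follows whitespace:

```shell
#!/bin/sh
# check_exports is a hypothetical helper: print (with line numbers) any
# non-comment line in an exports file where "(" follows whitespace, which
# means the options apply to all hosts instead of the preceding client.
# Usage: check_exports [EXPORTS_FILE]    (defaults to /etc/exports)
check_exports() {
    file="${1:-/etc/exports}"
    if grep -nE '^[^#]*[[:space:]]\(' "$file"; then
        echo "warning: option list separated from client by whitespace" >&2
        return 1
    fi
    return 0
}
```

After the file passes, `exportfs -ra` re-reads /etc/exports and `exportfs -v` lists what is currently exported along with the effective options.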

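All that is left is to connect to the NFS server from your client. Here fileserver-cluster is assumed to be a hostname that resolves to the cluster’s floating IP address, which Heartbeat would move to whichever node is currently primary.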

mkdir -p /mnt/data
showmount -e fileserver-cluster
mount -v -t nfs -o 'vers=3' fileserver-cluster:/mnt/data /mnt/data
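`showmount -e` prints a header line followed by one line per export (the path, then the allowed clients). If you script the client mount, a minimal sketch with a hypothetical helper, export_listed, can confirm that the export is actually offered before attempting the mount:

```shell
#!/bin/sh
# export_listed is a hypothetical helper: reads "showmount -e" output on
# stdin and succeeds if PATH appears as an exported directory.
# Usage: showmount -e HOST | export_listed PATH
export_listed() {
    path="$1"
    # Skip the "Export list for ..." header; the path is the first field.
    awk -v p="$path" 'NR > 1 && $1 == p { found = 1 } END { exit !found }'
}

# showmount -e fileserver-cluster | export_listed /mnt/data \
#     && mount -v -t nfs -o vers=3 fileserver-cluster:/mnt/data /mnt/data
```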

Configuring Heartbeat

The last step of this guide should be configuring Heartbeat to manage the NFS cluster; however, it is omitted because I ended up going a different route and used Pacemaker to control DRBD in a Dual Primary configuration instead. Since you might have come here looking for a Heartbeat HOWTO as well, the best I can do is point you to the Heartbeat Configuration Guide on the DRBD site.


7 Responses

  1. Justin says:

    Just found what looks like a great guide, but: where’re you configuring fileserver-cluster and heartbeat?

    • Justin Silver says:

      Oops, I got busted. My original design called for using Heartbeat to manage the cluster (which is why it was indicated here) but ultimately I decided to use GFS2, Pacemaker, and a hardware load balancer for my NFS cluster (http://justinsilver.com/technology/linux/dual-primary-drbd-centos-6-gfs2-pacemaker/). The downside is that it requires some additional hardware ($$$) and configuration but does allow both nodes to be used at the same time.

      Since I said I was going to use Heartbeat for the ha-cluster here I’ll update this post (as soon as I can) with the details of what that configuration would look like and give you a ping – thanks for reading!

      • Justin says:

        One other thing you might consider looking at is Gluster, which appears to provide a highly-available distributed and fault-tolerant file system with a much less complicated setup. Dual-primary DRBD would concern me a little, especially if it behaved anything like master-master MySQL does. Given storage is more your field than mine, would you consider giving Gluster a go?

        • Justin Silver says:

          Gluster is definitely a good option to check out. I’m usually more of a cutting edge type person but in this case I had some experience with DRBD & Pacemaker so decided to go that route instead of Gluster, though I would like to play around with it some more (and probably will on future projects). It’s definitely a tricky situation with dual primary which is why the fencing/STONITH setup is so important, but so far I haven’t run into any issues with this config, knock on wood.

  2. Michael G. says:

    Quick question:

    Is the /dev/xvdb1 a shared device presented to each node from a SAN/NAS, or is it an LVM device on each node?

    • Justin Silver says:

      A bit of both? They were LVM mapped devices but hosted on external block storage. This particular setup was at RackSpace so each NFS server was a VM that mounted a block storage device which was shared over NFS and synced with DRBD.

  1. July 13, 2015

    […] this how-to to set up a ha nfs server I stumbled into a problem with zfs and the takeover process of […]
