high availability Archives - Justin Silver

Dual Primary DRBD on CentOS 6: GFS2 & Pacemaker

This guide describes how to create a pair of redundant file servers using DRBD for replication, RedHat GFS2 (Global File System), and Pacemaker for cluster management. In this case we are also using RackSpace Cloud Servers and associated OpenStack features, so we will use the nova client to create the networks, servers, and storage before logging on to finish the configuration.

Once completed you will have a dual primary DRBD configuration that allows reads and writes on both nodes at the same time – enabling load balanced NFS, for example.

Network Architecture

The network architecture is roughly as follows. Each file server is attached to a block storage device, and the file servers synchronize their disks using DRBD over a private storage network. On a separate internal network the NFS servers are fronted by one or more load balancers, which in turn serve your application servers. Those servers are then fronted by their own load balancers, firewalls, and so on, or serve content to your internal network. You could also have clients connect directly to the NFS share exposed on the load balancer.

Create RackSpace Cloud Network

First we will need to create a private Cloud Network that we can dedicate to replication. We will call the network “storage-replication” and give it 192.168.1.x. You will need to capture the id that is returned so that you can attach it to a Cloud Server.

nova network-create storage-replication 192.168.1.0/24
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| cidr     | 192.168.1.0/24                       |
| id       | 7c99ba74-c28c-4c52-9c5a-xxxxxxxxxxxx |
| label    | storage-replication                  |
+----------+--------------------------------------+

Create RackSpace Cloud Servers

Now that we have a network set up, let’s create two servers and assign them to it. You will need to specify an --image (the one below is for CentOS 6.5 PVHVM), however you can get a list of them using nova image-list. We are going to use a 4GB standard Cloud Server, but again you can use a different --flavor value from the results of nova flavor-list. The --nic net-id is where you place your private network id, and the --file option lets us insert an SSH key so we don’t have to bother with the password after the server is built. We put the script to sleep for 30 seconds to encourage the VMs to be provisioned on different hypervisors for additional redundancy. When all is said and done we have two new cloud servers named fileserver-1 and fileserver-2, and once again you should capture the ids for later use.

for i in 1 2; \
	do nova boot \
	--image 41e59c5f-530b-423c-86ec-13b23de49288 \
	--flavor 5 \
	--nic net-id=7c99ba74-c28c-4c52-9c5a-xxxxxxxxxxxx \
	--file /root/.ssh/authorized_keys=/Users/justinsilver/.ssh/id_dsa.pub \
	fileserver-${i}; \
	sleep 30; \
done
+------------------------+--------------------------------------+
| Property               | Value                                |
+------------------------+--------------------------------------+
| status                 | BUILD                                |
| updated                | 2014-02-26T08:16:59Z                 |
| OS-EXT-STS:task_state  | scheduling                           |
| key_name               | None                                 |
| image                  | CentOS 6.5 (PVHVM)                   |
| hostId                 |                                      |
| OS-EXT-STS:vm_state    | building                             |
| flavor                 | 4GB Standard Instance                |
| id                     | 69f66617-44e2-4cbc-8a34-xxxxxxxxxxxx |
| user_id                | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     |
| name                   | fileserver-1                         |
| adminPass              | xxxxxxxxxxxx                         |
| tenant_id              | xxxxxx                               |
| created                | 2014-02-26T08:16:58Z                 |
| OS-DCF:diskConfig      | MANUAL                               |
| accessIPv4             |                                      |
| accessIPv6             |                                      |
| progress               | 0                                    |
| OS-EXT-STS:power_state | 0                                    |
| config_drive           |                                      |
| metadata               | {}                                   |
+------------------------+--------------------------------------+
+------------------------+--------------------------------------+
| Property               | Value                                |
+------------------------+--------------------------------------+
| status                 | BUILD                                |
| updated                | 2014-02-26T08:17:32Z                 |
| OS-EXT-STS:task_state  | scheduling                           |
| key_name               | None                                 |
| image                  | CentOS 6.5 (PVHVM)                   |
| hostId                 |                                      |
| OS-EXT-STS:vm_state    | building                             |
| flavor                 | 4GB Standard Instance                |
| id                     | 5f0a5c3e-0dfa-4583-bddc-xxxxxxxxxxxx |
| user_id                | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     |
| name                   | fileserver-2                         |
| adminPass              | xxxxxxxxxxxx                         |
| tenant_id              | xxxxxx                               |
| created                | 2014-02-26T08:17:31Z                 |
| OS-DCF:diskConfig      | MANUAL                               |
| accessIPv4             |                                      |
| accessIPv6             |                                      |
| progress               | 0                                    |
| OS-EXT-STS:power_state | 0                                    |
| config_drive           |                                      |
| metadata               | {}                                   |
+------------------------+--------------------------------------+

Create Cloud Block Storage

Next we can create a pair of Cloud Block Storage devices to attach to our newly created Cloud Servers. We are going to use SSD instead of SATA for improved read performance and size it to 200GB – you can choose whatever value you need for your purposes – and remember that since we will be using LVM we can resize later, though some care should be taken up front. Again pay attention to the ids that are returned so that we can attach them to the Cloud Servers.

for i in 1 2; \
	do nova volume-create \
	--display-name fileserver-${i} \
	--volume-type SSD 200; \
done;
+---------------------+--------------------------------------+
| Property            | Value                                |
+---------------------+--------------------------------------+
| status              | available                            |
| display_name        | fileserver-1                         |
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| created_at          | 2014-02-26T07:11:37.000000           |
| display_description | None                                 |
| volume_type         | SSD                                  |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| size                | 200                                  |
| id                  | db75fdd8-da9f-48df-861a-xxxxxxxxxxxx |
| metadata            | {}                                   |
+---------------------+--------------------------------------+
+---------------------+--------------------------------------+
| Property            | Value                                |
+---------------------+--------------------------------------+
| status              | available                            |
| display_name        | fileserver-2                         |
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| created_at          | 2014-02-26T07:11:40.000000           |
| display_description | None                                 |
| volume_type         | SSD                                  |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| size                | 200                                  |
| id                  | 28a2905e-49e0-426c-8b33-xxxxxxxxxxxx |
| metadata            | {}                                   |
+---------------------+--------------------------------------+

Now use the IDs of the Cloud Servers and Cloud Block Storage volumes to attach each storage device to the appropriate server.

nova volume-attach 69f66617-44e2-4cbc-8a34-xxxxxxxxxxxx db75fdd8-da9f-48df-861a-xxxxxxxxxxxx /dev/xvdb && \
nova volume-attach 5f0a5c3e-0dfa-4583-bddc-xxxxxxxxxxxx 28a2905e-49e0-426c-8b33-xxxxxxxxxxxx /dev/xvdb
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/xvdb                            |
| serverId | 69f66617-44e2-4cbc-8a34-xxxxxxxxxxxx |
| id       | db75fdd8-da9f-48df-861a-xxxxxxxxxxxx |
| volumeId | db75fdd8-da9f-48df-861a-xxxxxxxxxxxx |
+----------+--------------------------------------+
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/xvdb                            |
| serverId | 5f0a5c3e-0dfa-4583-bddc-xxxxxxxxxxxx |
| id       | 28a2905e-49e0-426c-8b33-xxxxxxxxxxxx |
| volumeId | 28a2905e-49e0-426c-8b33-xxxxxxxxxxxx |
+----------+--------------------------------------+

DRBD Cloud Server Configuration

Once the servers are provisioned, connect to them both to continue the configuration. Since this is a dual primary DRBD configuration you will eventually be able to read and write to both, however for the initial sync we will use fileserver-1 as the primary.

As these are brand new servers it’s recommended to fetch all updates, allow traffic on your private network via iptables and reboot in case there were any kernel patches. Make sure to save your iptables changes or they will be lost after rebooting.

DRBD is in the ELRepo repository, which is not enabled by default but can be added via an RPM. Once this RPM has been installed you can use yum to install both the DRBD kernel module and the utilities.

rpm -ivh http://elrepo.org/elrepo-release-6-5.el6.elrepo.noarch.rpm
yum -y update
iptables -A INPUT -m iprange --src-range 192.168.1.1-192.168.1.255 -j ACCEPT
service iptables save
yum install -y kmod-drbd84 drbd84-utils
modprobe drbd

Protect yourself from DNS outages, slow lookups, and the pain of using the wrong interface by adding the private addresses and hostnames to /etc/hosts on each server. By entering the hostnames here we ensure that corosync and cman use the correct interface for cluster management.

192.168.1.1 fileserver-1
192.168.1.2 fileserver-2

Create a partition of the same size on both hosts. You can use the full Cloud Block Storage device or just a piece of it (hint: 1GB will sync a lot faster if you’re just doing a test), but for the rest of this guide make sure it ends up as /dev/xvdb1.

In this guide we are going to use LVM to manage the volume so that we can shrink or grow it as needed. To do this you will need to press “t” after the partition is created and then choose “8e” for “Linux LVM”. Press “w” to save and exit, or “q” to exit and discard your changes.
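
For reference, the interactive fdisk session looks roughly like the following sketch (exact prompts vary slightly between fdisk versions):

fdisk /dev/xvdb
# n  - create a new partition
# p  - primary partition, number 1, accept the default start/end to use the whole device
# t  - change the partition type
# 8e - select "Linux LVM"
# w  - write the partition table and exit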

fdisk /dev/xvdb
Disk /dev/xvdb: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x1568dcd9

	Device Boot      Start         End      Blocks   Id  System
/dev/xvdb1               1       26108   209712478+  8e  Linux LVM
pvcreate /dev/xvdb1
	Physical volume "/dev/xvdb1" successfully created
vgcreate fileserver /dev/xvdb1
	Volume group "fileserver" successfully created
lvcreate --name r0 --size 50G fileserver
	Logical volume "r0" created

Now we can create a DRBD resource called r0 using a configuration file called /etc/drbd.d/r0.res. Copy this file onto both nodes and make sure it is identical, including whitespace. For the “on” directives you need to use the actual hostnames of the servers as entered in /etc/hosts.

resource r0 {
        protocol C;
        startup {
        	become-primary-on both;
        }
        disk {
                fencing resource-and-stonith;
        }
        handlers {
                fence-peer              "/usr/lib/drbd/crm-fence-peer.sh";
                after-resync-target     "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "DRBD Super Secret Password";
                timeout 180;
                ping-int 3;
                ping-timeout 9;
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        on fileserver-1 {
                device /dev/drbd1;
                disk /dev/fileserver/r0;
                address 192.168.1.1:7788;
                meta-disk internal;
        }
        on fileserver-2 {
                device /dev/drbd1;
                disk /dev/fileserver/r0;
                address 192.168.1.2:7788;
                meta-disk internal;
        }
}
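
If you wrote the file on fileserver-1, a quick way to get it onto the second node and confirm both copies parse cleanly is something like the following (assuming root SSH access between the nodes):

scp /etc/drbd.d/r0.res fileserver-2:/etc/drbd.d/r0.res
# on each node, dump the parsed configuration to confirm it is valid and identical
drbdadm dump r0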

We are now ready to start the DRBD service, but we want to make sure it doesn’t start on boot as it will be managed by pacemaker.

chkconfig drbd off
service drbd start
drbdadm create-md r0
Writing meta data...
initializing activity log
NOT initializing bitmap
New drbd meta data block successfully created.
success

From fileserver-1 only, run the following command to force synchronization of its disk. Check the status and wait until it is 100% complete before continuing.

drbdadm primary --force r0
service drbd status
drbd driver loaded OK; device status:
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06
m:res          cs          ro               ds                     p  mounted  fstype
...            sync'ed:    18.3%            (33656/41188)M
1:r0           SyncSource  Primary/Secondary UpToDate/Inconsistent  C

Now we are ready to promote both DRBD nodes to primary status. You can check the status again and once the sync has completed you are ready to move on.

drbdadm adjust r0
drbdadm primary r0
service drbd status
drbd driver loaded OK; device status:
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06
m:res          cs         ro               ds                 p  mounted  fstype
1:r0           Connected  Primary/Primary  UpToDate/UpToDate  C

Create Clustered Filesystem

Now that DRBD is configured we need to install a clustered filesystem – GFS2 in this case. It handles the details of the filesystem being written and read on multiple nodes at the same time without being corrupted. Using EXT3/4, for example, just won’t work properly.

yum -y install gfs2-utils cman pacemaker pacemaker-cli fence-agents resource-agents openais

We are now going to create a /etc/cluster/cluster.conf configuration for a cluster named “pacemaker” (note that cluster names are limited to 15 characters). Because DRBD only supports two nodes we can’t reach quorum in the traditional sense, so we must set the special two_node="1" parameter that lets the cluster retain quorum even when a node has failed. Specify that we want to use pacemaker for fencing and we should be good to go. Note the fileserver-1 and fileserver-2 names from /etc/hosts.

<?xml version="1.0"?>
<cluster config_version="1" name="pacemaker">
    <cman two_node="1" expected_votes="1"/>
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="fileserver-1" nodeid="1" votes="1">
            <fence>
                <method name="pcmk-redirect">
                    <device name="pcmk" port="fileserver-1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="fileserver-2" nodeid="2" votes="1">
            <fence>
                <method name="pcmk-redirect">
                    <device name="pcmk" port="fileserver-2"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice name="pcmk" agent="fence_pcmk"/>
    </fencedevices>
</cluster>
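
Before starting any services it is worth checking that the file parses; the cluster packages installed above should include a validator for this (a quick sanity check, run on each node):

# validate /etc/cluster/cluster.conf against the cluster schema
ccs_config_validate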

This is a fairly standard /etc/corosync/corosync.conf configuration, however I had to remove the interface > mcastaddr parameter and replace it with broadcast: yes to get the nodes to see each other. We want to bind to 192.168.1.0.

totem {
    version: 2
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 3600
    vsftype: none
    max_messages: 20
    clear_node_high_bit: yes
    secauth: off
    threads: 0
    rrp_mode: none
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        broadcast: yes
        mcastport: 5405
    }
}
amf {
    mode: disabled
}
service {
    ver:       1
    name:      pacemaker
}
aisexec {
    user:   root
    group:  root
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}

Starting pacemaker on one node at a time allows you to monitor the status with cman_tool status and cman_tool nodes. Once everything is started up we can create the GFS2 filesystem on the DRBD resource. Make sure to specify -p lock_dlm for lock management and one journal per node with -j 2. Mounting the device with -o noatime,nodiratime gives a performance boost since we don’t really care about access times.

chkconfig pacemaker on
service pacemaker start
mkfs.gfs2 -t pacemaker:storage -p lock_dlm -j 2 /dev/drbd1
mkdir -p /mnt/storage
mount -t gfs2 -o noatime,nodiratime /dev/drbd1 /mnt/storage
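
Once the filesystem is mounted on both nodes, a minimal sanity check is to write a file on one node and confirm it appears on the other, then repeat in the opposite direction:

[root@fileserver-1]# touch /mnt/storage/hello-from-1
[root@fileserver-2]# ls -la /mnt/storage
# hello-from-1 should be listed here; files written on fileserver-2 should likewise appear on fileserver-1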

STONITH: Shoot The Other Node In The Head

STONITH is not a suggestion when dealing with Dual Primaries, it is an absolute requirement.

– LINBIT support

So now that we have everything up and running, there remains the not-so-small task of making sure we are able to deliver uncorrupted data. What happens when synchronization is lost between the nodes? How can we ensure that the misbehaving node is taken out of service? This is what fencing and STONITH accomplish: making sure that data isn’t written to the good node until the bad node is out of service, ensuring data integrity.

Install CRM Shell

We are going to use the CRM Shell to control Pacemaker, and we can get it from yum if we install the proper repository.

cd /etc/yum.repos.d/
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/network:ha-clustering:Stable.repo
yum -y install crmsh graphviz
crm configure
property stonith-enabled="false"
commit
exit
crm configure
primitive p_drbd_r1 ocf:linbit:drbd \
	params drbd_resource="r0" \
	op start interval="0" timeout="240" \
	op stop interval="0" timeout="100" \
	op monitor interval="29s" role="Master" \
	op monitor interval="31s" role="Slave"
ms ms_drbd_r1 p_drbd_r1 \
	meta master-max="2" \
	master-node-max="1" \
	clone-max="2" \
	clone-node-max="1" \
	notify="true"
primitive p_fs_r1 ocf:heartbeat:Filesystem \
	params device="/dev/drbd1" \
	directory="/mnt/storage" \
	fstype="gfs2" \
	op start interval="0" timeout="60" \
	op stop interval="0" timeout="60" \
	op monitor interval="60" timeout="40"
clone cl_fs_r1 p_fs_r1 meta interleave="true"
colocation co_fs_with_drbd inf: cl_fs_r1 ms_drbd_r1:Master
order o_drbd_before_fs inf: ms_drbd_r1:promote cl_fs_r1
primitive stonith_fence_virsh_fileserver1 stonith:fence_virsh \
	params action="reboot" ipaddr="192.168.1.1" \
	login="root" identity_file="/root/.ssh/id_rsa" \
	port="fileserver-1"
primitive stonith_fence_virsh_fileserver2 stonith:fence_virsh \
	params action="reboot" ipaddr="192.168.1.2" \
	login="root" identity_file="/root/.ssh/id_rsa" \
	port="fileserver-2"
location l_stonith_fence_virsh_machine1_noton_fileserver1 \
	stonith_fence_virsh_fileserver1 -inf: fileserver-1
location l_stonith_fence_virsh_machine1_noton_fileserver2 \
	stonith_fence_virsh_fileserver2 -inf: fileserver-2
property stonith-enabled="true"
commit
exit
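
After the final commit, crm_mon gives a one-shot view of the cluster; the DRBD master/slave set, the filesystem clone, and both STONITH resources should all report as started (exact output will vary):

# show the cluster status once and exit
crm_mon -1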

Validate the Cluster

Check to make sure that everything is good to go. Now is a good time to make sure things are actually failing over properly before moving to production.

[root@fileserver-1]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06
m:res    cs         ro               ds                 p  mounted           fstype
1:r0     Connected  Primary/Primary  UpToDate/UpToDate  C  /mnt/storage      gfs2

[root@fileserver-1]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    152   2014-02-27 22:23:29  fileserver-1
   2   M     76   2014-02-27 22:23:29  fileserver-2

[root@fileserver-1]# cman_tool status
Version: 6.2.0
Config Version: 3
Cluster Name: pacemaker
Cluster Id: 62570
Cluster Member: Yes
Cluster Generation: 152
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 8
Flags: 2node
Ports Bound: 0
Node name: fileserver-1
Node ID: 1
Multicast addresses: 239.192.244.95
Node addresses: 192.168.1.1

[root@fileserver-1]# fence_tool ls
fence domain
member count  2
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2
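
To confirm fencing works end to end, one approach is to trigger it deliberately from one node and watch the peer get rebooted and rejoin the cluster (do this before moving to production, not after). stonith_admin ships with the pacemaker packages installed earlier:

# from fileserver-1, ask the cluster to fence the peer
stonith_admin --reboot fileserver-2
# watch it leave and then rejoin
cman_tool nodes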

NFS Server

Finally, install the NFS packages and start the services on both nodes so the clustered filesystem can be shared with clients.

yum -y install nfs*
service rpcbind start
chkconfig rpcbind on
service nfs start
chkconfig nfs on

Add the following export to /etc/exports on both nodes:

/mnt/storage     192.168.1.0/24(rw,async,no_root_squash,no_all_squash,no_subtree_check)
  • /mnt/storage – NFS mount point
  • 192.168.1.0/24 – IP range of clients
  • rw – read / write access
  • async – faster asynchronous writes
  • no_root_squash – allow root
  • no_all_squash – allow user
  • no_subtree_check – increases performance but lowers security by preventing parent directory permissions from being checked when accessing shares.
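
With the export in place, re-export the shares so the running NFS server picks up the change, and confirm it is visible:

# re-read /etc/exports and apply any changes
exportfs -ra
# list the exports the server is currently offering
showmount -e localhost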

From a client, create a mount point and mount the share:

mkdir -p /mnt/storage
mount -t nfs -o rsize=32768,wsize=32768,noatime,nodiratime,async fileserver-1:/mnt/storage /mnt/storage/

Install CRMSH on CentOS 6 / 7


The CRM Shell for Pacemaker cluster management can be installed by fetching the OpenSUSE ha-clustering repository and installing crmsh via yum. Get the OS version (6 or 7) and use it to fetch the correct repository.

cd /etc/yum.repos.d/
OSVERSION=$(cat /etc/centos-release | sed -rn 's/.* ([[:digit:]]).*/\1/p')
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-${OSVERSION}/network:ha-clustering:Stable.repo
yum -y install crmsh

Since it’s rather long, the URL for the Yum repo file is http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-${OSVERSION}/network:ha-clustering:Stable.repo.
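
A quick way to confirm the install is to ask crmsh for the cluster status; this assumes a running Pacemaker cluster, and on a machine without one it will simply report that it cannot connect:

crm status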

DRBD: Redundant NFS Storage on CentOS 6


A pair of CentOS NFS servers can be a great way to build an inexpensive, reliable, redundant fileserver. In this guide we use DRBD to replicate the data between the NFS nodes and Heartbeat to provide high availability for the cluster, running on RackSpace Cloud Servers with attached Cloud Block Storage.

Make sure that DNS resolves correctly for each server’s hostname, and to be certain, put an entry in /etc/hosts. We will use fileserver-1 as the primary and fileserver-2 as the backup, and share the /dev/xvdb1 device under the DRBD resource name “data”. It will eventually be available to the filesystem as /dev/drbd1.

10.0.0.1 fileserver-1 fileserver-1.example.com
10.0.0.2 fileserver-2 fileserver-2.example.com

Install ELRepo Repository

If you don’t already have the ELRepo repository configured for yum, install it using rpm:

rpm -ivh http://elrepo.org/elrepo-release-6-5.el6.elrepo.noarch.rpm

Install & Configure DRBD

Now install the DRBD kernel module and utilities using yum, and load the module with modprobe.

yum install -y kmod-drbd84 drbd84-utils
modprobe drbd

Next we need to create a new DRBD resource file by editing /etc/drbd.d/data.res. Make sure to use the correct IP address and devices for your server nodes.

resource data {
    startup {
        wfc-timeout 30;
        outdated-wfc-timeout 20;
        degr-wfc-timeout 30;
    }
    net {
        protocol C;
        cram-hmac-alg sha1;
        shared-secret "Secret Password for DRBD";
    }
    disk {
        resync-rate 100M;
    }
    syncer {
        rate 100M;
        verify-alg sha1;
    }
    on fileserver-1 {
        volume 0 {
            device minor 1;
            disk /dev/xvdb1;
            meta-disk internal;
        }
        address 10.0.0.1:7789;
    }
    on fileserver-2 {
        volume 0 {
            device minor 1;
            disk /dev/xvdb1;
            meta-disk internal;
        }
        address 10.0.0.2:7789;
    }
}

Run the following commands on each server to initialize the storage metadata, start the DRBD service, and bring up the “data” resource.

drbdadm create-md data
service drbd start
drbdadm up data

You can monitor the progress by checking /proc/drbd. It should look something like the following, with a status of “Inconsistent/Inconsistent” being expected at this point.

[root@fileserver-1 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06

 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:209708764

On the primary only run the following command to initialize the synchronization between the two nodes.

drbdadm primary --force data

Again we can monitor the status by watching /proc/drbd – notice that the status is now “UpToDate/Inconsistent” along with a sync status (at 4.8% in my example).

[root@fileserver-1 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06

 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
    ns:9862244 nr:0 dw:0 dr:9863576 al:0 bm:601 lo:8 pe:2 ua:11 ap:0 ep:1 wo:f oos:199846748
	[>....................] sync'ed:  4.8% (195160/204792)M
	finish: 1:57:22 speed: 28,364 (22,160) K/sec

Once the DRBD device has synced between the two nodes you will see an “UpToDate/UpToDate” message and you are ready to proceed.

[root@fileserver-1 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06

 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:209823780 nr:8 dw:3425928 dr:206400390 al:1763 bm:12800 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Format & Mount

Once the device has synchronized between your nodes you can prepare it on the primary node and then mount it. Note that in a standard Primary/Secondary configuration with a traditional filesystem such as ext3 you can only mount the device on one node at a time. It is possible to create a Dual Primary configuration in which the data is accessible from both nodes at the same time, but that requires a clustered filesystem such as GFS or OCFS2 (Oracle Cluster File System v2), whose tools we install below.

OCFS2 isn’t available from the default repositories so we have to install the Oracle Open Source yum repository, import their key, and install ocfs2-tools so we can set up a clustered configuration.

yum -y install yum-utils
cd /etc/yum.repos.d
wget --no-check-certificate https://public-yum.oracle.com/public-yum-ol6.repo
rpm --import http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6
yum-config-manager --disable ol6_latest
yum -y install ocfs2-tools kernel-uek
reboot

You will need to edit /boot/grub/grub.conf to default to the correct kernel – it is very important that the installed driver matches the kernel version.

mkfs -t ext3 /dev/drbd1
mkdir -p /mnt/data
mount -t ext3 -o noatime,nodiratime /dev/drbd1 /mnt/data

If you want to test that the replicated device is in fact replicating, try the following commands to create a test file, demote the primary server to secondary, promote the secondary to primary, and mount the device on the backup server.

[root@fileserver-1 ~]

cd ~
touch /mnt/data/test_file
umount /mnt/data
drbdadm secondary data

[root@fileserver-2 ~]

drbdadm primary data
mount /dev/drbd1 /mnt/data
cat /proc/drbd
ls -la /mnt/data

Reverse the process to change back to your primary server.
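
In concrete terms, the failback is roughly the reverse of the commands above (a sketch; adjust the mount options to your setup):

[root@fileserver-2 ~]

umount /mnt/data
drbdadm secondary data

[root@fileserver-1 ~]

drbdadm primary data
mount -t ext3 -o noatime,nodiratime /dev/drbd1 /mnt/data
cat /proc/drbd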

Setup NFS

Next we need to share the replicated storage over NFS so that it can be used by other systems. You’ll need these packages on both nodes of your storage cluster as well as any clients that are going to connect to them.

yum -y install nfs-utils nfs-utils-lib
service rpcbind start

Some guides will tell you to enable the service on boot using chkconfig, however since we will be using Heartbeat to manage the cluster we don’t want to do this.

Edit the /etc/exports file to share your directory with your clients.

/mnt/data 10.0.0.0/24(rw,async,no_root_squash,no_subtree_check)
  • 10.0.0.0/24 – Share with 10.0.0.0-10.0.0.255
  • rw – Read/Write access.
  • async – Achieve better performance at the risk of data corruption if the NFS server reboots before the data is committed to permanent storage. The server tells the client that the write succeeded before it actually has.
  • no_root_squash – Allow root to connect to this share.
  • no_subtree_check – Increases performance but lowers security by preventing parent directory permissions from being checked when accessing shares.
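
If the NFS service is already running on the active node, re-export the shares after editing the file so that clients can see the change:

# re-read /etc/exports on the active node
exportfs -ra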

Next all that is left is to connect to the NFS server from your client.

mkdir -p /mnt/data
showmount -e fileserver-cluster
mount -v -t nfs -o 'vers=3' fileserver-cluster:/mnt/data /mnt/data

Configuring Heartbeat

The last step of this guide should be the configuration of Heartbeat to manage the NFS cluster, however it is omitted as I ended up going a different route and instead used Pacemaker to control DRBD in a Dual Primary configuration. Since you might have come here looking for a HOWTO with Heartbeat as well, the best I can do is provide a link to a Heartbeat Configuration Guide on the DRBD site.
