[ofa-general] SRP target/LVM HA configuration
Daniel Pocock
daniel at pocock.com.au
Tue Mar 11 16:25:44 PDT 2008
I'm contemplating a HA configuration based on SRP and LVM (or maybe EVMS).
There are many good resources based on NFS and drbd (see
http://www.linux-ha.org/HaNFS) but it would be more flexible to work
with block level (e.g. SRP) rather than file level (NFS). Obviously,
SRP/RDMA offers a major performance benefit compared with drbd (which
uses IP).
Basically, I envisage the primary server having access to the secondary
(passive) server's disk using SRP, and putting both the local (primary)
disk and SRP (secondary) disk into RAID1. The RAID1 set would contain a
volume group and multiple logical volumes, which would in turn be exported
as SRP targets (for VMware to use) or possibly as NFS shares.
This leads me to a few issues:
- Read operations - would it be better for the primary to read from both
disks, or just its own disk? Using drbd, the secondary disk is not
read unless the primary is down. However, given the performance of SRP,
I suspect that reading from both the local and SRP disk would give a
boost to performance.
- Does it make sense to use md or LVM to combine a local disk and an SRP
disk into RAID1 (or potentially RAID5)? Are there technical challenges
there, given that one target is slightly faster than the other?
- Fail-over - when the secondary detects that the primary is down, can
it dynamically take the place of the failed SRP target? Will the
end-user initiators (e.g. VMware, see diagram below) be confused when
the changeover occurs? Is there the possibility of data inconsistency
if some write operations had been acknowledged by the primary but not
propagated to the secondary's disk at the moment when the failure occurred?
- Recovery - when the old primary comes back online as a secondary, it
will need to resync its disk - is a partial resync possible, or is a full
rebuild mandatory?
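To make the recovery question concrete, here is a toy Python sketch of
dirty-region tracking, which is the general idea behind a partial resync.
Everything in it (the Mirror class, REGION_SIZE, the method names) is
hypothetical and only for illustration; it is not how md, LVM or drbd
actually implement it.

```python
# Toy model of a two-way mirror with dirty-region tracking.
# All names are hypothetical; this is not md, LVM or drbd code.

REGION_SIZE = 4  # blocks per tracked region (arbitrary for this sketch)


class Mirror:
    def __init__(self, num_blocks):
        self.active = [0] * num_blocks   # disk on the node still serving I/O
        self.peer = [0] * num_blocks     # disk on the other node, reached over SRP
        self.peer_online = True
        self.dirty = set()               # regions written while the peer was away

    def write(self, block, value):
        self.active[block] = value
        if self.peer_online:
            self.peer[block] = value
        else:
            # Record only which region changed, not the data itself.
            self.dirty.add(block // REGION_SIZE)

    def resync(self):
        """Copy only the dirty regions to the peer; return blocks copied."""
        copied = 0
        for region in sorted(self.dirty):
            start = region * REGION_SIZE
            end = min(start + REGION_SIZE, len(self.active))
            self.peer[start:end] = self.active[start:end]
            copied += end - start
        self.dirty.clear()
        self.peer_online = True
        return copied


m = Mirror(16)
m.write(1, 11)            # both disks updated
m.peer_online = False     # the other node drops out
m.write(5, 55)            # region 1 becomes dirty
m.write(6, 66)            # same region, still one dirty region
copied = m.resync()       # copies one region rather than the whole disk
```

In this run only one region is copied instead of all 16 blocks. Whether
the real stack can do the same depends on such a log surviving the outage
on the node that stayed up, which is the part I would like to confirm.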
Diagram:
Disk--Primary Server-------------------SRP Initiator (e.g. VMware ESX)
          |                       +------NFS client
          |                       .
         SRP                      .
  (RAID1 of primary's             .
   disk and secondary's           .
   disk)                          .   (fail-over path to storage
          |                       .    when primary is down)
Disk--Secondary Server. . . . . . .