[ofa-general] SRP target/LVM HA configuration
Sufficool, Stanley
ssufficool at rov.sbcounty.gov
Wed Mar 12 08:57:39 PDT 2008
I had looked at this configuration as well and decided to use volume
management on the clients to mirror the data: Windows clients use LDM
volumes mirrored across the two SRPT servers, and Linux clients use md
RAID 1 mirrors.
This provides transparent failover, and the SRP client/host will
rebuild the slices that went offline.
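A minimal sketch of the Linux client side, assuming the two
SRP-attached LUNs (one per SRPT server) appear as /dev/sdb and
/dev/sdc (device names are illustrative):

    # Mirror the two SRP-attached LUNs into a single md RAID 1 device
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

    # When an SRPT server returns, add its LUN back; md rebuilds it
    mdadm /dev/md0 --add /dev/sdc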
> -----Original Message-----
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of
> Daniel Pocock
> Sent: Tuesday, March 11, 2008 4:26 PM
> To: general at lists.openfabrics.org
> Subject: [ofa-general] SRP target/LVM HA configuration
>
> I'm contemplating an HA configuration based on SRP and LVM (or
> maybe EVMS).
>
> There are many good resources based on NFS and drbd (see
> http://www.linux-ha.org/HaNFS), but it would be more flexible to work
> at the block level (e.g. SRP) rather than the file level (NFS).
> Obviously, SRP/RDMA offers a major performance benefit compared with
> drbd (which uses IP).
>
> Basically, I envisage the primary server having access to the
> secondary (passive) server's disk using SRP, and putting both the
> local (primary) disk and the SRP (secondary) disk into RAID1. The
> RAID1 set would contain a volume group and multiple volumes, which
> would in turn be SRP targets (for VMware to use) or possibly NFS
> shares.
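A rough sketch of that stack on the primary, assuming the local disk
is /dev/sda and the secondary's disk appears via SRP as /dev/sdb (all
names are placeholders):

    # RAID1 across the local disk and the SRP-attached remote disk
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

    # Volume group and logical volumes on top of the mirror
    pvcreate /dev/md0
    vgcreate vg_ha /dev/md0
    lvcreate -L 100G -n vm_store  vg_ha   # exported as an SRP target
    lvcreate -L 50G  -n nfs_share vg_ha   # or shared out over NFS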
>
> This leads me to a few issues:
>
> - Read operations - would it be better for the primary to read from
> both disks, or just its own disk? Using drbd, the secondary disk is
> not read unless the primary is down. However, given the performance
> of SRP, I suspect that reading from both the local and SRP disks
> would give a boost to performance.
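For what it's worth, md RAID1 will read from any in-sync member, so
reads can hit both disks by default. If the SRP leg turns out to slow
reads down instead, one option (a sketch, same device names as above)
is to mark it write-mostly so reads prefer the local disk:

    # Reads are served from /dev/sda unless it fails;
    # /dev/sdb still receives every write
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/sda --write-mostly /dev/sdb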
>
> - Does it make sense to use md or LVM to combine a local disk and an
> SRP disk into RAID1 (or potentially RAID5)? Are there technical
> challenges there, given that one target is slightly faster than the
> other?
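Both tools can build the mirror. As one hedged illustration of the
LVM route (same two disks assumed; this replaces the md0 layer from
the earlier sketch rather than sitting on top of it):

    # Two-legged LVM mirror, one leg per disk; an in-memory ('core')
    # mirror log avoids needing a third device for the log
    pvcreate /dev/sda /dev/sdb
    vgcreate vg_ha /dev/sda /dev/sdb
    lvcreate -L 100G -m1 --mirrorlog core -n vm_store vg_ha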
>
> - Fail-over - when the secondary detects that the primary is down,
> can it dynamically take the place of the failed SRP target? Will the
> end-user initiators (e.g. VMware, see diagram below) be confused when
> the changeover occurs? Is there the possibility of data inconsistency
> if some write operations had been acknowledged by the primary but not
> propagated to the secondary's disk at the moment the failure
> occurred?
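On the takeover itself, a very rough sketch of what the secondary
might run once it declares the primary dead (names as in the earlier
sketches; re-exporting over SRP depends on the target stack in use,
e.g. SCST, so that step is only a placeholder):

    # Start the mirror degraded, from the local disk alone
    mdadm --assemble --run /dev/md0 /dev/sda

    # Bring the logical volumes online
    vgchange -ay vg_ha

    # Re-export the LVs as SRP targets with the same identifiers the
    # initiators expect (stack-specific; placeholder step)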
>
> - Recovery - when the old primary comes back online as a secondary,
> it will need to resync its disk - is a partial resync possible, or is
> a full rebuild mandatory?
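With md, at least, a partial resync is possible if the array carries
a write-intent bitmap: only the regions dirtied while the member was
absent get rebuilt. A minimal sketch, same names assumed:

    # Add an internal write-intent bitmap to the running array
    mdadm --grow /dev/md0 --bitmap=internal

    # When the old primary's disk reappears, --re-add resyncs only
    # the blocks the bitmap recorded as dirty
    mdadm /dev/md0 --re-add /dev/sdb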
>
>
> Diagram:
>
> Disk--Primary Server--------+----------SRP Initiator (e.g. VMware ESX)
>          |                  +----------NFS client
>          |                        .
>         SRP                       .
>  (RAID1 of primary's              .
>  disk and secondary's             .
>  disk)                            .   (fail-over path to storage
>          |                        .    when primary is down)
> Disk--Secondary Server. . . . . . .