[ewg][PATCH][0/2] SRP multipath failover within 60 seconds

Wed Feb 6 05:31:52 PST 2008

Vu Pham wrote:
> The following patches help SRP/dm-multipath fail over within 60
> seconds (bugzilla #577) without data corruption or read/write errors:
>
> 1. srp_disconnect_without_wait.patch - SRP sends the disconnect request
> without waiting for the CM timewait exit event, since SRP currently does
> not reuse a connection's cm_id and QP/CQ (the patch
> srp_1_recreate_at_reconnect.patch, already in kernel_patches/fixes,
> recreates the cm_id and QP/CQ for a connection at reconnect).
> 2. srp_qp_in_err_timer_reconnect_target.patch - when it detects a
> post_send/post_receive error, SRP sets qp_in_error, arms a timer to
> reconnect to the target, and returns SCSI_MLQUEUE_HOST_BUSY to block the
> queue; it returns DID_NO_CONNECT when the target state is DEAD or REMOVED.
>
> Here is my multipath.conf:
> defaults {
>        udev_dir                /dev
>        polling_interval        5
>        selector                "round-robin 0"
>        path_grouping_policy    multibus
>        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
>        prio_callout            /bin/true
>        path_checker            readsector0
>        rr_min_io               100
>        rr_weight               priorities
>        failback                immediate
>        no_path_retry           5
>        user_friendly_names     no
> }
> I also set srp_daemon.sh to rescan the fabric every 60 seconds (instead
> of the default 300 seconds).
>
> I ran a data integrity test against /dev/mapper/<devices> while running
> {disable path 1, sleep 90, enable path 1, sleep 60, disable path 2,
> sleep 90, enable path 2, sleep 60} in a loop.
>
> RHEL 5 and 5.1 work very well (no data corruption and no read/write
> failures reported).
> On SLES 10 SP1, it works well as long as I run *multipath* every 60
> seconds. I think I misconfigured multipathd somehow (here is how I set
> it up: the same multipath.conf as above, plus "chkconfig boot.multipath
> on" and "chkconfig multipathd on").
>
>   -vu
>
>
>
This fixes issue 577 <https://bugs.openfabrics.org/show_bug.cgi?id=577>,
which was found in OFED 1.2.
Vlad - please take this

Tziporet