[ewg] SRP HA dm_multipath testing and questions

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Mon Apr 9 23:43:47 PDT 2007


I've been testing SRP HA and dm_multipath with:
- RHEL4 U3 x86_64, Cisco FC Gateway, and Sun T4 RAID
- RHEL4 U3 x86_64, Cisco FC Gateway, and Sun 3510 RAID
- SLES10 x86_64, Cisco FC Gateway, and 3 JBODs
 
On RHEL4, I edited /etc/multipath.conf, ran "chkconfig multipathd on",
and then rebooted.  On SLES10, I ran "chkconfig boot.multipath on" and
"chkconfig multipathd on", and then rebooted.  Ishai, I don't seem to need
91-srp.rules; are you using the boot.multipath and multipathd scripts?
 
On both RHEL4 test networks, I get IB port load balancing and failover;
on SLES10 I see only failover. I'm not sure whether this is due to
RHEL4 vs. SLES10 or to RAID vs. JBOD.
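
One way to tell load balancing apart from failover-only behavior is to
compare per-path I/O counters before and after a test run. The following
is a minimal sketch, not the method used in these tests. It assumes each
path appears as an ordinary sd device in /proc/diskstats; the device
names (sdc, sdd), the function names, and the min_ios threshold are
hypothetical.

```python
def parse_diskstats(text):
    """Return {device: reads_completed + writes_completed} for whole disks."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Whole-disk lines have major, minor, name plus at least 11 counters.
        # Shorter lines (e.g. partitions on older 2.6 kernels) are skipped.
        if len(fields) < 14:
            continue
        name = fields[2]
        reads_completed = int(fields[3])
        writes_completed = int(fields[7])
        totals[name] = reads_completed + writes_completed
    return totals


def classify_paths(before, after, paths, min_ios=1):
    """Classify path usage between two snapshots taken around a test run.

    A path counts as active if it completed at least min_ios I/Os.
    """
    active = [p for p in paths
              if after.get(p, 0) - before.get(p, 0) >= min_ios]
    if len(active) >= 2:
        return "load-balanced"
    if len(active) == 1:
        return "single-path"
    return "idle"


def read_diskstats(path="/proc/diskstats"):
    """Read the raw counter text from the running kernel."""
    with open(path) as f:
        return f.read()


# Example use (hypothetical path devices):
#   before = parse_diskstats(read_diskstats())
#   ... run the I/O test ...
#   after = parse_diskstats(read_diskstats())
#   print(classify_paths(before, after, ["sdc", "sdd"], min_ios=100))
```

Raising min_ios above 1 may be sensible, since background probes on an
otherwise idle path could add a few I/Os to its counters.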
 
Traffic failover is very slow (a few minutes). What do others see?
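
To make failover times comparable across setups, one could timestamp
each completed read during a path pull and report the longest gap. The
following is a minimal sketch under stated assumptions, not the
procedure used here: the function names and the example device path
/dev/mapper/mpath0 are hypothetical, and the reads go through the page
cache, so a real measurement would want direct I/O or a data set larger
than memory.

```python
import time


def longest_stall(completion_times):
    """Return the longest gap in seconds between consecutive completions."""
    gaps = [b - a for a, b in zip(completion_times, completion_times[1:])]
    return max(gaps) if gaps else 0.0


def timed_reads(device, count, block_size=4096, clock=time.time):
    """Read up to count blocks sequentially, timestamping each completed read."""
    stamps = []
    with open(device, "rb", buffering=0) as f:
        for _ in range(count):
            f.read(block_size)      # blocks while the path is failing over
            stamps.append(clock())
    return stamps


# Example use (hypothetical device; pull a cable while this runs):
#   stamps = timed_reads("/dev/mapper/mpath0", count=1000000)
#   print("longest stall: %.1f s" % longest_stall(stamps))
```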
 
I will be testing DDN IB storage, EMC DMX, and RHEL5 soon.
 
I'm getting an Oops (NMI watchdog lockup, then kernel panic) on RHEL4 U3
x86_64 on both test networks:
 
scsi3 (0:0): rejecting I/O to offline device
scsi3 (0:0): rejecting I/O to offline device
scsi3 (0:0): rejecting I/O to offline device
scsi3 (0:<4>NMI Watchdog detected LOCKUP, CPU=1, registers:
CPU 1
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc rdma_ucm(U) ib_srp(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_local_sa(U) ds yenta_socket pcmcia_core dm_mirror dm_round_robin dm_multipath dm_mod button battery ac ohci_hcd hw_random shpchp ib_mthca(U) ib_ipoib(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) md5 ipv6 tg3 floppy sg ext3 jbd mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod
Pid: 3990, comm: scsi_eh_3 Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffff802409bf>] <ffffffff802409bf>{serial_in+83}
RSP: 0018:000001007f203c10  EFLAGS: 00000002
RAX: 00000000ffffff00 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff804b59a0
RBP: ffffffff804b59a0 R08: 000000000000003a R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000002706
R13: ffffffff8045afc5 R14: 0000000000000009 R15: 000000000000002d
FS:  0000002a958a07a0(0000) GS:ffffffff804d7b80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000036ce02e728 CR3: 00000000cff00000 CR4: 00000000000006e0
Process scsi_eh_3 (pid: 3990, threadinfo 000001007f202000, task 000001007f1957f0)
Stack: ffffffff80242ab2 0000000d000402dc ffffffff803f88e0 00000000000402dc
       0000000000040309 0000000000000030 000001017bf79830 000000000000c000
       ffffffff8013764c 0000000000040309
Call Trace:<ffffffff80242ab2>{serial8250_console_write+113} <ffffffff8013764c>{__call_console_drivers+68}
       <ffffffff801378b9>{release_console_sem+276} <ffffffff80137b44>{vprintk+498}
       <ffffffff80137bee>{printk+141} <ffffffff8013346f>{__wake_up+54}
       <ffffffff802498bc>{freed_request+105} <ffffffffa01e24e4>{:dm_multipath:multipath_end_io+0}
       <ffffffffa0007350>{:scsi_mod:scsi_prep_fn+120} <ffffffff80247f53>{elv_next_request+68}
       <ffffffffa00076c6>{:scsi_mod:scsi_request_fn+66} <ffffffff8024a107>{blk_insert_request+160}
       <ffffffffa0006d15>{:scsi_mod:scsi_requeue_command+48}
       <ffffffffa000720f>{:scsi_mod:scsi_io_completion+866}
       <ffffffffa00064c7>{:scsi_mod:scsi_error_handler+2809}
       <ffffffff80110e17>{child_rip+8} <ffffffffa00059ce>{:scsi_mod:scsi_error_handler+0}
       <ffffffff80110e0f>{child_rip+0}
 
Code: 0f b6 c0 c3 0f b6 4f 22 0f b6 47 23 41 89 d0 d3 e6 83 f8 02
Kernel panic - not syncing: nmi watchdog

 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 