<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE></TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.2976" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=837233805-04102006><FONT face=Arial
color=#0000ff size=2>If I fail over back and forth between ib0 and ib1 every 30
seconds or so for several hours while IPoIB traffic is running, the IPoIB host
eventually gets the Oops below and IPoIB stops working.</FONT></SPAN></DIV>
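<DIV dir=ltr align=left><FONT face=Arial size=2>The failover pattern described above (moving the active port between ib0 and ib1 every 30 seconds or so for hours under load) can be driven by a simple loop. This is a minimal Python sketch under stated assumptions, not the procedure actually used for this test: the report does not say how each port was failed, so fail_port and restore_port are hypothetical callbacks (for example, something that disables a switch port), and the default of 720 cycles only illustrates roughly six hours at 30-second intervals.</FONT></DIV>
<PRE>
```python
import time
from typing import Callable, Sequence


def toggle_failover(fail_port: Callable[[str], None],
                    restore_port: Callable[[str], None],
                    ports: Sequence[str] = ("ib0", "ib1"),
                    period_s: float = 30.0,
                    cycles: int = 720,
                    sleep: Callable[[float], None] = time.sleep) -> list:
    """Alternately fail each port so IPoIB HA must move traffic back and forth.

    fail_port and restore_port are hypothetical hooks supplied by the caller
    (e.g. a switch-port toggle). 720 cycles of 30 s is about six hours.
    Returns the ports that were failed, in order.
    """
    history = []
    for i in range(cycles):
        victim = ports[i % len(ports)]
        survivor = ports[(i + 1) % len(ports)]
        restore_port(survivor)   # bring the other port back up first
        fail_port(victim)        # then force failover away from the victim
        history.append(victim)
        sleep(period_s)          # keep traffic running between failovers
    return history


if __name__ == "__main__":
    # Dry run: print the intended actions instead of touching hardware,
    # and skip the real sleep.
    toggle_failover(lambda p: print("fail", p),
                    lambda p: print("restore", p),
                    cycles=4, sleep=lambda s: None)
```
</PRE>
<DIV dir=ltr align=left><FONT face=Arial size=2>Passing print-based callbacks and a no-op sleep, as in the __main__ block, gives a dry run that only lists the sequence of actions.</FONT></DIV>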
<DIV dir=ltr align=left><SPAN class=837233805-04102006><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=837233805-04102006><FONT face=Arial
color=#0000ff size=2>ib1: dev_queue_xmit failed to requeue packet<BR>ib1:
dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue
packet<BR>ib0: dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit
failed to requeue packet<BR>ib0: dev_queue_xmit failed to requeue packet<BR>ib0:
dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit failed to requeue
packet<BR>ib0: dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1:
dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit failed to requeue
packet<BR>ib0: dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1:
dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue
packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib0:
dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit failed to requeue
packet<BR>ib0: dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1:
dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue
packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1:
dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue
packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib0:
dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit failed to requeue
packet<BR>ib0: dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1:
dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit failed to requeue
packet<BR>ib0: dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1:
dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue
packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib0:
dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit failed to requeue
packet<BR>ib0: dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1:
dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit failed to requeue
packet<BR>ib0: dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1:
dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue
packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib1: dev_queue_xmit
failed to requeue packet<BR>ib1: dev_queue_xmit failed to requeue packet<BR>ib0:
dev_queue_xmit failed to requeue packet<BR>ib0: dev_queue_xmit failed to requeue
packet<BR>ib0: dev_queue_xmit failed to requeue packet<BR>
general protection fault: 0000 [1] SMP<BR>
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq<BR>
CPU 7<BR>
Modules linked in: af_packet ib_sdp rdma_ucm rdma_cm ib_addr ib_cm ib_ipoib ib_sa
ib_uverbs ib_umad ib_mthca ib_mad ib_core nls_utf8 st ipv6 nfs lockd nfs_acl sunrpc
button battery ac apparmor aamatch_pcre loop usbhid dm_mod hw_random ide_cd ehci_hcd
uhci_hcd cdrom i8xx_tco ide_floppy usbcore shpchp e1000 pci_hotplug floppy reiserfs
edd fan thermal processor siimage sg mptspi mptscsih mptbase scsi_transport_spi piix
sd_mod scsi_mod ide_disk ide_core<BR>
Pid: 23541, comm: ib_mad1 Tainted: G U 2.6.16.21-0.8-smp #1<BR>
RIP: 0010:[&lt;ffffffff802cffea&gt;] &lt;ffffffff802cffea&gt;{_spin_lock_irqsave+3}<BR>
RSP: 0018:ffff810132a4fc20 EFLAGS: 00010086<BR>
RAX: 0000000000000286 RBX: 0000000000000000 RCX: ffffffff883324ee<BR>
RDX: ffff810128d5e380 RSI: 0000000000000000 RDI: 0000ffff1b6017ff<BR>
RBP: 00000000fffffffc R08: ffffffff803d3260 R09: ffff810140333800<BR>
R10: ffff81000107d400 R11: 0000000000000292 R12: ffff810128d5e380<BR>
R13: ffff810132a4fc78 R14: 0000ffff1b6017ff R15: 0000000000000003<BR>
FS: 0000000000000000(0000) GS:ffff810142d19740(0000) knlGS:0000000000000000<BR>
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b<BR>
CR2: 00002b0b5e6ae180 CR3: 0000000128cbc000 CR4: 00000000000006e0<BR>
Process ib_mad1 (pid: 23541, threadinfo ffff810132a4e000, task ffff810142b56100)<BR>
Stack: ffffffff8833c5f5 ffff8101302b3000 0000ffff1b6012ff 0000000000000002<BR>
0000000000000296 ffff8101302b3500 ffffffff8027753e ffff810128d5e3a0<BR>
ffff81012bce1680 ffff810128d5e380<BR>
Call Trace: &lt;ffffffff8833c5f5&gt;{:ib_ipoib:path_rec_completion+862}<BR>
&lt;ffffffff8027753e&gt;{dev_queue_xmit+545} &lt;ffffffff8833c5b2&gt;{:ib_ipoib:path_rec_completion+795}<BR>
&lt;ffffffff8833252e&gt;{:ib_sa:ib_sa_path_rec_callback+64}<BR>
&lt;ffffffff80138f17&gt;{lock_timer_base+27} &lt;ffffffff80138f89&gt;{try_to_del_timer_sync+81}<BR>
&lt;ffffffff883322b3&gt;{:ib_sa:send_handler+72} &lt;ffffffff8826762f&gt;{:ib_mad:ib_mad_complete_send_wr+421}<BR>
&lt;ffffffff88267f00&gt;{:ib_mad:ib_mad_completion_handler+947}<BR>
&lt;ffffffff88267b4d&gt;{:ib_mad:ib_mad_completion_handler+0}<BR>
&lt;ffffffff80140177&gt;{run_workqueue+153} &lt;ffffffff8014081e&gt;{worker_thread+0}<BR>
&lt;ffffffff801437e5&gt;{keventd_create_kthread+0} &lt;ffffffff80140927&gt;{worker_thread+265}<BR>
&lt;ffffffff8012787f&gt;{__wake_up_common+62} &lt;ffffffff8012905a&gt;{default_wake_function+0}<BR>
&lt;ffffffff801437e5&gt;{keventd_create_kthread+0} &lt;ffffffff80143aca&gt;{kthread+236}<BR>
&lt;ffffffff8010b60a&gt;{child_rip+8} &lt;ffffffff801437e5&gt;{keventd_create_kthread+0}<BR>
&lt;ffffffff801439de&gt;{kthread+0} &lt;ffffffff8010b602&gt;{child_rip+0}</FONT></SPAN></DIV>
<DIV> </DIV>
<DIV dir=ltr align=left><SPAN class=837233805-04102006><FONT face=Arial
color=#0000ff size=2>Code: f0 ff 0f 0f 88 29 01 00 00 c3 fa f0 ff 0f 0f 88 2a 01
00 00<BR>RIP &lt;ffffffff802cffea&gt;{_spin_lock_irqsave+3} RSP
&lt;ffff810132a4fc20&gt;<BR></FONT></SPAN></DIV>
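<DIV dir=ltr align=left><FONT face=Arial size=2>One way to read the register dump: the first bytes of the Code: line, f0 ff 0f, appear to decode to lock decl (%rdi), and on x86-64 the first argument of _spin_lock_irqsave (the lock pointer) arrives in RDI, which holds 0000ffff1b6017ff. Assuming 48-bit virtual addressing (4-level paging), bits 63 through 47 of a canonical address must all be equal; this value fails that test, which is consistent with the CPU raising a general protection fault rather than a page fault. The Python sketch below applies that check to a few registers from the dump. It is only an aid to reading the Oops, not a diagnosis of the IPoIB bug.</FONT></DIV>
<PRE>
```python
# Sketch: is a 64-bit value a canonical x86-64 virtual address?
# Assumes 48-bit virtual addresses (4-level paging): bits 63..47 must be
# all 0 (lower half) or all 1 (upper half, where kernel pointers live).
VA_BITS = 48
TOP_BITS = 64 - VA_BITS + 1  # bits 63..47 inclusive, i.e. 17 bits


def is_canonical(addr: int) -> bool:
    """Return True if bits 63..47 of a 64-bit address are all equal."""
    top = addr >> (VA_BITS - 1)  # keep only bits 63..47
    return top == 0 or top == 2 ** TOP_BITS - 1


# Register values copied from the Oops above.
REGISTERS = {
    "RDI": 0x0000FFFF1B6017FF,  # first argument to _spin_lock_irqsave
    "R14": 0x0000FFFF1B6017FF,
    "RDX": 0xFFFF810128D5E380,
    "RSP": 0xFFFF810132A4FC20,
}

if __name__ == "__main__":
    for name, value in REGISTERS.items():
        print(f"{name} = {value:016x}  canonical: {is_canonical(value)}")
```
</PRE>
<DIV dir=ltr align=left><FONT face=Arial size=2>Only RDI and R14, which hold the same value, come out non-canonical; kernel pointers such as RDX and RSP sit in the ffff8... range and pass.</FONT></DIV>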
<DIV dir=ltr align=left><SPAN class=837233805-04102006><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=837233805-04102006><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV> </DIV>
<DIV align=left><FONT face=Arial size=2>Scott Weitzenkamp</FONT></DIV>
<DIV align=left><FONT face=Arial size=2>SQA and Release Manager</FONT></DIV>
<DIV align=left><FONT face=Arial size=2>Server Virtualization Business
Unit</FONT></DIV>
<DIV align=left><FONT face=Arial size=2>Cisco Systems</FONT></DIV>
<DIV> </DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> openib-general-bounces@openib.org
[mailto:openib-general-bounces@openib.org] <B>On Behalf Of </B>Scott
Weitzenkamp (sweitzen)<BR><B>Sent:</B> Tuesday, October 03, 2006 2:53
PM<BR><B>To:</B> Vladimir Sokolovsky<BR><B>Cc:</B> EWG;
openib-General<BR><B>Subject:</B> Re: [openib-general] [openfabrics-ewg]
Problems with OFED IPoIB HA on SLES10<BR></FONT><BR></DIV>
<DIV></DIV><!-- Converted from text/plain format --><FONT face=Arial
color=#0000ff size=2>Vlad, thanks for the fast response. I have some
follow-up questions about configuring IPoIB HA; see below.</FONT><BR>
<P>
<FONT size=2>3) I got IPoIB HA working on SLES 10, but the documentation is a
little lacking. It looks like I have to put the same IP address in ifcfg-ib0
and ifcfg-ib1; is this correct?<BR> <BR><BR>
Yes, the IP address should be the same. Actually, the configuration of the
secondary interface does not matter.<BR> The High Availability daemon reads
the configuration of the primary interface and migrates it between the
interfaces in case of failure.</FONT></P>
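<DIV dir=ltr align=left><FONT face=Arial size=2>The behavior described in the answer above, where the daemon reads only the primary interface's configuration and moves that address to whichever port is still up, can be modeled in a few lines. This is a hypothetical Python sketch of that description, not the code of ipoib_ha.pl; parse_ifcfg and address_after_failure are invented names, and the sample file is the ifcfg-ib0 shown further down in this message.</FONT></DIV>
<PRE>
```python
# Hypothetical model of the HA behavior described above; not ipoib_ha.pl itself.


def parse_ifcfg(text: str) -> dict:
    """Parse KEY=value lines from a sysconfig ifcfg file, skipping comments."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip().strip("'\"")  # drop optional quotes
    return cfg


def address_after_failure(primary_cfg: dict, primary: str, secondary: str,
                          failed: str) -> dict:
    """Map the surviving interface to the primary's IPADDR and NETMASK.

    Only the primary's configuration is consulted; the secondary's own
    ifcfg file plays no part, per the answer above.
    """
    survivor = secondary if failed == primary else primary
    return {survivor: (primary_cfg["IPADDR"], primary_cfg["NETMASK"])}


if __name__ == "__main__":
    ib0_cfg = parse_ifcfg("DEVICE=ib0\nBOOTPROTO=static\nIPADDR=192.168.2.46\n"
                          "NETMASK=255.255.255.0\nONBOOT=yes\n")
    print(address_after_failure(ib0_cfg, "ib0", "ib1", failed="ib0"))
    print(address_after_failure(ib0_cfg, "ib0", "ib1", failed="ib1"))
```
</PRE>
<DIV dir=ltr align=left><FONT face=Arial size=2>Under this model the secondary's ifcfg file is never read, so a missing ifcfg-ib1 would not be expected to matter.</FONT></DIV>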
<P><FONT face=Arial color=#0000ff size=2>If I don't have an ifcfg-ib1 file, then
ipoib_ha.pl won't start. I would prefer not to configure ifcfg-ib1, since
I don't plan to use it.</FONT></P>
<P><FONT face="Courier New" color=#0000ff size=2># ipoib_ha.pl --with-arping
--with-multicast -v<BR>Can't open conf /etc/sysconfig/network/ifcfg-ib1: No
such file or directory<BR>Can't open conf /etc/sysconfig/network/ifcfg-ib1: No
such file or directory<BR>Can't open conf /etc/sysconfig/network/ifcfg-ib1: No
such file or directory<BR>Can't open conf /etc/sysconfig/network/ifcfg-ib1: No
such file or directory<BR>Can't open conf /etc/sysconfig/network/ifcfg-ib1: No
such file or directory<BR>...</FONT></P>
<P><FONT face="Courier New" color=#0000ff size=2><FONT face=Arial>If I put
different IP addresses in ifcfg-ib0 and ifcfg-ib1, then the ifcfg-ib1 IP
address is used for both ib0 and ib1!</FONT></FONT></P>
<P><FONT face="Courier New" color=#0000ff size=2>#
<STRONG>pwd</STRONG><BR>/etc/sysconfig/network<BR># <STRONG>cat
ifcfg-ib0</STRONG><BR>DEVICE=ib0<BR>BOOTPROTO=static<BR>IPADDR=192.168.2.46<BR>NETMASK=255.255.255.0<BR>ONBOOT=yes<BR>#
<STRONG>cat
ifcfg-ib1</STRONG><BR>DEVICE=ib1<BR>BOOTPROTO=static<BR>IPADDR=192.168.6.46<BR>NETMASK=255.255.255.0<BR>ONBOOT=yes<BR>#
<STRONG>/etc/init.d/openibd start</STRONG><BR>Loading HCA driver and Access
Layer:
[ OK ]<BR>Setting up InfiniBand network
interfaces:<BR> ib0
device: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor
compatibility mode) (rev 20)<BR>
ib0 configuration: ib1<BR>Bringing up
interface
ib0:
[ OK ]<BR>
ib1 device: Mellanox Technologies MT25208
InfiniHost III Ex (Tavor compatibility mode) (rev 20)<BR>Bringing up
interface
ib1:
[ OK ]<BR>Setting up service network . .
.
[ done ]<BR># <STRONG>ifconfig
ib0</STRONG><BR>ib0 Link
encap:UNSPEC HWaddr
00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00<BR>
inet addr:192.168.6.46 Bcast:192.168.6.255
Mask:255.255.255.0<BR>
inet6 addr: fe80::202:c902:21:700d/64
Scope:Link<BR> UP
BROADCAST RUNNING MULTICAST MTU:2044
Metric:1<BR> RX
packets:0 errors:0 dropped:0 overruns:0
frame:0<BR> TX packets:3
errors:0 dropped:0 overruns:0
carrier:0<BR>
collisions:0
txqueuelen:128<BR> RX
bytes:0 (0.0 b) TX bytes:224 (224.0 b)</FONT></P>
<P><FONT color=#0000ff size=2><FONT face="Courier New"># <STRONG>ifconfig
ib1<BR></STRONG>ib1 Link
encap:UNSPEC HWaddr
00-00-04-05-FE-80-00-00-00-00-00-00-00-00-00-00<BR>
inet addr:192.168.6.46 Bcast:192.168.6.255
Mask:255.255.255.0<BR>
inet6 addr: fe80::202:c902:21:700e/64
Scope:Link<BR> UP
BROADCAST RUNNING MULTICAST MTU:2044
Metric:1<BR> RX
packets:0 errors:0 dropped:0 overruns:0
frame:0<BR> TX packets:4
errors:0 dropped:0 overruns:0
carrier:0<BR>
collisions:0
txqueuelen:128<BR> RX
bytes:0 (0.0 b) TX bytes:304 (304.0 b)</FONT></FONT></P>
<P><FONT color=#0000ff size=2><FONT face=Arial>Notice that both ib0 and ib1
have the IP address from ifcfg-ib1. This contradicts the following statement in
ipoib_release_notes.txt:</FONT></FONT></P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT color=#0000ff size=2><FONT face=Arial> b.
The ib1 interface uses the configuration script of
ib0.<BR></FONT></FONT></P></BLOCKQUOTE>
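<DIV dir=ltr align=left><FONT face=Arial size=2>The discrepancy above can be checked mechanically by comparing the IPADDR in ifcfg-ib0 with the inet addr each interface reports. The Python sketch below does this for the transcript shown above; it assumes the net-tools ifconfig format ("inet addr:") seen here, and ifcfg_value, inet_addr and check_addresses are hypothetical helper names.</FONT></DIV>
<PRE>
```python
import re
from typing import Dict, Optional


def ifcfg_value(ifcfg_text: str, key: str) -> Optional[str]:
    """Return the value of KEY=value in an ifcfg file, or None if absent."""
    for line in ifcfg_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name == key:
            return value.strip().strip("'\"")
    return None


def inet_addr(ifconfig_output: str) -> Optional[str]:
    """Extract the IPv4 address from net-tools ifconfig output."""
    m = re.search(r"inet addr:(\S+)", ifconfig_output)
    return m.group(1) if m else None


def check_addresses(ifcfg_ib0: str, outputs: Dict[str, str]) -> Dict[str, bool]:
    """For each interface, report whether it carries ifcfg-ib0's IPADDR."""
    expected = ifcfg_value(ifcfg_ib0, "IPADDR")
    return {iface: inet_addr(out) == expected for iface, out in outputs.items()}


if __name__ == "__main__":
    # Values copied from the cat and ifconfig output in this message.
    ifcfg_ib0 = ("DEVICE=ib0\nBOOTPROTO=static\nIPADDR=192.168.2.46\n"
                 "NETMASK=255.255.255.0\nONBOOT=yes\n")
    shown = ("          inet addr:192.168.6.46  Bcast:192.168.6.255"
             "  Mask:255.255.255.0\n")
    print(check_addresses(ifcfg_ib0, {"ib0": shown, "ib1": shown}))
```
</PRE>
<DIV dir=ltr align=left><FONT face=Arial size=2>With the outputs above, both interfaces report 192.168.6.46 rather than the 192.168.2.46 configured in ifcfg-ib0, so both checks fail.</FONT></DIV>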
<P dir=ltr><FONT color=#0000ff size=2><FONT
face=Arial>Scott</FONT></FONT></P></BLOCKQUOTE></BODY></HTML>