[ofa-general] ipoib bonding problems in 1.3-beta2 and 1.2.5.4,
Vu Pham
vuhuong at mellanox.com
Wed Dec 5 10:00:35 PST 2007
Hi Moni,
My systems are RHEL 5.1 x86-64 with 2 Sinai HCAs, firmware 1.2.0.
I set up bonding by adding the following to /etc/infiniband/openib.conf
so that ib-bond starts automatically:
IPOIBBOND_ENABLE=yes
IPOIB_BONDS=bond0
bond0_IP=11.1.1.1
bond0_SLAVEs=ib0,ib1
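For reference, the bond can be sanity-checked after boot with the
standard Linux bonding status files (a rough sketch only, assuming the
usual bonding proc/sysfs interface on this RHEL 5.1 kernel; it uses the
bond0 / 11.1.1.1 settings above):
  cat /proc/net/bonding/bond0                      # mode, slave list, currently active slave
  ip addr show bond0                               # bond0 should carry 11.1.1.1
  cat /sys/class/net/bond0/bonding/active_slave    # prints ib0 or ib1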
Testing with OFED-1.3-beta2, I got the following crash while the system
was booting up:
Stack: ffffffff883429d0 fff810428519d30 ................
Call Trace:
[<ffffffff883429d0>] :bonding:bond_get_stats+0x4a/0x131
[< 8020e9cd>] rtnetlink_fill_ifinfo+0x4ba/0x5c4
ee19>] rtmsg_ifinfo+0x44/0x8d
eea2>] rtnetlink_event+0x40/0x44
8006492a>] notifier_call_chain+0x20/0x32
80208b5e>] dev_open+0x68/0x6e
72e8>] dev_change_flags+0x5a/0x119
80239762>] devinet_ioctl+0x235/0x59c
801ffcf6>] sock_ioctl+0x1c1/0x1e5
8003fc3f>] do_ioctl+0x21/0x6b
8002fa45>] vfs_ioctl+0x248/0x261
8004a24b>] sys_ioctl+0x59/0x78
8005b14e>] system_call+0x7e/0x83
Code: Bad RIP value
RIP [0000000000000000] _stext+0x7ffff000/0x1000
RSP <ffff10428519cc0>
CR2: 0000000000000000
<0>Kernel panic - not syncing: Fatal exception
I opened bug #812 for this issue.
I moved our systems back to OFED-1.2.5.4 and tested ib-bond again. We
tested it with ib0 and ib1 (connected to different switches/fabrics)
being on the same subnet (10.2.1.x, 255.255.255.0) and on different
subnets (10.2.1.x and 10.3.1.x, 255.255.255.0). In both cases we lose
communication between the servers whenever the nodes are not on the
same primary IB interface.
Example (reproduction commands are sketched after this list):
1. Original state: ib0 is the primary on both servers - pinging bond0
between the servers works fine.
2. Fail ib0 on one of the servers (ib1 becomes primary on that server) -
pinging bond0 between the servers fails.
3. Fail ib0 on the second server (ib1 becomes primary there too) -
pinging bond0 between the servers works fine again.
Is this behavior expected?
thanks,
-vu