[ofa-general] ipoib bonding problems in 1.3-beta2 and 1.2.5.4,
Scott Weitzenkamp (sweitzen)
sweitzen at cisco.com
Wed Dec 5 10:03:43 PST 2007
This is https://bugs.openfabrics.org/show_bug.cgi?id=795; it appears to
be fixed in the 1128 build.
> -----Original Message-----
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Vu Pham
> Sent: Wednesday, December 05, 2007 10:01 AM
> To: Moni Shoua
> Cc: OpenFabrics General
> Subject: [ofa-general] ipoib bonding problems in 1.3-beta2
> and 1.2.5.4,
>
> Hi Moni,
>
> My systems are RHEL 5.1 x86-64, 2 Sinai hcas, fw 1.2.0
>
> I set up bonding as follows:
> IPOIBBOND_ENABLE=yes
> IPOIB_BONDS=bond0
> bond0_IP=11.1.1.1
> bond0_SLAVEs=ib0,ib1
> in /etc/infiniband/openib.conf in order to start ib-bond automatically
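The openib.conf fragment quoted here is a shell-style key=value file, so a boot script can pick the settings up simply by sourcing it. A minimal sketch (the /tmp path and the inlined sample content are illustrative assumptions, not taken from the original system):

```shell
# Sketch: read the ib-bond settings from a shell-style key=value file,
# the way a boot script might. The /tmp path and inlined sample content
# are hypothetical; on a real system the file is /etc/infiniband/openib.conf.
cat > /tmp/openib.conf.sample <<'EOF'
IPOIBBOND_ENABLE=yes
IPOIB_BONDS=bond0
bond0_IP=11.1.1.1
bond0_SLAVEs=ib0,ib1
EOF

# Sourcing the file turns each key=value line into a shell variable.
. /tmp/openib.conf.sample

echo "$IPOIBBOND_ENABLE $IPOIB_BONDS $bond0_IP $bond0_SLAVEs"
# -> yes bond0 11.1.1.1 ib0,ib1
```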
>
> Testing with OFED-1.3-beta2, I got the following crash while the
> system was booting up:
>
> Stack: ffffffff883429d0 fff810428519d30 ................
> Call Trace:
> [<ffffffff883429d0>] :bonding:bond_get_stats+0x4a/0x131
> [< 8020e9cd>] rtnetlink_fill_ifinfo+0x4ba/0x5c4
> ee19>] rtmsg_ifinfo+0x44/0x8d
> eea2>] rtnetlink_event+0x40/0x44
> 8006492a>] notifier_call_chain+0x20/0x32
> 80208b5e>] dev_open+0x68/0x6e
> 72e8>] dev_change_flags+0x5a/0x119
> 80239762>] devinet_ioctl+0x235/0x59c
> 801ffcf6>] sock_ioctl+0x1c1/0x1e5
> 8003fc3f>] do_ioctl+0x21/0x6b
> 8002fa45>] vfs_ioctl+0x248/0x261
> 8004a24b>] sys_ioctl+0x59/0x78
> 8005b14e>] system_call+0x7e/0x83
>
> Code: Bad RIP value
> RIP [0000000000000000] _stext+0x7ffff000/0x1000
> RSP <ffff10428519cc0>
> CR2: 0000000000000000
> <0>Kernel panic - not syncing: Fatal exception
>
> I opened bug #812 for this issue.
>
> I moved our systems back to OFED-1.2.5.4 and tested ib-bond again. We
> tested it with ib0 and ib1 (connected to different switches/fabrics)
> being on the same subnet (10.2.1.x, 255.255.255.0) and on different
> subnets (10.2.1.x and 10.3.1.x, 255.255.255.0). In both cases there is
> the issue of losing communication between the servers if the nodes are
> not on the same primary ib interface.
>
> Example:
> 1. Original state: ib0 is the primary on both servers - pinging bond0
> between the servers is fine.
> 2. Fail ib0 on one of the servers (ib1 becomes primary on this
> server) - pinging bond0 between the servers fails.
> 3. Fail ib0 on the second server (ib1 becomes primary) - pinging bond0
> between the servers is fine again.
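Which slave is currently primary at each step can be read from /proc/net/bonding/bond0 with active-backup bonding. A small hedged helper for that check (the status text is inlined as a sample so the parsing can be shown without a real bond; the field name is the one the bonding driver reports):

```shell
# Sketch: pull the active slave out of bonding status text, as the
# active-backup driver reports it in /proc/net/bonding/bond0.
# On a live system this would be:  active_slave < /proc/net/bonding/bond0
active_slave() {
  grep 'Currently Active Slave' | awk -F': ' '{print $2}'
}

# Inlined, abridged sample of the driver's status output (illustrative).
sample='Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: ib0
MII Status: up'

echo "$sample" | active_slave
# -> ib0
```

Comparing this value on both servers at each step would confirm whether the ping failures line up with the two sides disagreeing on the primary interface.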
>
> Is this behavior expected?
>
> thanks,
> -vu
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>