[ofa-general] ipoib bonding problems in 1.3-beta2 and 1.2.5.4,

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Wed Dec 5 10:03:43 PST 2007


This is https://bugs.openfabrics.org/show_bug.cgi?id=795; it appears to
be fixed in build 1128.

 

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org 
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Vu Pham
> Sent: Wednesday, December 05, 2007 10:01 AM
> To: Moni Shoua
> Cc: OpenFabrics General
> Subject: [ofa-general] ipoib bonding problems in 1.3-beta2 
> and 1.2.5.4,
> 
> Hi Moni,
> 
> My systems are RHEL 5.1 x86-64, 2 Sinai hcas, fw 1.2.0
> 
> I set up bonding as follows in /etc/infiniband/openib.conf, so that
> ib-bond starts automatically:
> IPOIBBOND_ENABLE=yes
> IPOIB_BONDS=bond0
> bond0_IP=11.1.1.1
> bond0_SLAVEs=ib0,ib1
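
A rough sketch of what such a configuration amounts to by hand, using the standard Linux bonding module and ifenslave rather than the ib-bond script itself (the mode/miimon parameters are assumptions, not taken from this thread; the device names and address come from the config above):

```shell
# Hedged sketch: build an active-backup bond over the two IPoIB ports manually.
modprobe bonding mode=active-backup miimon=100   # load bonding driver (assumed parameters)
ip link set bond0 up                             # bring up the bond master
ifenslave bond0 ib0 ib1                          # enslave both IPoIB interfaces
ip addr add 11.1.1.1/24 dev bond0                # address from the openib.conf above
cat /sys/class/net/bond0/bonding/active_slave    # shows which slave is currently primary
```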
> 
> Testing with OFED-1.3-beta2, I got the following crash while the
> system was booting up:
> 
> Stack: ffffffff883429d0 ffff810428519d30 ................
> Call Trace:
>  [<ffffffff883429d0>] :bonding:bond_get_stats+0x4a/0x131
>  [<ffffffff8020e9cd>] rtnetlink_fill_ifinfo+0x4ba/0x5c4
>  [<....ee19>] rtmsg_ifinfo+0x44/0x8d
>  [<....eea2>] rtnetlink_event+0x40/0x44
>  [<ffffffff8006492a>] notifier_call_chain+0x20/0x32
>  [<ffffffff80208b5e>] dev_open+0x68/0x6e
>  [<....72e8>] dev_change_flags+0x5a/0x119
>  [<ffffffff80239762>] devinet_ioctl+0x235/0x59c
>  [<ffffffff801ffcf6>] sock_ioctl+0x1c1/0x1e5
>  [<ffffffff8003fc3f>] do_ioctl+0x21/0x6b
>  [<ffffffff8002fa45>] vfs_ioctl+0x248/0x261
>  [<ffffffff8004a24b>] sys_ioctl+0x59/0x78
>  [<ffffffff8005b14e>] system_call+0x7e/0x83
> 
> Code: Bad RIP value
> RIP [0000000000000000] _stext+0x7ffff000/0x1000
>  RSP <ffff810428519cc0>
> CR2: 0000000000000000
>  <0>Kernel panic - not syncing: Fatal exception
> 
> I opened bug #812 for this issue.
> 
> I moved our systems back to OFED-1.2.5.4 and tested ib-bond again. We
> tested it with ib0 and ib1 (connected to different switches/fabrics)
> being on the same subnet (10.2.1.x, 255.255.255.0) and on different
> subnets (10.2.1.x and 10.3.1.x, 255.255.255.0). In both cases we lose
> communication between the servers whenever the two nodes are not on
> the same primary IB interface.
> 
> Example:
> 1. Original state: ib0 is the primary on both servers - pinging bond0
> between the servers works fine.
> 2. Fail ib0 on one of the servers (ib1 becomes primary on that
> server) - pinging bond0 between the servers fails.
> 3. Fail ib0 on the second server (ib1 becomes primary there too) -
> pinging bond0 between the servers works fine again.
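
The failure sequence above can be sketched as a shell walkthrough (assumes root on both servers and the bond0 setup from this thread; the peer address 11.1.1.2 is an assumed example, and `active_slave` is the standard bonding sysfs attribute):

```shell
# Run on server A, pinging server B's bond0 address (11.1.1.2 is hypothetical).
ping -c 3 -W 2 11.1.1.2                          # step 1: both primaries are ib0 -> works
ip link set ib0 down                             # step 2: force failover to ib1 on A
cat /sys/class/net/bond0/bonding/active_slave    # should now report ib1
ping -c 3 -W 2 11.1.1.2                          # fails while the primaries differ
# Step 3: take ib0 down on server B as well, then from A:
ping -c 3 -W 2 11.1.1.2                          # works again once both use ib1
```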
> 
> Is this behavior expected?
> 
> thanks,
> -vu
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 


