[ofa-general] ipoib bonding problems in 1.3-beta2 and 1.2.5.4,

Vu Pham vuhuong at mellanox.com
Wed Dec 5 10:15:25 PST 2007


Scott Weitzenkamp (sweitzen) wrote:
> This is https://bugs.openfabrics.org/show_bug.cgi?id=795, it appears to
> be fixed in the 1128 build.
>
>   

Thanks. I'll close bug #812.
Could you or Moni answer/explain the behavior of ib-bond in 1.2.5.4?
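
For anyone trying to reproduce the 1.2.5.4 case, a rough sketch of how one might record which slave is active on each server before and after failing ib0. It assumes the bonding driver exposes its sysfs interface at /sys/class/net/<bond>/bonding/active_slave (true for stock Linux bonding, not verified on these exact systems); the helper name active_slave is purely illustrative.

```python
from pathlib import Path


def active_slave(bond="bond0", sysfs_root="/sys/class/net"):
    """Return the name of the bond's currently active slave, or None.

    Assumes the bonding driver's sysfs layout:
    <sysfs_root>/<bond>/bonding/active_slave
    """
    path = Path(sysfs_root) / bond / "bonding" / "active_slave"
    try:
        name = path.read_text().strip()
    except OSError:
        # Bond does not exist or the bonding sysfs interface is missing.
        return None
    # An empty file means the bond currently has no active slave.
    return name or None


if __name__ == "__main__":
    print(active_slave() or "no active slave reported")
```

Running this on both servers at each step of the example would show whether the ping failure lines up with the two bonds having different active slaves.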

>> -----Original Message-----
>> From: general-bounces at lists.openfabrics.org 
>> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Vu Pham
>> Sent: Wednesday, December 05, 2007 10:01 AM
>> To: Moni Shoua
>> Cc: OpenFabrics General
>> Subject: [ofa-general] ipoib bonding problems in 1.3-beta2 
>> and 1.2.5.4,
>>
>> Hi Moni,
>>
>> My systems are RHEL 5.1 x86-64, 2 Sinai hcas, fw 1.2.0
>>
>> I set up bonding as follows:
>> IPOIBBOND_ENABLE=yes
>> IPOIB_BONDS=bond0
>> bond0_IP=11.1.1.1
>> bond0_SLAVES=ib0,ib1
>> in /etc/infiniband/openib.conf in order to start ib-bond automatically
>>
>> Testing with OFED-1.3-beta2, I got the following crash while the
>> system was booting up:
>>
>> Stack: ffffffff883429d0 fff810428519d30 ................
>> Call Trace:
>> [<ffffffff883429d0>] :bonding:bond_get_stats+0x4a/0x131
>> [<        8020e9cd>] rtnetlink_fill_ifinfo+0x4ba/0x5c4
>>               ee19>] rtmsg_ifinfo+0x44/0x8d
>>               eea2>] rtnetlink_event+0x40/0x44
>>           8006492a>] notifier_call_chain+0x20/0x32
>>           80208b5e>] dev_open+0x68/0x6e
>>               72e8>] dev_change_flags+0x5a/0x119
>>           80239762>] devinet_ioctl+0x235/0x59c
>>           801ffcf6>] sock_ioctl+0x1c1/0x1e5
>>           8003fc3f>] do_ioctl+0x21/0x6b
>>           8002fa45>] vfs_ioctl+0x248/0x261
>>           8004a24b>] sys_ioctl+0x59/0x78
>>           8005b14e>] system_call+0x7e/0x83
>>
>> Code: Bad RIP value
>> RIP [0000000000000000000000] _stext+0x7ffff000/0x1000
>>  RSP <ffff10428519cc0>
>> CR2: 000000000000000000000
>>  <0>Kernel panic - not syncing: Fatal exception
>>
>> I opened bug #812 for this issue.
>>
>> I moved our systems back to ofed-1.2.5.4 and tested ib-bond again. We
>> tested it with ib0 and ib1 (each connected to a different
>> switch/fabric) on the same subnet (10.2.1.x, 255.255.255.0) and on
>> different subnets (10.2.1.x and 10.3.1.x, 255.255.255.0). In both
>> cases the servers lose communication with each other whenever they
>> are not both using the same primary ib interface.
>>
>> Example:
>> 1. Original state: ib0 is the primary on both servers - pinging bond0
>>    between the servers is fine.
>> 2. Fail ib0 on one of the servers (ib1 becomes primary on this
>>    server) - pinging bond0 between the servers fails.
>> 3. Fail ib0 on the second server (ib1 becomes primary) - pinging bond0
>>    between the servers is fine again.
>>
>> Is this behavior expected?
>>
>> thanks,
>> -vu



