[openib-general] APM support in openib stack

Venkatesh Babu venkatesh.babu at 3leafnetworks.com
Tue Oct 24 16:09:05 PDT 2006


1. Yes, I can rearm the alternate path by sending LAP and APR messages.

2. I was sending some network traffic (netperf) while doing these failovers.

 VBabu
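
The rearm step in point 1 can be pictured as a small state machine. What
follows is a toy model in C, not OFED code: the names toy_qp, load_alt_path,
peer_rearmed and primary_failed are made up for illustration, and the only
thing taken from the IB spec is the set of path migration states (Migrated,
Rearm, Armed). It assumes that loading a new alternate path puts the QP in
Rearm, that both sides reaching Rearm moves it to Armed, and that a primary
path failure while Armed swaps in the alternate path.

```c
#include <assert.h>

/* Path migration states named in the IB spec. This enum and struct are
 * illustrative only; they are not the kernel's types. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

struct toy_qp {
    int primary_port;      /* port currently carrying traffic */
    int alt_port;          /* standby port, 0 if none is loaded */
    enum mig_state state;
};

/* Load a new alternate path and request rearm (assumed to be what each
 * side does around the LAP/APR exchange). */
static void load_alt_path(struct toy_qp *qp, int port)
{
    assert(port > 0);
    qp->alt_port = port;
    qp->state = MIG_REARM;
}

/* The peer has also reached Rearm, so this side moves to Armed. */
static void peer_rearmed(struct toy_qp *qp)
{
    if (qp->state == MIG_REARM)
        qp->state = MIG_ARMED;
}

/* The primary path failed. Returns 1 if traffic migrated to the
 * alternate path, 0 if there was no armed alternate to fall back on. */
static int primary_failed(struct toy_qp *qp)
{
    if (qp->state != MIG_ARMED || qp->alt_port == 0)
        return 0;
    qp->primary_port = qp->alt_port;
    qp->alt_port = 0;
    qp->state = MIG_MIGRATED;
    return 1;
}
```

Starting from primary port 1 in the Migrated state, calling
load_alt_path(&qp, 2), peer_rearmed(&qp) and primary_failed(&qp) moves
traffic to port 2; repeating with port 1 moves it back, which mirrors Test1
below. A further primary_failed() without a rearm returns 0, which is the
"no path left" case.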

somenath wrote:

> hi Venkatesh:
>
> Two questions:
>
> 1. Does re-enabling migration (as defined in section 17.2.8.1.4 of
> vol. 1 of the IB spec) work for you?
> (I mean, after the first path failure, do you do the LAP/APR packet transfer?)
>
> 2. What applications are you testing with?
>
> thanks, som.
>
> Venkatesh Babu wrote:
>
>>
>> I have added a couple of patches to the OFED stack, as described in
>> bug#160, bug#172, and bug#159, and with these I successfully tested the
>> APM functionality, except for one issue.
>>
>> Configuration:
>> 2 Nodes
>> CPU: AMD Opteron(tm) Processor 252 Dual processor
>> CA type: MT25208
>> Firmware version: 5.1.4
>> OS: CentOS release 4.2
>> IB: OFED 1.0
>>
>> 2 Flextronics 24-port switches
>>
>> Node1 Port1 connected to Switch1
>> Node1 Port2 connected to Switch2
>> Node2 Port1 connected to Switch1
>> Node2 Port2 connected to Switch2
>>
>> Node1: Active side of the RC QP
>> Node2: Passive side of the RC QP
>>
>> Test1:
>> Failover simulation on Node1
>> 1. Simulate the port1 failure; RC QP migrates the path to port2
>> 2. Simulate the port1 UP to rearm the alternate path from port1
>> 3. Simulate the port2 failure; RC QP migrates the path to port1
>> 4. Simulate the port2 UP to rearm the alternate path from port2
>>
>> Test2:
>> Real failover by manually pulling the cable
>> 1. Simulate the failover/failback by pulling cable of Node1 port1
>> 2. Simulate the failover/failback by pulling cable of Node1 port2
>> 3. Simulate the failover/failback by pulling cable of Node2 port1
>> 4. Simulate the failover/failback by pulling cable of Node2 port2
>>
>>
>> ISSUE:
>> If I pull both cables, there are no paths to the destination, so the
>> RC QP connection is supposed to be torn down. But this does not
>> happen.
>>
>> 1. Create an RC QP and load both the primary and alternate paths
>>    (I was setting rnr_retry_count = 6, retry_count = 6, and the
>> packet_life_time field of struct ib_sa_path_rec to 15; I also tried
>> 12)
>> 2. Send some traffic over RC QP
>> 3. Disconnect the cable belonging to the primary path
>> 4. It smoothly fails over to the alternate path, which becomes the
>> primary path. There is no effect on the traffic on that RC QP.
>> 5. Remove the second cable belonging to the new primary path.
>> 6. Obviously traffic stops since there are no paths to the
>> destination. But for the outstanding WRs in the RC QP I don't get any
>> callback from the verbs layer describing whether they succeeded or
>> failed with some error like IB_WC_RETRY_EXC_ERR.
>> When I query the RC QP properties, it still shows that it is in the
>> IB_QPS_RTS state.
>>
>>
>> Without APM functionality it behaves correctly -
>> 1. Create an RC QP and load only the primary path
>>    (I was setting rnr_retry_count = 6, retry_count = 6, and the
>> packet_life_time field of struct ib_sa_path_rec to 15; I also tried
>> 12)
>> 2. Send some traffic over RC QP
>> 3. Disconnect the cable belonging to the primary path
>> 4. Obviously traffic stops since there are no paths to the
>> destination. For the outstanding WRs in the RC QP I do get a callback
>> from the verbs layer reporting that the first WR failed with
>> IB_WC_RETRY_EXC_ERR, and for all other WRs I get
>> IB_WC_WR_FLUSH_ERR.
>> I then close this RC QP.
>>
>> VBabu
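
For a sense of how long the retries in the report above should take before
IB_WC_RETRY_EXC_ERR can be expected, here is a rough calculation in C. It
uses the IB encoding of the Local ACK Timeout (4.096 us * 2^exponent) and
counts the initial transmission plus retry_count retries. Two things are
assumptions and not taken from this thread: that the CM derives the QP
timeout exponent as packet_life_time + 1 (capped at 31), and that the HCA
timer runs at the nominal value. The helper names are hypothetical.

```c
#include <assert.h>

/* Nominal Local ACK Timeout in microseconds for a 5-bit IB timeout
 * exponent: 4.096 us * 2^exp. An exponent of 0 disables the timeout
 * and is reported here as 0.0. */
static double ack_timeout_us(unsigned exp)
{
    assert(exp <= 31);
    if (exp == 0)
        return 0.0;
    return 4.096 * (double)(1ULL << exp);
}

/* ASSUMPTION: the CM turns packet_life_time into the QP timeout
 * exponent as packet_life_time + 1, capped at 31. Check this against
 * the cm.c in the release being used. */
static unsigned timeout_exp_from_plt(unsigned packet_life_time)
{
    unsigned exp = packet_life_time + 1;
    return exp > 31 ? 31 : exp;
}

/* Nominal time until retries are exhausted: the first transmission
 * plus retry_count retries, each waiting one ACK timeout. */
static double retry_exhaust_us(unsigned exp, unsigned retry_count)
{
    return ack_timeout_us(exp) * (double)(retry_count + 1);
}
```

Under those assumptions, packet_life_time = 15 with retry_count = 6 gives
an exponent of 16, about 0.27 s per attempt and roughly 1.9 s in total;
packet_life_time = 12 gives about 0.23 s in total. A QP that still shows
IB_QPS_RTS well beyond that has not reported retry exhaustion.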
>>
>> Date: Mon, 16 Oct 2006 14:03:50 -0700
>> From: "Sean Hefty" <mshefty at ichips.intel.com>
>> Subject: Re: [openib-general] APM support in openib stack
>> To: somenath at veritas.com
>> Cc: openib-general at openib.org
>> Message-ID: <4533F3B6.1030509 at ichips.intel.com>
>> Content-Type: text/plain; charset=iso-8859-1; format=flowed
>>
>> somenath wrote:
>>
>>>>>> Doesn't ib_cm_init_qp_attr() set this for you?
>>>>>
>>>> No, it doesn't. It returns
>>>> attr_mask=0x12d181
>>>> port=0x0 alt_port=0x0
>>>
>>
>> Okay - there was a fix to the cm.c file (svn rev 8267) that added 
>> setting the alternate port number when initializing the QP 
>> attributes.  Apparently that fix did not make it into the release 
>> that you're using.
>>
>> - Sean
>>
>
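The attr_mask value quoted above (0x12d181) is easier to read once it is
decoded into flag names. The sketch below does that in C. The bit
positions are written from memory of enum ib_qp_attr_mask in the kernel's
ib_verbs.h and should be treated as an assumption to verify against the
tree in use; the table and the function decode_qp_attr_mask are
illustrative and not part of any OFED API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct flag { unsigned bit; const char *name; };

/* ASSUMPTION: bit positions as in enum ib_qp_attr_mask (ib_verbs.h). */
static const struct flag qp_attr_flags[] = {
    { 1u << 0,  "IB_QP_STATE" },
    { 1u << 1,  "IB_QP_CUR_STATE" },
    { 1u << 2,  "IB_QP_EN_SQD_ASYNC_NOTIFY" },
    { 1u << 3,  "IB_QP_ACCESS_FLAGS" },
    { 1u << 4,  "IB_QP_PKEY_INDEX" },
    { 1u << 5,  "IB_QP_PORT" },
    { 1u << 6,  "IB_QP_QKEY" },
    { 1u << 7,  "IB_QP_AV" },
    { 1u << 8,  "IB_QP_PATH_MTU" },
    { 1u << 9,  "IB_QP_TIMEOUT" },
    { 1u << 10, "IB_QP_RETRY_CNT" },
    { 1u << 11, "IB_QP_RNR_RETRY" },
    { 1u << 12, "IB_QP_RQ_PSN" },
    { 1u << 13, "IB_QP_MAX_QP_RD_ATOMIC" },
    { 1u << 14, "IB_QP_ALT_PATH" },
    { 1u << 15, "IB_QP_MIN_RNR_TIMER" },
    { 1u << 16, "IB_QP_SQ_PSN" },
    { 1u << 17, "IB_QP_MAX_DEST_RD_ATOMIC" },
    { 1u << 18, "IB_QP_PATH_MIG_STATE" },
    { 1u << 19, "IB_QP_CAP" },
    { 1u << 20, "IB_QP_DEST_QPN" },
};

/* Writes the space-separated names of the set flags into buf and returns
 * how many names were written in full. Stops early if buf is too small. */
static int decode_qp_attr_mask(unsigned mask, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(qp_attr_flags) / sizeof(qp_attr_flags[0]); i++) {
        if (!(mask & qp_attr_flags[i].bit))
            continue;
        int w = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", qp_attr_flags[i].name);
        if (w < 0 || (size_t)w >= len - used)
            break;              /* output truncated, stop here */
        used += (size_t)w;
        count++;
    }
    return count;
}
```

With that table, 0x12d181 decodes to IB_QP_STATE, IB_QP_AV, IB_QP_PATH_MTU,
IB_QP_RQ_PSN, IB_QP_ALT_PATH, IB_QP_MIN_RNR_TIMER, IB_QP_MAX_DEST_RD_ATOMIC
and IB_QP_DEST_QPN. IB_QP_ALT_PATH is set while alt_port is reported as 0x0,
which fits Sean's description of the missing alternate port number.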