[openib-general] APM support in openib stack

Venkatesh Babu venkatesh.babu at 3leafnetworks.com
Fri Oct 20 13:29:08 PDT 2006


I have added a couple of patches to the OFED stack, as described in
bug#160, bug#172, and bug#159, and with these I successfully tested the APM
functionality, except for one issue.

Configuration:
2 Nodes
CPU: AMD Opteron(tm) Processor 252 Dual processor
CA type: MT25208
Firmware version: 5.1.4
OS: CentOS release 4.2
IB: OFED 1.0

2 Flextronics 24-port switches

Node1 Port1 connected to Switch1
Node1 Port2 connected to Switch2
Node2 Port1 connected to Switch1
Node2 Port2 connected to Switch2

Node1: Active side of the RC QP
Node2: Passive side of the RC QP

Test1:
Failover simulation on Node1
1. Simulate a port1 failure; the RC QP migrates the path to port2
2. Simulate port1 coming back UP to rearm the alternate path on port1
3. Simulate a port2 failure; the RC QP migrates the path to port1
4. Simulate port2 coming back UP to rearm the alternate path on port2
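
As a rough illustration of the Test1 sequence, here is a toy model of one
side of the connection. It is only a sketch: struct apm_qp, apm_rearm() and
apm_port_down() are hypothetical names made up for this example and are not
part of the verbs API. The three states mirror the Migrated/Rearm/Armed path
migration states, and the model assumes the HCA arms the alternate path as
soon as a rearm is requested.

```c
#include <stdbool.h>

/* Toy model only; these types and helpers are hypothetical, not verbs calls. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

struct apm_qp {
    int primary_port;      /* port currently carrying traffic */
    int alt_port;          /* standby port, 0 if no alternate is loaded */
    enum mig_state state;
    bool connected;        /* false once no usable path remains */
};

/* Load an alternate path and request rearm. Assumption: the HCA arms the
 * path immediately, so the model skips straight from REARM to ARMED. */
static void apm_rearm(struct apm_qp *qp, int alt_port)
{
    qp->alt_port = alt_port;
    qp->state = MIG_ARMED;
}

/* A port went down. Migrate if an alternate path is armed. */
static void apm_port_down(struct apm_qp *qp, int port)
{
    if (port == qp->alt_port) {
        /* Only the standby path was lost; modeled as "no armed alternate". */
        qp->alt_port = 0;
        qp->state = MIG_MIGRATED;
        return;
    }
    if (port != qp->primary_port)
        return;
    if (qp->state == MIG_ARMED && qp->alt_port != 0) {
        qp->primary_port = qp->alt_port;   /* alternate becomes primary */
        qp->alt_port = 0;
        qp->state = MIG_MIGRATED;
    } else {
        /* No path left: the expectation is that retries are exhausted
         * and the connection is torn down. */
        qp->connected = false;
    }
}
```

Starting with primary_port = 1, calling apm_rearm(&qp, 2), apm_port_down(&qp, 1),
apm_rearm(&qp, 1), apm_port_down(&qp, 2) walks through steps 1 to 4 and ends
with traffic back on port1.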

Test2:
Real failover by manually pulling the cable
1. Simulate the failover/failback by pulling the cable of Node1 port1
2. Simulate the failover/failback by pulling the cable of Node1 port2
3. Simulate the failover/failback by pulling the cable of Node2 port1
4. Simulate the failover/failback by pulling the cable of Node2 port2


ISSUE:
  If I pull both cables, there are no paths to the destination, so the
RC QP connection is supposed to be torn down. But that does not happen.

1. Create an RC QP and load both the primary and the alternate path
    (I was setting rnr_retry_count = 6, retry_count = 6, and the
packet_life_time field of struct ib_sa_path_rec to 15; I also tried
12)
2. Send some traffic over RC QP
3. Disconnect the cable belonging to the primary path
4. It smoothly fails over to the alternate path, which becomes the primary path.
   There is no effect on the traffic on that RC QP.
5. Remove the second cable belonging to the new primary path.
6. Obviously traffic stops since there are no paths to the destination. 
But for the outstanding WRs in the RC QP I don't get any callback from 
the verbs layer describing whether it succeeded or failed due to some 
error like IB_WC_RETRY_EXC_ERR.
When I query the RC QP properties, it still shows that the QP is in the
IB_QPS_RTS state.
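
For reference, a rough estimate of how long one would wait before expecting
IB_WC_RETRY_EXC_ERR with the settings above. IB timeouts are encoded as
4.096 us * 2^value. The sketch below assumes the QP's local ACK timeout
exponent is packet_life_time + 1 (one round trip) and ignores the CA ACK
delay, so treat the result as a lower bound, not what the HCA will do
exactly. The helper names are made up for this example.

```c
#include <stdint.h>

/* IB timeout encoding: 4.096 us * 2^exp, returned in nanoseconds. */
static uint64_t ib_timeout_ns(unsigned exp)
{
    return 4096ULL << exp;
}

/* Rough lower bound on the time until the retries are exhausted:
 * the initial attempt plus retry_count retries, one ACK timeout each. */
static uint64_t retry_exhaust_ns(unsigned packet_life_time, unsigned retry_count)
{
    /* Assumption: ACK timeout exponent = packet_life_time + 1. */
    unsigned ack_timeout_exp = packet_life_time + 1;

    return (uint64_t)(retry_count + 1) * ib_timeout_ns(ack_timeout_exp);
}
```

Under these assumptions, retry_exhaust_ns(15, 6) is 1879048192 ns (about
1.9 seconds) and retry_exhaust_ns(12, 6) is 234881024 ns (about 0.23
seconds), so in both cases the error completion should show up within a few
seconds of pulling the last cable.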


Without APM functionality it behaves correctly -
1. Create an RC QP and load only the primary path
    (I was setting rnr_retry_count = 6, retry_count = 6, and the
packet_life_time field of struct ib_sa_path_rec to 15; I also tried
12)
2. Send some traffic over RC QP
3. Disconnect the cable belonging to the primary path
4. Obviously traffic stops since there are no paths to the destination.
For the outstanding WRs in the RC QP I do get a callback from the verbs
layer reporting that the first WR failed with error
IB_WC_RETRY_EXC_ERR, and for all other WRs I get IB_WC_WR_FLUSH_ERR.
I then close this RC QP.

VBabu

Date: Mon, 16 Oct 2006 14:03:50 -0700
From: "Sean Hefty" <mshefty at ichips.intel.com>
Subject: Re: [openib-general] APM support in openib stack
To: somenath at veritas.com
Cc: openib-general at openib.org
Message-ID: <4533F3B6.1030509 at ichips.intel.com>
Content-Type: text/plain; charset=iso-8859-1; format=flowed

somenath wrote:

>>>> Doesn't ib_cm_init_qp_attr() set this for you?
>>
>> No, it doesn't. It returns me
>> attr_mask=        0x12d181
>> port=0x0 alt_port=0x0

Okay - there was a fix to the cm.c file (svn rev 8267) that added setting the 
alternate port number when initializing the QP attributes.  Apparently that fix 
did not make it into the release that you're using.

- Sean
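
To read the attr_mask quoted above, a small decoder can help. This is only
a sketch: the bit values are copied into a local table on the assumption
that they match the kernel's enum ib_qp_attr_mask, and struct mask_bit and
decode_qp_attr_mask() are names invented for this example.

```c
#include <stdio.h>
#include <stddef.h>

struct mask_bit { unsigned int bit; const char *name; };

/* Assumed to mirror enum ib_qp_attr_mask; check against ib_verbs.h. */
static const struct mask_bit qp_attr_bits[] = {
    { 1u << 0,  "IB_QP_STATE" },
    { 1u << 1,  "IB_QP_CUR_STATE" },
    { 1u << 2,  "IB_QP_EN_SQD_ASYNC_NOTIFY" },
    { 1u << 3,  "IB_QP_ACCESS_FLAGS" },
    { 1u << 4,  "IB_QP_PKEY_INDEX" },
    { 1u << 5,  "IB_QP_PORT" },
    { 1u << 6,  "IB_QP_QKEY" },
    { 1u << 7,  "IB_QP_AV" },
    { 1u << 8,  "IB_QP_PATH_MTU" },
    { 1u << 9,  "IB_QP_TIMEOUT" },
    { 1u << 10, "IB_QP_RETRY_CNT" },
    { 1u << 11, "IB_QP_RNR_RETRY" },
    { 1u << 12, "IB_QP_RQ_PSN" },
    { 1u << 13, "IB_QP_MAX_QP_RD_ATOMIC" },
    { 1u << 14, "IB_QP_ALT_PATH" },
    { 1u << 15, "IB_QP_MIN_RNR_TIMER" },
    { 1u << 16, "IB_QP_SQ_PSN" },
    { 1u << 17, "IB_QP_MAX_DEST_RD_ATOMIC" },
    { 1u << 18, "IB_QP_PATH_MIG_STATE" },
    { 1u << 19, "IB_QP_CAP" },
    { 1u << 20, "IB_QP_DEST_QPN" },
};

/* Write the names of the set bits into buf, separated by '|'.
 * Returns the number of names written; stops early if buf is too small. */
static int decode_qp_attr_mask(unsigned int mask, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(qp_attr_bits) / sizeof(qp_attr_bits[0]); i++) {
        if (!(mask & qp_attr_bits[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? "|" : "", qp_attr_bits[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

With this table, 0x12d181 decodes to IB_QP_STATE, IB_QP_AV, IB_QP_PATH_MTU,
IB_QP_RQ_PSN, IB_QP_ALT_PATH, IB_QP_MIN_RNR_TIMER, IB_QP_MAX_DEST_RD_ATOMIC
and IB_QP_DEST_QPN. So the mask does include IB_QP_ALT_PATH, which appears
consistent with the alternate path being requested while alt_port itself was
left at 0.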








