[openib-general] APM support in openib stack
Venkatesh Babu
venkatesh.babu at 3leafnetworks.com
Fri Oct 20 13:29:08 PDT 2006
I have added a couple of patches to the OFED stack as described in
bug#160, bug#172, and bug#159, and with these I have successfully tested
the APM functionality, except for one issue.
Configuration:
2 Nodes
CPU: AMD Opteron(tm) Processor 252 Dual processor
CA type: MT25208
Firmware version: 5.1.4
OS: CentOS release 4.2
IB: OFED 1.0
2 Flextronics 24-port switches
Node1 Port1 connected to Switch1
Node1 Port2 connected to Switch2
Node2 Port1 connected to Switch1
Node2 Port2 connected to Switch2
Node1 : Active side of the RC QP
Node2 : Passive side of the RC QP
Test1:
Failover simulation on Node1
1. Simulate the port1 failure, RC QP migrates the path to port2
2. Simulate the port1 UP to rearm the alternate path from port1
3. Simulate the port2 failure, RC QP migrates the path to port1
4. Simulate the port2 UP to rearm the alternate path from port2
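The failover/rearm cycle in Test1 can be pictured with a small model. This is only a sketch of the sequence, not of the hardware: the enum mirrors the names in the kernel's enum ib_mig_state, but the struct, the helper names, and the shortcut of going straight to ARMED on rearm (no handshake with the peer) are my own assumptions for illustration.

```c
#include <assert.h>

/* Local stand-ins that mirror the names in the kernel's enum ib_mig_state. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

/* Hypothetical model of one RC QP with a primary and an alternate port. */
struct qp_model {
    enum mig_state state;
    int primary_port;   /* port currently carrying traffic */
    int alt_port;       /* standby port, 0 if none is loaded */
};

/* Load an alternate path and rearm. Simplification: the REARM handshake
 * with the peer is assumed to complete at once, so the QP ends up ARMED. */
static void qp_rearm(struct qp_model *qp, int alt_port)
{
    assert(alt_port != 0 && alt_port != qp->primary_port);
    qp->alt_port = alt_port;
    qp->state = MIG_ARMED;
}

/* React to a port going down.
 * Returns 1 if the QP still has a working path, 0 if no path is left. */
static int qp_port_down(struct qp_model *qp, int port)
{
    if (port == qp->alt_port) {
        /* Model only: losing the standby leaves the QP with no alternate. */
        qp->alt_port = 0;
        qp->state = MIG_MIGRATED;
        return 1;
    }
    if (port != qp->primary_port)
        return 1;                       /* unrelated port */
    if (qp->state != MIG_ARMED || qp->alt_port == 0)
        return 0;                       /* nothing to migrate to */
    qp->primary_port = qp->alt_port;    /* alternate becomes primary */
    qp->alt_port = 0;
    qp->state = MIG_MIGRATED;
    return 1;
}
```

Walking Test1 through the model: start with primary port1, call qp_rearm(&qp, 2), then qp_port_down(&qp, 1) moves traffic to port2; qp_rearm(&qp, 1) followed by qp_port_down(&qp, 2) moves it back to port1.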
Test2:
Real failover by manually pulling the cable
1. Simulate the failover/failback by pulling cable of Node1 port1
2. Simulate the failover/failback by pulling cable of Node1 port2
3. Simulate the failover/failback by pulling cable of Node2 port1
4. Simulate the failover/failback by pulling cable of Node2 port2
ISSUE:
If I pull both cables then there are no paths to the destination, so
the RC QP connection is supposed to be torn down. But that does not
happen.
1. Create a RC QP and load both primary and alternate path
(I was setting rnr_retry_count = 6, retry_count = 6,
packet_life_time field of struct ib_sa_path_rec to 15 and also tried
with 12)
2. Send some traffic over RC QP
3. Disconnect the cable belonging to the primary path
4. It smoothly fails over to the alternate path, which becomes the primary
path. No effect on the traffic on that RC QP
5. Remove the second cable belonging to the new primary path.
6. Obviously traffic stops since there are no paths to the destination.
But for the outstanding WRs in the RC QP I don't get any callback from
the verbs layer describing whether it succeeded or failed due to some
error like IB_WC_RETRY_EXC_ERR.
When I query the RC QP properties it still shows that it is in
IB_QPS_RTS state.
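For a sense of how long the retries should take before an error is reported, the timeout fields can be converted to real time. InfiniBand encodes these fields as 4.096 usec * 2^value. The sketch below applies that encoding; the helper names are hypothetical, and the retry budget (timeout times retry_count + 1 attempts) is only a rough estimate. Which timeout value the stack actually programs into the QP is not something I have verified here, so it is treated as an input.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an IB-encoded time field to nanoseconds: 4.096 usec * 2^field.
 * The sketch only handles field values up to 31. */
static uint64_t ib_time_ns(unsigned field)
{
    assert(field <= 31);
    return (uint64_t)4096 << field;
}

/* Rough estimate of the time spent on one request before giving up:
 * the original attempt plus retry_count retries, each waiting one
 * timeout. Real hardware may wait longer than this. */
static uint64_t retry_budget_ns(unsigned timeout_field, unsigned retry_count)
{
    return ib_time_ns(timeout_field) * (retry_count + 1);
}
```

With these helpers a field value of 15 is about 134 msec and 12 is about 16.8 msec; with retry_count = 6, a timeout field of 15 gives a budget of roughly 0.94 sec.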
Without APM functionality it behaves correctly:
1. Create a RC QP and load only primary path
(I was setting rnr_retry_count = 6, retry_count = 6,
packet_life_time field of struct ib_sa_path_rec to 15 and also tried
with 12)
2. Send some traffic over RC QP
3. Disconnect the cable belonging to the primary path
4. Obviously traffic stops since there are no paths to the destination.
For the outstanding WRs in the RC QP I do get a callback from the verbs
layer reporting that the first WR failed with IB_WC_RETRY_EXC_ERR, and
for all other WRs I get IB_WC_WR_FLUSH_ERR.
I will close this RC QP.
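The completion pattern in the non-APM case (one real error, then flushes) can be summarized as below. This is a standalone sketch: the enum is a local stand-in whose names mirror IB_WC_SUCCESS, IB_WC_RETRY_EXC_ERR and IB_WC_WR_FLUSH_ERR, and drain_completions() and struct drain_result are hypothetical helpers, not part of the verbs API.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins mirroring the verbs completion status names. */
enum wc_status { WC_SUCCESS, WC_RETRY_EXC_ERR, WC_WR_FLUSH_ERR };

struct drain_result {
    size_t ok;                   /* WRs that completed successfully */
    size_t flushed;              /* WRs flushed after the QP went to error */
    enum wc_status root_cause;   /* first non-flush error, WC_SUCCESS if none */
};

/* Scan completions in the order they were polled and separate the one
 * error that explains the failure from the flushes that follow it. */
static struct drain_result drain_completions(const enum wc_status *wc, size_t n)
{
    struct drain_result r = { 0, 0, WC_SUCCESS };

    assert(wc != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (wc[i]) {
        case WC_SUCCESS:
            r.ok++;
            break;
        case WC_WR_FLUSH_ERR:
            r.flushed++;
            break;
        default:
            if (r.root_cause == WC_SUCCESS)
                r.root_cause = wc[i];
            break;
        }
    }
    return r;
}
```

In the non-APM run above, root_cause would come back as WC_RETRY_EXC_ERR with the remaining WRs counted as flushed. In the APM case no completions arrive at all, so there is nothing to drain.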
VBabu
Date: Mon, 16 Oct 2006 14:03:50 -0700
From: "Sean Hefty" <mshefty at ichips.intel.com>
Subject: Re: [openib-general] APM support in openib stack
To: somenath at veritas.com
Cc: openib-general at openib.org
Message-ID: <4533F3B6.1030509 at ichips.intel.com>
Content-Type: text/plain; charset=iso-8859-1; format=flowed
somenath wrote:
>>>> Doesn't ib_cm_init_qp_attr() set this for you?
>>
>>
>>
>> No, it doesn't. it returns me
>> attr_mask= 0x12d181
>> port=0x0 alt_port=0x0
>
>
Okay - there was a fix to the cm.c file (svn rev 8267) that added setting the
alternate port number when initializing the QP attributes. Apparently that fix
did not make it into the release that you're using.
- Sean
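
The attr_mask value quoted above can be decoded bit by bit. The sketch below is standalone C; the name table follows the bit order of enum ib_qp_attr_mask in ib_verbs.h as I recall it (IB_QP_STATE at bit 0 up to IB_QP_DEST_QPN at bit 20), so check it against your own tree. print_qp_attr_mask() is a hypothetical helper, not something in the stack.

```c
#include <assert.h>
#include <stdio.h>

/* Names indexed by bit position; assumed to match enum ib_qp_attr_mask. */
static const char *const qp_attr_names[] = {
    "IB_QP_STATE",               /* bit 0  */
    "IB_QP_CUR_STATE",           /* bit 1  */
    "IB_QP_EN_SQD_ASYNC_NOTIFY", /* bit 2  */
    "IB_QP_ACCESS_FLAGS",        /* bit 3  */
    "IB_QP_PKEY_INDEX",          /* bit 4  */
    "IB_QP_PORT",                /* bit 5  */
    "IB_QP_QKEY",                /* bit 6  */
    "IB_QP_AV",                  /* bit 7  */
    "IB_QP_PATH_MTU",            /* bit 8  */
    "IB_QP_TIMEOUT",             /* bit 9  */
    "IB_QP_RETRY_CNT",           /* bit 10 */
    "IB_QP_RNR_RETRY",           /* bit 11 */
    "IB_QP_RQ_PSN",              /* bit 12 */
    "IB_QP_MAX_QP_RD_ATOMIC",    /* bit 13 */
    "IB_QP_ALT_PATH",            /* bit 14 */
    "IB_QP_MIN_RNR_TIMER",       /* bit 15 */
    "IB_QP_SQ_PSN",              /* bit 16 */
    "IB_QP_MAX_DEST_RD_ATOMIC",  /* bit 17 */
    "IB_QP_PATH_MIG_STATE",      /* bit 18 */
    "IB_QP_CAP",                 /* bit 19 */
    "IB_QP_DEST_QPN",            /* bit 20 */
};

#define QP_ATTR_COUNT (sizeof qp_attr_names / sizeof qp_attr_names[0])

/* Print the name of every known bit set in mask, one per line.
 * Returns the number of names printed. */
static unsigned print_qp_attr_mask(unsigned long mask, FILE *out)
{
    unsigned n = 0;

    assert(out != NULL);
    for (unsigned bit = 0; bit < QP_ATTR_COUNT; bit++) {
        if (mask & (1UL << bit)) {
            fprintf(out, "%s\n", qp_attr_names[bit]);
            n++;
        }
    }
    return n;
}
```

Under that bit layout, 0x12d181 decodes to IB_QP_STATE, IB_QP_AV, IB_QP_PATH_MTU, IB_QP_RQ_PSN, IB_QP_ALT_PATH, IB_QP_MIN_RNR_TIMER, IB_QP_MAX_DEST_RD_ATOMIC and IB_QP_DEST_QPN. So IB_QP_ALT_PATH is set in the mask even though alt_port came back as 0, which fits the missing fix described above.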