[openib-general] APM support in openib stack
somenath
somenath at veritas.com
Mon Oct 23 15:42:26 PDT 2006
hi Venkatesh:
Two questions:
1. Does re-enabling migration (as defined in vol. 1 of the IB spec, section
17.2.8.1.4) work for you?
(I mean that after the first path failure, you do the LAP/APR packet transfer.)
2. What applications are you testing with?
thanks, som.
Venkatesh Babu wrote:
>
> I have added a couple of patches to the OFED stack, as described in
> bug#160, bug#172, and bug#159, and with these I successfully tested the
> APM functionality, except for one issue.
>
> Configuration:
> 2 Nodes
> CPU: AMD Opteron(tm) Processor 252 Dual processor
> CA type: MT25208
> Firmware version: 5.1.4
> OS: CentOS release 4.2
> IB: OFED 1.0
>
> 2 Flextronics 24-port switches
>
> Node1 Port1 connected to Switch1
> Node1 Port2 connected to Switch2
> Node2 Port1 connected to Switch1
> Node2 Port2 connected to Switch2
>
> Node1: Active side of the RC QP
> Node2: Passive side of the RC QP
>
> Test1:
> Failover simulation on Node1
> 1. Simulate the port1 failure; the RC QP migrates the path to port2
> 2. Simulate the port1 UP to rearm the alternate path from port1
> 3. Simulate the port2 failure; the RC QP migrates the path to port1
> 4. Simulate the port2 UP to rearm the alternate path from port2
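The failover/rearm cycle in Test1 can be pictured with a toy state model. This is only a sketch: `ToyApmQp`, `primary_path_failed`, `rearm` and `run_test1` are hypothetical names made up for illustration, not part of any verbs API. The state names mirror the kernel's `enum ib_mig_state`, and the model assumes the peer always cooperates when the path is rearmed.

```python
from enum import Enum


class MigState(Enum):
    # Names mirror the kernel's enum ib_mig_state values.
    MIGRATED = "IB_MIG_MIGRATED"
    REARM = "IB_MIG_REARM"
    ARMED = "IB_MIG_ARMED"


class ToyApmQp:
    """Hypothetical toy model of one side of an RC QP with APM."""

    def __init__(self, primary, alternate):
        self.primary = primary
        self.alternate = alternate
        self.state = MigState.ARMED
        self.connected = True

    def primary_path_failed(self):
        """Model loss of the current primary path; return True if a path remains."""
        if self.connected and self.state is MigState.ARMED:
            # Failover: the alternate path becomes the primary path.
            self.primary, self.alternate = self.alternate, None
            self.state = MigState.MIGRATED
        else:
            # No armed alternate path: the connection is expected to fail.
            self.connected = False
        return self.connected

    def rearm(self, port):
        """Model loading a new alternate path (for example after LAP/APR)."""
        self.alternate = port
        self.state = MigState.REARM
        # Simplification: assume the peer rearms too, so the QP ends up ARMED.
        self.state = MigState.ARMED


def run_test1():
    """Replay the four Test1 steps; return the primary port after each failover."""
    qp = ToyApmQp(primary="port1", alternate="port2")
    seen = []
    qp.primary_path_failed()   # step 1: port1 fails
    seen.append(qp.primary)
    qp.rearm("port1")          # step 2: port1 is back, rearm
    qp.primary_path_failed()   # step 3: port2 fails
    seen.append(qp.primary)
    qp.rearm("port2")          # step 4: port2 is back, rearm
    return seen
```

In this model a second failure without an intervening rearm leaves `connected` set to False, which corresponds to the case where the connection is expected to be torn down.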
>
> Test2:
> Real failover by manually pulling the cable
> 1. Simulate the failover/failback by pulling cable of Node1 port1
> 2. Simulate the failover/failback by pulling cable of Node1 port2
> 3. Simulate the failover/failback by pulling cable of Node2 port1
> 4. Simulate the failover/failback by pulling cable of Node2 port2
>
>
> ISSUE:
> If I pull both cables then there are no paths to the
> destination, so the RC QP connection is supposed to be torn down. But that
> is not happening.
>
> 1. Create an RC QP and load both the primary and alternate paths
> (I was setting rnr_retry_count = 6, retry_count = 6,
> packet_life_time field of struct ib_sa_path_rec to 15 and also tried
> with 12)
> 2. Send some traffic over RC QP
> 3. Disconnect the cable belonging to the primary path
> 4. It smoothly fails over to the alternate path, which becomes the primary
> path. There is no effect on the traffic on that RC QP.
> 5. Remove the second cable belonging to the new primary path.
> 6. Obviously traffic stops since there are no paths to the
> destination. But for the outstanding WRs in the RC QP I don't get any
> callback from the verbs layer describing whether it succeeded or
> failed due to some error like IB_WC_RETRY_EXC_ERR.
> When I query the RC QP properties it still shows that it is in
> IB_QPS_RTS state.
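For a sense of how long the retry-exceeded completion should take to arrive, here is a rough back-of-the-envelope sketch. IB timeout fields encode 4.096 us * 2^value. The rest is assumption: that the QP's local ACK timeout exponent ends up as packet_life_time + 1, and that each of the retry_count + 1 attempts waits exactly one timeout. Real stacks may add the CA ack delay, and hardware may wait longer per attempt. The function names are made up for illustration.

```python
IB_TIME_UNIT_US = 4.096  # base unit of IB timeout/lifetime encodings, in microseconds


def ib_time_us(exponent):
    """Decode an IB timeout/lifetime exponent: 4.096 us * 2**exponent."""
    return IB_TIME_UNIT_US * (2 ** exponent)


def rough_detection_time_s(packet_life_time, retry_count):
    """Rough time (seconds) until retries are exhausted, under the assumptions below."""
    # Assumption: local ACK timeout exponent = packet_life_time + 1,
    # i.e. a round trip is twice the one-way packet lifetime.
    ack_timeout_us = ib_time_us(packet_life_time + 1)
    # Assumption: one initial attempt plus retry_count retries,
    # each waiting exactly one ACK timeout.
    return ack_timeout_us * (retry_count + 1) / 1e6


for plt in (15, 12):
    print(plt, round(rough_detection_time_s(plt, 6), 3))
```

Under these assumptions the error would be expected after roughly 1.9 s for packet_life_time = 15 and roughly 0.23 s for 12, so a QP that stays in IB_QPS_RTS indefinitely is not explained by the timeout settings alone.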
>
>
> Without APM functionality it behaves correctly -
> 1. Create an RC QP and load only the primary path
> (I was setting rnr_retry_count = 6, retry_count = 6,
> packet_life_time field of struct ib_sa_path_rec to 15 and also tried
> with 12)
> 2. Send some traffic over RC QP
> 3. Disconnect the cable belonging to the primary path
> 4. Obviously traffic stops since there are no paths to the
> destination. For the outstanding WRs in the RC QP I do get a callback
> from the verbs layer reporting that the first WR failed with error
> IB_WC_RETRY_EXC_ERR, and for all other WRs I get IB_WC_WR_FLUSH_ERR.
> I then close this RC QP.
>
> VBabu
>
> Date: Mon, 16 Oct 2006 14:03:50 -0700
> From: "Sean Hefty" <mshefty at ichips.intel.com>
> Subject: Re: [openib-general] APM support in openib stack
> To: somenath at veritas.com
> Cc: openib-general at openib.org
> Message-ID: <4533F3B6.1030509 at ichips.intel.com>
> Content-Type: text/plain; charset=iso-8859-1; format=flowed
>
> somenath wrote:
>
>>>>> Doesn't ib_cm_init_qp_attr() set this for you?
>>>>
>>>
>>>
>>> No, it doesn't. It returns
>>> attr_mask= 0x12d181
>>> port=0x0 alt_port=0x0
>>
>>
>>
>
> Okay - there was a fix to the cm.c file (svn rev 8267) that added
> setting the alternate port number when initializing the QP
> attributes. Apparently that fix did not make it into the release that
> you're using.
>
> - Sean
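The attr_mask value quoted above can be decoded bit by bit. The sketch below assumes the bit layout of the kernel's `enum ib_qp_attr_mask` (IB_QP_STATE at bit 0 through IB_QP_DEST_QPN at bit 20); please check it against the headers of the release in use. `decode_attr_mask` is a hypothetical helper, not an existing function.

```python
# Assumed bit order of enum ib_qp_attr_mask; list index == bit position.
QP_ATTR_BITS = [
    "IB_QP_STATE",
    "IB_QP_CUR_STATE",
    "IB_QP_EN_SQD_ASYNC_NOTIFY",
    "IB_QP_ACCESS_FLAGS",
    "IB_QP_PKEY_INDEX",
    "IB_QP_PORT",
    "IB_QP_QKEY",
    "IB_QP_AV",
    "IB_QP_PATH_MTU",
    "IB_QP_TIMEOUT",
    "IB_QP_RETRY_CNT",
    "IB_QP_RNR_RETRY",
    "IB_QP_RQ_PSN",
    "IB_QP_MAX_QP_RD_ATOMIC",
    "IB_QP_ALT_PATH",
    "IB_QP_MIN_RNR_TIMER",
    "IB_QP_SQ_PSN",
    "IB_QP_MAX_DEST_RD_ATOMIC",
    "IB_QP_PATH_MIG_STATE",
    "IB_QP_CAP",
    "IB_QP_DEST_QPN",
]


def decode_attr_mask(mask):
    """Return the names of the attribute flags set in `mask`, lowest bit first."""
    return [name for bit, name in enumerate(QP_ATTR_BITS) if mask & (1 << bit)]


for name in decode_attr_mask(0x12D181):
    print(name)
```

With this layout, 0x12d181 includes IB_QP_ALT_PATH, so the mask asks for the alternate path to be applied even though alt_port came back as 0. That is consistent with the alternate port number not being filled in by that release.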
>
>
>
>
>