[openib-general] APM support in openib stack
somenath
somenath at veritas.com
Mon Oct 23 15:59:58 PDT 2006
Venkatesh Babu wrote:
> 1. Yes, I can rearm the alternate path by sending LAP and APR messages.
Does the QP go to the REARM state just by sending LAP and APR messages?
I mean, you don't have to change the QP state to REARM explicitly?
>
> 2. I was sending some network traffic (netperf) while doing these
> failovers.
>
So I assume it is SDP's APM feature that gets tested? Is that true?
thanks, som.
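
For reference on question 1, a minimal sketch of the path migration
states involved (MIGRATED, REARM, ARMED). It assumes the usual model in
which software requests REARM and the channel adapter performs the
other two transitions by itself. The event names and mig_next() are
hypothetical, for illustration only, and a software-forced migration
from ARMED is left out.

```c
/* Path migration states, named after the IB_MIG_* values in the verbs API. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

/* Hypothetical events, for illustration only. */
enum mig_event {
    EV_SW_REARM,     /* consumer modifies the QP to REARM (alternate path loaded) */
    EV_PEER_REARMED, /* assumption: both ends are in REARM, so the CA arms the path */
    EV_PATH_FAILED   /* the current path fails */
};

/* Return the state that follows s when event e occurs. */
enum mig_state mig_next(enum mig_state s, enum mig_event e)
{
    switch (s) {
    case MIG_MIGRATED:
        /* No armed alternate path here, so a path failure cannot migrate. */
        return e == EV_SW_REARM ? MIG_REARM : s;
    case MIG_REARM:
        return e == EV_PEER_REARMED ? MIG_ARMED : s;
    case MIG_ARMED:
        /* Automatic migration to the alternate path. */
        return e == EV_PATH_FAILED ? MIG_MIGRATED : s;
    }
    return s;
}
```

In this model a second failover is only possible after the QP has gone
through REARM and ARMED again, which is why the question of who moves
the QP to REARM matters.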
> VBabu
>
> somenath wrote:
>
>> hi Venkatesh:
>>
>> Two questions:
>>
>> 1. Does re-enabling migration (as defined in vol. 1 of the IB spec,
>> section 17.2.8.1.4) work for you?
>> (I mean, after the first path failure, you do the LAP/APR exchange)
>>
>> 2. What applications are you testing with?
>>
>> thanks, som.
>>
>> Venkatesh Babu wrote:
>>
>>>
>>> I have added a couple of patches to the OFED stack, as described in
>>> bug#160, bug#172, and bug#159, and with these I successfully tested
>>> the APM functionality, except for one issue.
>>>
>>> Configuration:
>>> 2 Nodes
>>> CPU: AMD Opteron(tm) Processor 252 Dual processor
>>> CA type: MT25208
>>> Firmware version: 5.1.4
>>> OS: CentOS release 4.2
>>> IB: OFED 1.0
>>>
>>> 2 Flextronics 24-port switches
>>>
>>> Node1 Port1 connected to Switch1
>>> Node1 Port2 connected to Switch2
>>> Node2 Port1 connected to Switch1
>>> Node2 Port2 connected to Switch2
>>>
>>> Node1 : Active side of the RC QP
>>> Node2 : Passive side of the RC QP
>>>
>>> Test1:
>>> Failover simulation on Node1
>>> 1. Simulate the port1 failure, RC QP migrates the path to port2
>>> 2. Simulate the port1 UP to rearm the alternate path from port1
>>> 3. Simulate the port2 failure, RC QP migrates the path to port1
>>> 4. Simulate the port2 UP to rearm the alternate path from port2
>>>
>>> Test2:
>>> Real failover by manually pulling the cable
>>> 1. Simulate the failover/failback by pulling cable of Node1 port1
>>> 2. Simulate the failover/failback by pulling cable of Node1 port2
>>> 3. Simulate the failover/failback by pulling cable of Node2 port1
>>> 4. Simulate the failover/failback by pulling cable of Node2 port2
>>>
>>>
>>> ISSUE:
>>> If I pull both cables, there is no path left to the destination,
>>> so the RC QP connection is supposed to be torn down. But that
>>> does not happen.
>>>
>>> 1. Create an RC QP and load both the primary and alternate paths
>>> (I was setting rnr_retry_count = 6, retry_count = 6,
>>> packet_life_time field of struct ib_sa_path_rec to 15 and also tried
>>> with 12)
>>> 2. Send some traffic over RC QP
>>> 3. Disconnect the cable belonging to the primary path
>>> 4. It smoothly fails over to the alternate path, which becomes the
>>> primary path. There is no effect on the traffic on that RC QP.
>>> 5. Remove the second cable belonging to the new primary path.
>>> 6. Obviously traffic stops since there are no paths to the
>>> destination. But for the outstanding WRs in the RC QP I don't get
>>> any callback from the verbs layer describing whether it succeeded or
>>> failed due to some error like IB_WC_RETRY_EXC_ERR.
>>> When I query the RC QP properties it still shows that it is in
>>> IB_QPS_RTS state.
>>>
>>>
>>> Without APM functionality it behaves correctly -
>>> 1. Create an RC QP and load only the primary path
>>> (I was setting rnr_retry_count = 6, retry_count = 6,
>>> packet_life_time field of struct ib_sa_path_rec to 15 and also tried
>>> with 12)
>>> 2. Send some traffic over RC QP
>>> 3. Disconnect the cable belonging to the primary path
>>> 4. Obviously traffic stops since there are no paths to the
>>> destination. For the outstanding WRs in the RC QP I do get a
>>> callback from the verbs layer reporting that the first WR failed
>>> with error IB_WC_RETRY_EXC_ERR, and for all other WRs I get
>>> IB_WC_WR_FLUSH_ERR.
>>> I then close this RC QP.
>>>
>>> VBabu
>>>
>>> Date: Mon, 16 Oct 2006 14:03:50 -0700
>>> From: "Sean Hefty" <mshefty at ichips.intel.com>
>>> Subject: Re: [openib-general] APM support in openib stack
>>> To: somenath at veritas.com
>>> Cc: openib-general at openib.org
>>> Message-ID: <4533F3B6.1030509 at ichips.intel.com>
>>> Content-Type: text/plain; charset=iso-8859-1; format=flowed
>>>
>>> somenath wrote:
>>>
>>>>>>> Doesn't ib_cm_init_qp_attr() set this for you?
>>>>>
>>>>> No, it doesn't. It returns:
>>>>> attr_mask= 0x12d181
>>>>> port=0x0 alt_port=0x0
>>>>
>>>
>>> Okay - there was a fix to the cm.c file (svn rev 8267) that added
>>> setting the alternate port number when initializing the QP
>>> attributes. Apparently that fix did not make it into the release
>>> that you're using.
>>>
>>> - Sean
>>>
>>
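
To make the attr_mask value quoted above easier to read, here is a
small sketch that decodes it into flag names. The bit positions are an
assumption: they are copied from enum ib_qp_attr_mask in ib_verbs.h as
I know it, so please check them against your own tree. The helper
decode_qp_attr_mask() is hypothetical and not part of the stack.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed to match enum ib_qp_attr_mask in ib_verbs.h; verify locally. */
struct mask_bit { unsigned int bit; const char *name; };

static const struct mask_bit qp_attr_bits[] = {
    { 1u << 0,  "IB_QP_STATE" },         { 1u << 1,  "IB_QP_CUR_STATE" },
    { 1u << 2,  "IB_QP_EN_SQD_ASYNC_NOTIFY" },
    { 1u << 3,  "IB_QP_ACCESS_FLAGS" },  { 1u << 4,  "IB_QP_PKEY_INDEX" },
    { 1u << 5,  "IB_QP_PORT" },          { 1u << 6,  "IB_QP_QKEY" },
    { 1u << 7,  "IB_QP_AV" },            { 1u << 8,  "IB_QP_PATH_MTU" },
    { 1u << 9,  "IB_QP_TIMEOUT" },       { 1u << 10, "IB_QP_RETRY_CNT" },
    { 1u << 11, "IB_QP_RNR_RETRY" },     { 1u << 12, "IB_QP_RQ_PSN" },
    { 1u << 13, "IB_QP_MAX_QP_RD_ATOMIC" },
    { 1u << 14, "IB_QP_ALT_PATH" },      { 1u << 15, "IB_QP_MIN_RNR_TIMER" },
    { 1u << 16, "IB_QP_SQ_PSN" },        { 1u << 17, "IB_QP_MAX_DEST_RD_ATOMIC" },
    { 1u << 18, "IB_QP_PATH_MIG_STATE" },{ 1u << 19, "IB_QP_CAP" },
    { 1u << 20, "IB_QP_DEST_QPN" },
};

/*
 * Write the names of the known bits set in mask into buf, separated
 * by '|', lowest bit first. Returns the number of names written.
 * Stops early if buf is too small.
 */
int decode_qp_attr_mask(unsigned int mask, char *buf, size_t len)
{
    size_t used = 0, i;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(qp_attr_bits) / sizeof(qp_attr_bits[0]); i++) {
        int n;

        if (!(mask & qp_attr_bits[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? "|" : "", qp_attr_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* buffer full */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Under these assumed bit values, 0x12d181 has IB_QP_ALT_PATH set but not
IB_QP_PORT, so the mask asks for the alternate path to be loaded even
though alt_port comes back as 0x0. That is consistent with Sean's note
about the missing alternate port number fix.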