[openib-general] APM: QP migration state change when failover triggered by hw

Venkatesh Babu venkatesh.babu at 3leafnetworks.com
Tue Aug 1 12:09:22 PDT 2006


 I am testing APM with a kernel module which directly interfaces with 
ib_verbs.ko and ib_cm.ko.
Yes, I do receive the IB_MIG_MIGRATED event, but the QP's mig_state is not 
actually changed to MIGRATED. So I had to do this from my module.
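
To make the workaround concrete, here is a minimal user-space sketch of the
logic only. It is a model, not the module code: every sketch_* name is a
hypothetical stand-in, and sketch_set_mig_state() merely updates a field. In
a real kernel module the state change would presumably go through
ib_modify_qp() with the IB_QP_PATH_MIG_STATE attribute mask.

```c
/* Hypothetical stand-ins for the three APM path-migration states. */
enum sketch_mig_state {
    SKETCH_MIG_MIGRATED,
    SKETCH_MIG_REARM,
    SKETCH_MIG_ARMED
};

/* Hypothetical stand-in for a QP; only the field of interest is modeled. */
struct sketch_qp {
    enum sketch_mig_state mig_state;
};

/*
 * Stand-in for a "modify QP" call that changes only the path-migration
 * state. Assumption: a real module would issue a verbs call here; this
 * model just updates the field.
 */
static void sketch_set_mig_state(struct sketch_qp *qp,
                                 enum sketch_mig_state next)
{
    qp->mig_state = next;
}

/*
 * Workaround logic: when the "migrated" event arrives, check whether the
 * QP state already reflects it and force it if not.
 * Returns 1 if the state had to be corrected, 0 if it was already MIGRATED.
 */
static int sketch_on_migrated_event(struct sketch_qp *qp)
{
    if (qp->mig_state == SKETCH_MIG_MIGRATED)
        return 0;
    sketch_set_mig_state(qp, SKETCH_MIG_MIGRATED);
    return 1;
}
```

Calling sketch_on_migrated_event() on a QP still marked ARMED corrects the
state and returns 1; calling it again returns 0, so the handler is safe to
run even if the state was already updated elsewhere.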

It could be a bug in the ib_cm code, which may not be transitioning the QP 
state correctly, while the HW may think that it has already migrated. I am 
not sure how exactly ib_cm should notice this event and transition the QP 
state. Any thoughts and suggestions are welcome. I can code it and test it.

I don't have a test program that specifically tests this 
functionality, and I am afraid I cannot share the whole module.

 VBabu

Jack Morgenstein wrote:

>On Tuesday 01 August 2006 05:19, Venkatesh Babu wrote:
>  
>
>>Configuration2: Node1 and Node 2 connected through two switches for
>>each port.
>> Node1, port1 -> switch1 -> Node2, port1
>> Node1, port2 -> switch2 -> Node2, port2
>>
>>Node 1:
>>1. Call ib_cm_listen() to wait for connection requests
>>2. When a REQ message arrives create a RC QP and establish a connection
>>3. Setup callback handlers to receive packets.
>>4. Receive packets, verify them, and drop them.
>>5. Event IB_MIG_MIGRATED received
>>6. Stopped receiving packets.
>>
>>Node 2:
>>1. Create RC QP
>>2. Send REQ message to Node 1 to establish the connection (Load both
>>primary and alternate paths)
>>3. Continuously send some packets
>>4. Simulate the port failure by unplugging the IB cable
>>5. Event IB_MIG_MIGRATED received
>>
>> But with
>>Configuration2, IB_EVENT_PORT_ERR event occurs on a node1, failover to
>>the alternate path doesn't work. The traffic stops, because node1
>>doesn't know when the IB_EVENT_PORT_ERR event occurred on Node2.
>>    
>>
>
>We have not seen these problems here.  We have regression tests which check 
>APM, and they have run without problems.  These tests have scripts which 
>bring the HCA port down (equivalent to pulling the cable) to check that the 
>migration occurs automatically.
>(You should NOT need to do ib_modify_qp for the migration to work in the case 
>of a port error).
>
>Note, though, that these tests use the ibv_verbs layer directly.  We have not 
>checked out APM over the CM.  There may be a bug here regarding setting up 
>the alternate path properly when creating the connection (although this does 
>seem strange, since you indicate that the MIGRATED event is received on both 
>sides!).
>
>Please send us your test code so that we may reproduce the problem here.
>
>- Jack