[openib-general] APM: QP migration state change when failover triggered by hw
Venkatesh Babu
venkatesh.babu at 3leafnetworks.com
Tue Aug 1 12:09:22 PDT 2006
I am testing APM with a kernel module that interfaces directly with
ib_verbs.ko and ib_cm.ko.
Yes, I do receive the IB_MIG_MIGRATED event, but the QP's mig_state is not
actually changed to MIGRATED, so I had to set it from my module.
It could be a bug in the ib_cm code, which may not be transitioning the QP
state correctly, even though the HW may consider the QP migrated. I am
not sure how exactly ib_cm should notice this event and transition the
QP state. Any thoughts and suggestions are welcome. I can
code it and test it.
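
For reference, the transition sequence usually described for APM can be
sketched as a small state machine. This is only a toy userspace model, not
verbs or CM code: every type and function name in it is made up for
illustration, and the state names merely mirror IB_MIG_MIGRATED,
IB_MIG_REARM and IB_MIG_ARMED. It assumes software requests REARM, the HW
moves the QP to ARMED, and a path failure moves it back to MIGRATED on the
alternate path.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for IB_MIG_MIGRATED, IB_MIG_REARM, IB_MIG_ARMED. */
enum model_mig_state { MODEL_MIGRATED, MODEL_REARM, MODEL_ARMED };

/* Hypothetical toy QP: only the fields needed to follow path migration. */
struct model_qp {
    enum model_mig_state state;
    int active_path;   /* 0 = primary, 1 = alternate */
    int alt_loaded;    /* nonzero once an alternate path has been loaded */
};

/* Assumption: a freshly connected QP starts in MIGRATED on the primary path. */
void model_qp_init(struct model_qp *qp)
{
    assert(qp != NULL);
    qp->state = MODEL_MIGRATED;
    qp->active_path = 0;
    qp->alt_loaded = 0;
}

/* Software step: load an alternate path and request rearming
 * (MIGRATED -> REARM). Returns 0 on success, -1 if not in MIGRATED. */
int model_sw_rearm(struct model_qp *qp)
{
    assert(qp != NULL);
    if (qp->state != MODEL_MIGRATED)
        return -1;
    qp->alt_loaded = 1;
    qp->state = MODEL_REARM;
    return 0;
}

/* Hardware step: both ends are ready (REARM -> ARMED).
 * Returns 0 on success, -1 if the QP was not in REARM. */
int model_hw_arm(struct model_qp *qp)
{
    assert(qp != NULL);
    if (qp->state != MODEL_REARM)
        return -1;
    qp->state = MODEL_ARMED;
    return 0;
}

/* Hardware step: the active path failed. Returns 1 if the QP failed over
 * (ARMED -> MIGRATED, paths swapped, a "migrated" event would be raised),
 * or 0 if no failover was possible and traffic simply stops. */
int model_hw_path_failure(struct model_qp *qp)
{
    assert(qp != NULL);
    if (qp->state != MODEL_ARMED || !qp->alt_loaded)
        return 0;
    qp->active_path = !qp->active_path;
    qp->alt_loaded = 0;          /* a new alternate must be loaded to rearm */
    qp->state = MODEL_MIGRATED;
    return 1;
}
```

In this model a QP that reports a migration event is always back in
MIGRATED afterwards, which is the state I expected to see when querying the
real QP after the event.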
I don't have a test program that specifically tests this
functionality, and I am afraid I may not be able to share the whole module.
VBabu
Jack Morgenstein wrote:
>On Tuesday 01 August 2006 05:19, Venkatesh Babu wrote:
>
>
>>Configuration2: Node1 and Node2 connected through two switches, one for
>>each port.
>> Node1, port1 -> switch1 -> Node2, port1
>> Node1, port2 -> switch2 -> Node2, port2
>>
>>Node 1:
>>1. Call ib_cm_listen() to wait for connection requests
>>2. When a REQ message arrives, create an RC QP and establish a connection
>>3. Set up callback handlers to receive packets.
>>4. Receive packets, verify them, and drop them.
>>5. Event IB_MIG_MIGRATED received
>>6. Stopped receiving packets.
>>
>>Node 2:
>>1. Create an RC QP
>>2. Send a REQ message to Node 1 to establish the connection (load both
>>primary and alternate paths)
>>3. Continuously send some packets
>>4. Simulate the port failure by unplugging the IB cable
>>5. Event IB_MIG_MIGRATED received
>>
>> But with
>>Configuration2, when the IB_EVENT_PORT_ERR event occurs on node1, failover to
>>the alternate path doesn't work. The traffic stops because node1
>>doesn't know when the IB_EVENT_PORT_ERR event occurred on Node2.
>>
>>
>
>We have not seen these problems here. We have regression tests which check
>APM, and they have run without problems. These tests have scripts which
>bring the HCA port down (equivalent to pulling the cable) to check that the
>migration occurs automatically.
>(You should NOT need to do ib_modify_qp for the migration to work in the case
>of a port error).
>
>Note, though, that these tests use the ibv_verbs layer directly. We have not
>checked out APM over the CM. There may be a bug here regarding setting up
>the alternate path properly when creating the connection (although this does
>seem strange, since you indicate that the MIGRATED event is received on both
>sides!).
>
>Please send us your test code so that we may reproduce the problem here.
>
>- Jack