[openib-general] APM: QP migration state change when failover triggered by hw
Jack Morgenstein
jackm at mellanox.co.il
Tue Aug 1 06:10:02 PDT 2006
On Tuesday 01 August 2006 05:19, Venkatesh Babu wrote:
> Configuration2: Node1 and Node 2 connected through two switches, one
> per port.
> Node1, port1 -> switch1 -> Node2, port1
> Node1, port2 -> switch2 -> Node2, port2
>
> Node 1:
> 1. Call ib_cm_listen() to wait for connection requests
> 2. When a REQ message arrives, create an RC QP and establish a connection
> 3. Set up callback handlers to receive packets.
> 4. Receive packets, verify them, and drop them.
> 5. Event IB_MIG_MIGRATED received
> 6. Stopped receiving packets.
>
> Node 2:
> 1. Create RC QP
> 2. Send REQ message to Node 1 to establish the connection (Load both
> primary and alternate paths)
> 3. Continuously send packets
> 4. Simulate the port failure by unplugging the IB cable
> 5. Event IB_MIG_MIGRATED received
>
> But with Configuration2, when the IB_EVENT_PORT_ERR event occurs on
> Node1, failover to the alternate path doesn't work and the traffic
> stops, because Node1 doesn't know when the IB_EVENT_PORT_ERR event
> occurred on Node2.
We have not seen these problems here. We have regression tests which check
APM, and they have run without problems. These tests have scripts which
bring the HCA port down (equivalent to pulling the cable) to check that the
migration occurs automatically.
(You should NOT need to call ib_modify_qp for the migration to work in the case
of a port error.)
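As a rough illustration of that point, here is a minimal C sketch of the decision an application's asynchronous event handler makes for a QP. It is a model, not verbs code: the enum and function names are hypothetical, loosely mirroring the kernel's IB_EVENT_PORT_ERR and IB_EVENT_PATH_MIG events. The assumption it encodes is that a port error needs no QP modification, while a path-migrated event is the moment to load a fresh alternate path and rearm.

```c
#include <assert.h>

/* Hypothetical event codes, loosely modeled on the kernel's
 * IB_EVENT_PORT_ERR and IB_EVENT_PATH_MIG asynchronous events. */
enum apm_event {
    EV_PORT_ERR,  /* a local HCA port went down */
    EV_PATH_MIG   /* hardware has already moved the QP to its alternate path */
};

/* What the application should do to the affected QP in response. */
enum apm_action {
    ACT_NONE,           /* leave the QP alone */
    ACT_LOAD_ALT_REARM  /* load a fresh alternate path and request REARM */
};

static enum apm_action apm_handle_event(enum apm_event ev)
{
    switch (ev) {
    case EV_PORT_ERR:
        /* Failover of an armed QP happens in hardware, so no
         * modify-QP call is made here. */
        return ACT_NONE;
    case EV_PATH_MIG:
        /* The alternate path is now in use and nothing is armed behind it;
         * software must load a new alternate path before the next failover. */
        return ACT_LOAD_ALT_REARM;
    }
    assert(!"unknown APM event");
    return ACT_NONE;
}
```

In a real handler, the rearm step would be a modify-QP call carrying the alternate-path attributes and the REARM migration state; it is left out here because it needs real hardware.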
Note, though, that these tests use the ibv_verbs layer directly. We have not
tested APM over the CM. There may be a bug in how the alternate path is set up
when the connection is created (although this seems strange, since you
indicate that the MIGRATED event is received on both sides!).
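One way to reason about where a setup problem could bite is the path-migration state machine: failover happens only from the armed state, and after a migration the QP has no armed alternate until software loads a new one. The C sketch below models this. The state names mirror the kernel's enum ib_mig_state (IB_MIG_MIGRATED, IB_MIG_REARM, IB_MIG_ARMED), but the struct, the functions, and the use of port numbers to stand for paths are hypothetical simplifications. In particular, the channel adapter performs the REARM to ARMED transition itself; here it is reduced to a single call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names mirror the kernel's enum ib_mig_state. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

struct qp_model {
    enum mig_state state;
    int primary_port;  /* port the QP currently uses */
    int alt_port;      /* port of the loaded alternate path, 0 = none */
};

/* Connection setup with only a primary path: nothing is armed yet. */
static void qp_init(struct qp_model *qp, int primary_port)
{
    assert(qp != NULL);
    qp->state = MIG_MIGRATED;
    qp->primary_port = primary_port;
    qp->alt_port = 0;
}

/* Software step: load an alternate path and request REARM
 * (a modify-QP call with alternate-path attributes in real code). */
static void qp_load_alt(struct qp_model *qp, int alt_port)
{
    assert(qp != NULL && alt_port != 0);
    qp->alt_port = alt_port;
    qp->state = MIG_REARM;
}

/* Hardware step, simplified: REARM becomes ARMED. */
static void qp_hw_arm(struct qp_model *qp)
{
    if (qp->state == MIG_REARM)
        qp->state = MIG_ARMED;
}

/* Hardware step: the primary path fails. Returns true if the QP migrated,
 * which is where a path-migrated async event would be reported. */
static bool qp_hw_path_failure(struct qp_model *qp)
{
    if (qp->state != MIG_ARMED)
        return false;            /* no armed alternate: traffic stops */
    qp->primary_port = qp->alt_port;
    qp->alt_port = 0;            /* alternate is consumed by the migration */
    qp->state = MIG_MIGRATED;
    return true;
}
```

Exercising the model: a failure before the alternate path is armed leaves the QP on its primary port; once armed, a failure moves it to the alternate port and back to MIGRATED, so a second failure cannot fail over again until a new alternate path is loaded.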
Please send us your test code so that we may reproduce the problem here.
- Jack