[openib-general] APM: QP migration state change when failover triggered by hw
Venkatesh Babu
venkatesh.babu at 3leafnetworks.com
Mon Jul 31 19:19:42 PDT 2006
I was testing the APM (Automatic Path Migration) functionality. I found
that OFED 1.0 doesn't support it yet, and it is not planned for
OFED 1.1 either. It would be interesting to know when this feature is
going to be added. In OFED 1.0 there are some bugs and missing
components that keep this feature from working, so I opened #160, #172
and #159 to track them.
To answer your question -
Configuration1: Node1 and Node2 connected directly with two IB cables,
without a switch
Configuration2: Node1 and Node2 connected through two switches, one for
each port.
Node1, port1 -> switch1 -> Node2, port1
Node1, port2 -> switch2 -> Node2, port2
Node 1:
1. Call ib_cm_listen() to wait for connection requests
2. When a REQ message arrives create a RC QP and establish a connection
3. Set up callback handlers to receive packets.
4. Receive packets, verify them and drop them.
5. Event IB_MIG_MIGRATED received
6. Stopped receiving packets.
Node 2:
1. Create RC QP
2. Send REQ message to Node 1 to establish the connection (Load both
primary and alternate paths)
3. Continuously send some packets
4. Simulate the port failure by unplugging the IB cable
5. Event IB_MIG_MIGRATED received
Actually I fixed this problem in Configuration1 by calling
ib_modify_qp() to change the mig_state from IB_MIG_ARMED to
IB_MIG_MIGRATED when the IB_EVENT_PORT_ERR event occurs. But with
Configuration2, the IB_EVENT_PORT_ERR event occurs on only one node, so
failover to the alternate path doesn't work and the traffic stops,
because Node1 doesn't know when the IB_EVENT_PORT_ERR event occurred on
Node2. This requires an interface similar to the Gen1 interface
tsIbSetOutofServiceNoticeHandler().
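
As a rough illustration of the behaviour described above, here is a
small user-space model in C. It is only a sketch, not driver code: every
sketch_* name is hypothetical, the enums are stand-ins for the real
verbs definitions, and the place where the workaround would call
ib_modify_qp() is reduced to a plain state assignment. It assumes that
unplugging a direct cable raises a port error on both nodes, that with
switches only the unplugged node sees one, and (following the
observation in this post) that traffic resumes only once both QPs have
been moved to the migrated state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the verbs migration states; not the real definitions. */
enum sketch_mig_state {
    SKETCH_MIG_MIGRATED,
    SKETCH_MIG_REARM,
    SKETCH_MIG_ARMED
};

/* Configuration1 is direct cabling, Configuration2 goes through switches. */
enum sketch_topology { SKETCH_DIRECT, SKETCH_SWITCHED };

struct sketch_node {
    enum sketch_mig_state mig_state; /* state of this node's RC QP */
    bool saw_port_err;               /* did a local port error fire? */
};

/*
 * Workaround from the post: on a local port error, force the QP from
 * ARMED to MIGRATED. In the real code path this is where ib_modify_qp()
 * would be called; here it is only a state change in the model.
 */
static void sketch_on_port_err(struct sketch_node *node)
{
    assert(node != NULL);
    node->saw_port_err = true;
    if (node->mig_state == SKETCH_MIG_ARMED)
        node->mig_state = SKETCH_MIG_MIGRATED;
}

/* Unplug the primary-path cable at `unplugged`; `peer` is the other end. */
static void sketch_unplug_primary(enum sketch_topology topo,
                                  struct sketch_node *unplugged,
                                  struct sketch_node *peer)
{
    sketch_on_port_err(unplugged);
    /*
     * Assumption: a direct cable takes the link down on both ports,
     * while behind a switch the peer's own link stays up, so the peer
     * never receives a port error.
     */
    if (topo == SKETCH_DIRECT)
        sketch_on_port_err(peer);
}

/* Model assumption: traffic resumes only if both ends have migrated. */
static bool sketch_traffic_resumes(const struct sketch_node *a,
                                   const struct sketch_node *b)
{
    return a->mig_state == SKETCH_MIG_MIGRATED &&
           b->mig_state == SKETCH_MIG_MIGRATED;
}
```

With both nodes starting in the armed state, calling
sketch_unplug_primary() with SKETCH_DIRECT leaves both nodes migrated,
while SKETCH_SWITCHED leaves the peer armed. That is the gap an
out-of-service notice to the remote node would have to close.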
VBabu
Jack Morgenstein wrote:
>On Wednesday 12 July 2006 23:20, Venkatesh Babu wrote:
>
>
>> With OFED 1.0, when cable is removed from the port corresponding to the
>>primary path, CI sends an event IB_EVENT_PATH_MIG, but is not changing
>>the state to "Migrated" and not migrating to the alternate path. So the
>>traffic doesn't resume on the alternate path.
>>
>>
>>
>
>Could you please describe your flow in more detail (including setup phase),
>and, if possible, send us a small test program which illustrates your
>problem?
>
>Thanks!
>- Jack
>
>