[openib-general] [RFC] [PATCH] rdma/ib_cm: fix APM support
Sean Hefty
mshefty at ichips.intel.com
Mon Nov 6 16:45:16 PST 2006
Venkatesh Babu wrote:
> Let me make the steps clear -
> 1. On Passive node register for remote port UP/DOWN event by
> registering with ib_sa_serv_notice_hdlr()
> 2. On Passive node start the listener by calling ib_cm_listen().
> 3. On Active node create the RC QP and establish the connection by
> calling ib_send_cm_req(). In struct ib_cm_req_param specify both primary
> path (say, through Port1) and alternate path (say, through Port2).
> NOTE:-Assume Port1 of Active node is connected to Port1 of Passive node;
> and Port2 of Active node is connected to Port2 of Passive node.
> NOTE:- After this step QP's path_mig_state will be IB_MIG_ARMED.
> 4. Let us say, Port1 on Active node fails
> 5. IB_EVENT_PORT_ERR event is generated on Active node; and remote
> port error event is generated on Passive node.
> 6. In those event handlers, call ib_modify_qp() to set the
> path_mig_state to IB_MIG_MIGRATED. This tells the HCA's firmware
> to switch to the alternate path.
> 7. After a while, Port1 comes back up.
> 8. IB_EVENT_PORT_ACTIVE event is generated on Active node; and remote
> port active event is generated on Passive node.
> 9. On the Active node, from the IB_EVENT_PORT_ACTIVE event handler, call
> ib_send_cm_lap() to send the new alternate path (through Port1) to the
> Passive node.
> 9.1 Passive node receives the LAP message
> 9.2 Calls ib_cm_init_rearm_attr() to initialize the alternate path info
> 9.3 Calls ib_modify_qp() to update path_mig_state to IB_MIG_REARM
> 9.4 Sends an APR message back to the Active node.
> 10. Active node receives the APR message
> 11. Calls ib_cm_init_rearm_attr() to initialize the alternate path info
> 12. Calls ib_modify_qp() to update path_mig_state to IB_MIG_REARM
> 13. When the first packet is passed between the Active and Passive
> nodes, the ib_core changes the path_mig_state to IB_MIG_ARMED.
> 14. Now it is all set for another failover.
Using my cm patches, I have a test program that does the following:
1. Establish a connection between two nodes, including an alternate path.
2. Break the primary path (path 1).
   This generates IB_MIG_MIGRATED events on both nodes. Failover works.
3. Fix path 1.
   (Causes port active event on the client.)
4. Client sends a LAP message with path 1 to the server.
   ib_send_cm_lap.
5. Server loads the new alternate path.
   ib_cm_init_qp_attr and ib_modify_qp.
6. Server responds with an APR message.
   ib_send_cm_apr.
7. Client loads a new alternate path.
   ib_cm_init_qp_attr and ib_modify_qp.
8. Disconnect path 2 (original alternate).
9. Server sees IB_MIG_MIGRATED event. Client does not.
I'm still debugging why the client does not get the second
IB_MIG_MIGRATED event.
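
For reference, the sequence in the test program can be written down as a toy
user-space model of the path migration states on the two nodes. This is only
a sketch, not kernel code and not the test program itself: the toy_* names
and the MIG_* enum are made up for illustration and merely mirror
IB_MIG_MIGRATED, IB_MIG_REARM and IB_MIG_ARMED. The arming rule used here
(both sides in REARM, then traffic) is a simplifying assumption.

```c
#include <assert.h>

/* Hypothetical stand-in for the verbs path migration states. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

struct toy_qp {
    enum mig_state state;
    int cur_path;    /* path currently carrying traffic */
    int alt_path;    /* loaded alternate path, 0 if none */
    int mig_events;  /* migration events reported so far */
};

/* Models loading a new alternate path and requesting rearm. */
static void toy_load_alt_path(struct toy_qp *qp, int path)
{
    assert(path != qp->cur_path);  /* alternate must differ from current */
    qp->alt_path = path;
    qp->state = MIG_REARM;
}

/* Assumption: traffic arms the QPs only once both sides asked for rearm. */
static void toy_exchange_traffic(struct toy_qp *a, struct toy_qp *b)
{
    if (a->state == MIG_REARM && b->state == MIG_REARM) {
        a->state = MIG_ARMED;
        b->state = MIG_ARMED;
    }
}

/* A failure on the current path migrates an ARMED QP to its alternate. */
static int toy_path_failed(struct toy_qp *qp, int path)
{
    if (qp->cur_path != path || qp->state != MIG_ARMED)
        return 0;
    qp->cur_path = qp->alt_path;
    qp->alt_path = 0;
    qp->state = MIG_MIGRATED;
    qp->mig_events++;
    return 1;
}

/* Walks both nodes through steps 1-8 of the test sequence. */
void toy_run_sequence(struct toy_qp *client, struct toy_qp *server)
{
    /* Step 1: connected on path 1 with path 2 as the alternate. */
    struct toy_qp init = { MIG_ARMED, 1, 2, 0 };
    *client = init;
    *server = init;

    /* Step 2: path 1 breaks, both sides fail over to path 2. */
    toy_path_failed(client, 1);
    toy_path_failed(server, 1);

    /* Steps 3-7: path 1 is fixed; LAP/APR exchange reloads it as alternate. */
    toy_load_alt_path(server, 1);
    toy_load_alt_path(client, 1);
    toy_exchange_traffic(client, server);

    /* Step 8: path 2 breaks. */
    toy_path_failed(client, 2);
    toy_path_failed(server, 2);
}
```

In this model both nodes end up with mig_events == 2 and back on path 1,
which is the behavior I expect from step 9. What I actually see corresponds
to the client's second migration event never being reported.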
- Sean