[openib-general] [RFC] [PATCH] rdma/ib_cm: fix APM support

Sean Hefty mshefty at ichips.intel.com
Mon Nov 6 16:45:16 PST 2006


Venkatesh Babu wrote:
>  Let me make the steps clear -
>   1. On Passive node register for remote port UP/DOWN event by 
> registering with ib_sa_serv_notice_hdlr()
>   2. On Passive node start the listener by calling ib_cm_listen().
>   3. On Active node create the RC QP and establish the connection by 
> calling ib_send_cm_req(). In struct ib_cm_req_param specify both primary 
> path (say, through Port1) and alternate path (say, through Port2).
> NOTE:-Assume Port1 of Active node is connected to Port1 of Passive node; 
> and Port2 of Active node is connected to Port2 of  Passive node.
> NOTE:- After this step QP's path_mig_state will be IB_MIG_ARMED.
>   4. Let us say, Port1 on Active node fails
>   5. IB_EVENT_PORT_ERR event is generated on  Active node; and remote 
> port error event is generated on Passive node.
>   6. In those event handlers call ib_modify_qp() to set the 
> path_mig_state to IB_MIG_MIGRATED. This will let the HCA's firmware know 
> to switch to the alternate path.
>   7. After a while, Port1 comes back again.
>   8. IB_EVENT_PORT_ACTIVE event is generated on Active node; and remote 
> port active event is generated on Passive node.
>   9. On the Active node from  IB_EVENT_PORT_ACTIVE event handler call 
> the ib_send_cm_lap() to send the alternate path (through Port1) to the 
> Passive node.
>     9.1 Passive node receives the LAP message
>     9.2 Calls ib_cm_init_rearm_attr() to initialize the alternate path info
>     9.3 Calls ib_modify_qp() to update path_mig_state to IB_MIG_REARM
>     9.4 Send APR message back to the Active node.
>  10. Active node receives the APR message
>  11. Calls ib_cm_init_rearm_attr() to initialize the alternate path info
>  12. Calls ib_modify_qp() to update path_mig_state to IB_MIG_REARM
>  13. Now when the first packet is passed between the Active and Passive 
> nodes, the ib_core changes the path_mig_state to IB_MIG_ARMED.
>  14. Now it is all set for another failover.

Using my cm patches, I have a test program that does the following:

1. Establish a connection between two nodes, including an alternate path.
2. Break the primary path (path 1).
    This generates IB_MIG_MIGRATED events on both nodes.  Failover works.
3. Fix path 1.
    (Causes port active event on the client.)
4. Client sends a LAP message with path 1 to the server.
    ib_send_cm_lap.
5. Server loads the new alternate path.
    ib_cm_init_qp_attr and ib_modify_qp.
6. Server responds with an APR message.
    ib_send_cm_apr.
7. Client loads a new alternate path.
    ib_cm_init_qp_attr and ib_modify_qp.
8. Disconnect path 2 (original alternate).
9. Server sees IB_MIG_MIGRATED event.  Client does not.

I'm still debugging why the client does not get the second 
IB_MIG_MIGRATED event.
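
For reference, the path migration cycle that both step lists walk through can be modeled as a small state machine. This is a user-space sketch, not kernel code: the three state names are copied from the thread, while struct apm_qp, enum apm_event, and apm_step() are hypothetical names made up for the illustration. It only models the transitions named in this thread and rejects everything else.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the state names used in this thread (not the kernel header). */
enum mig_state { IB_MIG_MIGRATED, IB_MIG_REARM, IB_MIG_ARMED };

/* Hypothetical events that drive the model. */
enum apm_event {
    APM_LOAD_ALT_PATH, /* software loads a new alternate path and requests rearm */
    APM_PEER_TRAFFIC,  /* traffic is exchanged with the peer after the rearm */
    APM_PATH_FAILED    /* active path fails, or software forces the migration */
};

/* Hypothetical stand-in for a QP: current state plus a failover counter. */
struct apm_qp {
    enum mig_state state;
    int migrations;
};

/* Apply one event. Returns 0 on a legal transition, -1 otherwise. */
static int apm_step(struct apm_qp *qp, enum apm_event ev)
{
    assert(qp != NULL);

    switch (ev) {
    case APM_LOAD_ALT_PATH:
        if (qp->state != IB_MIG_MIGRATED)
            return -1;
        qp->state = IB_MIG_REARM;
        return 0;
    case APM_PEER_TRAFFIC:
        if (qp->state != IB_MIG_REARM)
            return -1;
        qp->state = IB_MIG_ARMED;
        return 0;
    case APM_PATH_FAILED:
        /* A failover is only possible while an alternate path is armed. */
        if (qp->state != IB_MIG_ARMED)
            return -1;
        qp->state = IB_MIG_MIGRATED;
        qp->migrations++;
        return 0;
    }
    return -1;
}
```

In this model a second failover succeeds only if the QP got back to IB_MIG_ARMED after the new alternate path was loaded. If one side stays in IB_MIG_REARM, APM_PATH_FAILED is rejected there and no second migration is counted on that side.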

- Sean



