[openib-general] [RFC] [PATCH] rdma/ib_cm: fix APM support

Sean Hefty sean.hefty at intel.com
Thu Nov 2 11:32:00 PST 2006


> Let me make the steps clear -

This helps - thanks.

>  1. On Passive node register for remote port UP/DOWN event by
>registering with ib_sa_serv_notice_hdlr()

FYI - patches for this are being worked on separately.

>  2. On Passive node start the listener by calling ib_cm_listen().
>  3. On Active node create the RC QP and establish the connection by
>calling ib_send_cm_req(). In struct ib_cm_req_param specify both primary
>path (say, through Port1) and alternate path (say, through Port2).
>NOTE:-Assume Port1 of Active node is connected to Port1 of Passive node;
>and Port2 of Active node is connected to Port2 of  Passive node.
>NOTE:- After this step QP's path_mig_state will be IB_MIG_ARMED.
>  4. Let us say, Port1 on Active node fails
>  5. IB_EVENT_PORT_ERR event is generated on  Active node; and remote
>port error event is generated on Passive node.
>  6. In those event handlers, call ib_qp_modify() to set the
>path_mig_state to IB_MIG_MIGRATED. This tells the HCA's firmware
>to switch to the alternate path.

At least the active side in your scenario should call ib_cm_notify() after this
step.  Otherwise, the LAP will go out the primary path, which is down.  This
isn't a big deal in your test case, since you wait for the primary path to
return (step 7) before calling ib_send_cm_lap().
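
To make the ordering concern concrete, here is a toy userspace model (not
kernel code, and not taken from the patch) of why the notification matters.
Every toy_ name is a hypothetical stand-in; the only assumption about the
real CM is the behaviour described above, that it keeps sending its own
messages on the primary path until it is told the QP has migrated.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code: all names here are hypothetical. */
struct toy_path {
    int  port;  /* local port number the path uses */
    bool up;    /* whether that port is currently usable */
};

struct toy_cm_id {
    struct toy_path primary;    /* path the CM sends its own messages on */
    struct toy_path alternate;  /* path loaded for failover */
    bool have_alternate;        /* true while an alternate path is loaded */
};

/*
 * Models the effect attributed to ib_cm_notify() after a migration:
 * the CM starts using the former alternate path for its own messages.
 */
static void toy_cm_notify_path_mig(struct toy_cm_id *id)
{
    assert(id->have_alternate);
    id->primary = id->alternate;
    id->have_alternate = false;
}

/* Models sending a LAP: returns the port used, or -1 if that path is down. */
static int toy_send_lap(const struct toy_cm_id *id)
{
    return id->primary.up ? id->primary.port : -1;
}
```

With port 1 marked down and port 2 loaded as the alternate, toy_send_lap()
returns -1 until toy_cm_notify_path_mig() has run, and 2 afterwards.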

>  7. After a while, Port1 comes back up.
>  8. IB_EVENT_PORT_ACTIVE event is generated on Active node; and remote
>port active event is generated on Passive node.
>  9. On the Active node from  IB_EVENT_PORT_ACTIVE event handler call
>the ib_send_cm_lap() to send the alternate path (through Port1) to the
>Passive node.
>    9.1 Passive node receives the LAP message

The proposed patch will record the alternate path when the LAP is sent or
received.  (Again, these patches are untested, so there may be bugs here.
I'm still writing a test program that uses these interfaces.)

>    9.2 Calls ib_cm_init_rearm_attr() to initialize the alternate path info

With the proposed patch, this step should call ib_cm_init_qp_attr() instead.

>    9.3 Calls ib_qp_modify() to update path_mig_state to IB_MIG_REARM
>    9.4 Send APR message back to the Active node.
> 10. Active node receives the APR message
> 11. Calls ib_cm_init_rearm_attr() to initialize the alternate path info

With the proposed patch, this step should call ib_cm_init_qp_attr() instead.

> 12. Calls ib_qp_modify() to update path_mig_state to IB_MIG_REARM
> 13. When the first packet is passed between the Active and Passive
>nodes, the ib_core changes the path_mig_state to IB_MIG_ARMED.
> 14. Now it is all set for another failover.
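
For reference, the state changes in the steps above can be written down as a
small userspace model.  This is only a sketch, not kernel code: the three
state names mirror enum ib_mig_state, every toy_ name is hypothetical, and
the REARM to ARMED transition on the first packet is modelled exactly as
step 13 describes it rather than from the spec.

```c
#include <assert.h>

/* Toy userspace model, NOT kernel code; names mirror enum ib_mig_state. */
enum toy_mig_state { TOY_MIG_MIGRATED, TOY_MIG_REARM, TOY_MIG_ARMED };

struct toy_qp {
    enum toy_mig_state mig_state;
    int active_port;  /* port carrying traffic now */
    int alt_port;     /* port loaded as the alternate, 0 if none */
};

/* Step 3: connection established with primary and alternate paths. */
static void toy_connect(struct toy_qp *qp, int primary, int alt)
{
    assert(primary > 0 && alt > 0 && primary != alt);
    qp->active_port = primary;
    qp->alt_port = alt;
    qp->mig_state = TOY_MIG_ARMED;
}

/* Step 6: force migration; only meaningful while an alternate is armed. */
static int toy_force_migrate(struct toy_qp *qp)
{
    if (qp->mig_state != TOY_MIG_ARMED)
        return -1;
    qp->active_port = qp->alt_port;
    qp->alt_port = 0;
    qp->mig_state = TOY_MIG_MIGRATED;
    return 0;
}

/* Steps 9.2-9.3 and 11-12: load a new alternate path and request rearm. */
static int toy_rearm(struct toy_qp *qp, int new_alt)
{
    if (qp->mig_state != TOY_MIG_MIGRATED || new_alt == qp->active_port)
        return -1;
    qp->alt_port = new_alt;
    qp->mig_state = TOY_MIG_REARM;
    return 0;
}

/* Step 13: the first packet after a rearm request arms the alternate. */
static void toy_first_packet(struct toy_qp *qp)
{
    if (qp->mig_state == TOY_MIG_REARM)
        qp->mig_state = TOY_MIG_ARMED;
}
```

Walking a toy_qp through toy_connect(), toy_force_migrate(), toy_rearm()
and toy_first_packet() leaves it in TOY_MIG_ARMED again, which is the
"all set for another failover" condition in step 14.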

Using the proposed patches, where did you see a failure?

- Sean



