[openib-general] [RFC] [PATCH] rdma/ib_cm: fix APM support
Venkatesh Babu
venkatesh.babu at 3leafnetworks.com
Thu Nov 2 13:15:13 PST 2006
I have made the changes to steps 6, 9.2 and 11. In step 9.2,
ib_cm_init_qp_attr() failed with -22 (-EINVAL), and then the RC QP failed
with IB_WC_RETRY_EXC_ERR.
VBabu
Sean Hefty wrote:
>>Let me make the steps clear -
>>
>>
>
>This helps - thanks.
>
>
>
>> 1. On Passive node register for remote port UP/DOWN event by
>>registering with ib_sa_serv_notice_hdlr()
>>
>>
>
>FYI - patches for this are being worked separately.
>
>
>
>> 2. On Passive node start the listener by calling ib_cm_listen().
>> 3. On Active node create the RC QP and establish the connection by
>>calling ib_send_cm_req(). In struct ib_cm_req_param specify both primary
>>path (say, through Port1) and alternate path (say, through Port2).
>>NOTE:-Assume Port1 of Active node is connected to Port1 of Passive node;
>>and Port2 of Active node is connected to Port2 of Passive node.
>>NOTE:- After this step QP's path_mig_state will be IB_MIG_ARMED.
>> 4. Let us say, Port1 on Active node fails
>> 5. IB_EVENT_PORT_ERR event is generated on Active node; and remote
>>port error event is generated on Passive node.
>> 6. In those event handler call ib_qp_modify() to set the
>>path_mig_state to IB_MIG_MIGRATED. This will let the HCA's firmware know
>>to switch to the alternate path.
>>
>>
>
>At least the active side in your scenario should call ib_cm_notify() after this
>step. Otherwise, the LAP will go out the primary path, which is down. This
>isn't a big deal in your test case, since you wait for the primary path to
>return (step 7) before calling ib_send_cm_lap().
>
>
>
>> 7. After a while, Port1 comes back again.
>> 8. IB_EVENT_PORT_ACTIVE event is generated on Active node; and remote
>>port active event is generated on Passive node.
>> 9. On the Active node from IB_EVENT_PORT_ACTIVE event handler call
>>the ib_send_cm_lap() to send the alternate path (through Port1) to the
>>Passive node.
>> 9.1 Passive node receives the LAP message
>>
>>
>
>The proposed patch will record the alternate path when the LAP is sent or
>received. (Again, these patches are untested, so there can be some bugs here.
>I'm still working on writing a test program to use these interfaces.)
>
>
>
>> 9.2 Calls ib_cm_init_rearm_attr() to initialize the alternate path info
>>
>>
>
>This should now call ib_cm_init_qp_attr().
>
>
>
>> 9.3 Calls ib_qp_modify() to update path_mig_state to IB_MIG_REARM
>> 9.4 Send APR message back to the Active node.
>>10. Active node receives the APR message
>>11. Calls ib_cm_init_rearm_attr() to initialize the alternate path info
>>
>>
>
>This should now call ib_cm_init_qp_attr().
>
>
>
>>12. Calls ib_qp_modify() to update path_mig_state to IB_MIG_REARM
>>13. When the first packet passes between the Active and Passive
>>nodes, ib_core changes the path_mig_state to IB_MIG_ARMED.
>> 14. Now it is all set for another failover.
>>
>>
>
>Using the proposed patches, where did you see a failure?
>
>- Sean
>
>
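For reference, the failover cycle in steps 3 to 14 can be sketched as a small user-space state model. This is only an illustration of the path_mig_state transitions described above, not kernel code and not the patch under discussion: struct qp_model, qp_migrate(), qp_rearm(), qp_on_packet() and apm_walkthrough() are hypothetical names, and the MIG_* values merely mirror IB_MIG_MIGRATED, IB_MIG_REARM and IB_MIG_ARMED. It assumes a forced migration is only valid from the armed state and that a rearm needs an alternate path different from the one in use.

```c
#include <assert.h>

/* Hypothetical stand-ins for IB_MIG_MIGRATED / IB_MIG_REARM / IB_MIG_ARMED. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

/* Toy model of one RC QP: which port carries traffic, which is standby. */
struct qp_model {
    enum mig_state state;
    int primary_port;   /* port currently carrying traffic */
    int alt_port;       /* standby port, 0 means no alternate path loaded */
};

/* Step 6: force migration to the alternate path (assumed valid only when armed). */
static int qp_migrate(struct qp_model *qp)
{
    if (qp->state != MIG_ARMED || qp->alt_port == 0)
        return -1;
    qp->primary_port = qp->alt_port;
    qp->alt_port = 0;
    qp->state = MIG_MIGRATED;
    return 0;
}

/* Steps 9.2-9.3 and 11-12: load a new alternate path and request a rearm. */
static int qp_rearm(struct qp_model *qp, int new_alt_port)
{
    if (qp->state != MIG_MIGRATED || new_alt_port == qp->primary_port)
        return -1;
    qp->alt_port = new_alt_port;
    qp->state = MIG_REARM;
    return 0;
}

/* Step 13: the first packet exchanged after a rearm completes the arming. */
static void qp_on_packet(struct qp_model *qp)
{
    if (qp->state == MIG_REARM)
        qp->state = MIG_ARMED;
}

/* Walk through the scenario from the list above; returns the final state. */
int apm_walkthrough(void)
{
    /* Step 3: connected with Port1 as primary and Port2 as alternate. */
    struct qp_model qp = { MIG_ARMED, 1, 2 };
    int rc;

    rc = qp_migrate(&qp);      /* steps 4-6: Port1 fails, traffic moves to Port2 */
    assert(rc == 0 && qp.primary_port == 2);

    rc = qp_rearm(&qp, 1);     /* steps 7-12: Port1 returns, load it as alternate */
    assert(rc == 0 && qp.state == MIG_REARM);

    qp_on_packet(&qp);         /* step 13: traffic flows, QP is armed again */
    return qp.state;           /* step 14: ready for another failover */
}
```

In this model a rearm attempt that names the port already in use is rejected, which is one way an invalid alternate path could surface as an error at step 9.2; whether that matches the -22 seen here is not something the model can confirm.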