[openib-general] [RFC] [PATCH] rdma/ib_cm: fix APM support

Venkatesh Babu venkatesh.babu at 3leafnetworks.com
Thu Nov 2 13:15:13 PST 2006


I have made the changes to steps 6, 9.2 and 11. In step 9.2,
ib_cm_init_qp_attr() failed with -22 (-EINVAL), and then the RC QP failed
with IB_WC_RETRY_EXC_ERR.

 VBabu

Sean Hefty wrote:

>>Let me make the steps clear -
>>    
>>
>
>This helps - thanks.
>
>  
>
>> 1. On the Passive node, register for remote port UP/DOWN events by
>>registering with ib_sa_serv_notice_hdlr()
>>    
>>
>
>FYI - patches for this are being worked separately.
>
>  
>
>> 2. On the Passive node, start the listener by calling ib_cm_listen().
>> 3. On the Active node, create the RC QP and establish the connection by
>>calling ib_send_cm_req(). In struct ib_cm_req_param, specify both the primary
>>path (say, through Port1) and the alternate path (say, through Port2).
>>NOTE: Assume Port1 of the Active node is connected to Port1 of the Passive
>>node, and Port2 of the Active node is connected to Port2 of the Passive node.
>>NOTE: After this step the QP's path_mig_state will be IB_MIG_ARMED.
>> 4. Let us say Port1 on the Active node fails.
>> 5. An IB_EVENT_PORT_ERR event is generated on the Active node, and a remote
>>port error event is generated on the Passive node.
>> 6. In those event handlers, call ib_modify_qp() to set the
>>path_mig_state to IB_MIG_MIGRATED. This tells the HCA's firmware
>>to switch to the alternate path.
>>    
>>
>
>At least the active side in your scenario should call ib_cm_notify() after this
>step.  Otherwise, the LAP will go out the primary path, which is down.  This
>isn't a big deal in your test case, since you wait for the primary path to
>return (step 7) before calling ib_send_cm_lap().
>
>  
>
>> 7. After a while, Port1 comes back up.
>> 8. An IB_EVENT_PORT_ACTIVE event is generated on the Active node, and a remote
>>port active event is generated on the Passive node.
>> 9. On the Active node, from the IB_EVENT_PORT_ACTIVE event handler, call
>>ib_send_cm_lap() to send the alternate path (through Port1) to the
>>Passive node.
>>   9.1 The Passive node receives the LAP message.
>>    
>>
>
>The proposed patch will record the alternate path when the LAP is sent or
>received.  (Again, these patches are untested, so there can be some bugs here.
>I'm still working on writing a test program to use these interfaces.)
>
>  
>
>>   9.2 Calls ib_cm_init_rearm_attr() to initialize the alternate path info
>>    
>>
>
>This should now call ib_cm_init_qp_attr().
>
>  
>
>>   9.3 Calls ib_modify_qp() to update path_mig_state to IB_MIG_REARM
>>   9.4 Sends the APR message back to the Active node.
>>10. The Active node receives the APR message.
>>11. Calls ib_cm_init_rearm_attr() to initialize the alternate path info
>>    
>>
>
>This should now call ib_cm_init_qp_attr().
>
>  
>
>>12. Calls ib_modify_qp() to update path_mig_state to IB_MIG_REARM
>>13. Now, when the first packet is passed between the Active and Passive
>>nodes, the ib_core changes the path_mig_state to IB_MIG_ARMED.
>>14. Now it is all set for another failover.
>>    
>>
>
>Using the proposed patches, where did you see a failure?
>
>- Sean
>  
>
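
To make the path_mig_state transitions in the quoted steps easier to follow
(ARMED after step 3, MIGRATED in step 6, REARM in steps 9.3 and 12, ARMED
again in step 13), here is a minimal user-space sketch in C. It is only a
model of the state cycle as described above, not kernel code: the mig_state
and mig_event enums and the mig_next() and mig_full_cycle_ok() helpers are
hypothetical names made up for this illustration and do not exist in
ib_core or ib_cm.

```c
#include <stdio.h>

/* Hypothetical local mirror of the IB_MIG_* states named in the steps. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

/* Hypothetical events that drive the transitions described in the steps. */
enum mig_event {
    EV_MIGRATE,         /* step 6: QP is modified to MIGRATED on port failure */
    EV_REARM_REQUESTED, /* steps 9.3 and 12: QP is modified to REARM */
    EV_PEER_REARMED     /* step 13: traffic flows and both sides are rearmed */
};

/* Return the next state; events that do not apply leave the state unchanged. */
enum mig_state mig_next(enum mig_state s, enum mig_event e)
{
    switch (s) {
    case MIG_ARMED:
        return e == EV_MIGRATE ? MIG_MIGRATED : s;
    case MIG_MIGRATED:
        return e == EV_REARM_REQUESTED ? MIG_REARM : s;
    case MIG_REARM:
        return e == EV_PEER_REARMED ? MIG_ARMED : s;
    }
    return s;
}

/* Walk one failover cycle: ARMED -> MIGRATED -> REARM -> ARMED. */
int mig_full_cycle_ok(void)
{
    enum mig_state s = MIG_ARMED;

    s = mig_next(s, EV_MIGRATE);
    if (s != MIG_MIGRATED)
        return 0;
    s = mig_next(s, EV_REARM_REQUESTED);
    if (s != MIG_REARM)
        return 0;
    s = mig_next(s, EV_PEER_REARMED);
    return s == MIG_ARMED;
}
```

In this model a second failover (step 14) is only possible once the state is
back at MIG_ARMED; an EV_MIGRATE event while still in MIG_REARM is ignored.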



