[openib-general] [RFC] [PATCH] rdma/ib_cm: fix APM support

Venkatesh Babu venkatesh.babu at 3leafnetworks.com
Thu Nov 2 11:37:51 PST 2006


Sean Hefty wrote:

>>Are these changes to replace ib_cm_init_rearm_attr() interface ?
>>    
>>
>
>Yes - you use ib_cm_init_qp_attr() to get the qp_attr after loading a new
>alternate path.  The new path is loaded using ib_send_cm_lap().  So, after a
>path fails:
>  
>
  After the path fails, I just call ib_modify_qp() on both the active
and passive sides to switch to the alternate path by changing
path_mig_state to IB_MIG_MIGRATED.

  Let me make the steps clear:
   1. On the Passive node, register for remote port UP/DOWN events by
      registering with ib_sa_serv_notice_hdlr().
   2. On the Passive node, start the listener by calling ib_cm_listen().
   3. On the Active node, create the RC QP and establish the connection
      by calling ib_send_cm_req(). In struct ib_cm_req_param, specify
      both the primary path (say, through Port1) and the alternate path
      (say, through Port2).
      NOTE: Assume Port1 of the Active node is connected to Port1 of the
      Passive node, and Port2 of the Active node is connected to Port2
      of the Passive node.
      NOTE: After this step the QP's path_mig_state will be IB_MIG_ARMED.
   4. Let us say Port1 on the Active node fails.
   5. An IB_EVENT_PORT_ERR event is generated on the Active node, and a
      remote port error event is generated on the Passive node.
   6. In those event handlers, call ib_modify_qp() to set path_mig_state
      to IB_MIG_MIGRATED. This tells the HCA's firmware to switch to the
      alternate path.
   7. After a while, Port1 comes back up.
   8. An IB_EVENT_PORT_ACTIVE event is generated on the Active node, and
      a remote port active event is generated on the Passive node.
   9. On the Active node, from the IB_EVENT_PORT_ACTIVE event handler,
      call ib_send_cm_lap() to send the alternate path (through Port1)
      to the Passive node.
      9.1 The Passive node receives the LAP message.
      9.2 It calls ib_cm_init_rearm_attr() to initialize the alternate
          path info.
      9.3 It calls ib_modify_qp() to update path_mig_state to
          IB_MIG_REARM.
      9.4 It sends an APR message back to the Active node.
  10. The Active node receives the APR message.
  11. It calls ib_cm_init_rearm_attr() to initialize the alternate path
      info.
  12. It calls ib_modify_qp() to update path_mig_state to IB_MIG_REARM.
  13. Now, when the first packet is passed between the Active and
      Passive nodes, the ib_core changes path_mig_state to IB_MIG_ARMED.
  14. Now it is all set for another failover.
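
  The path_mig_state transitions in the steps above can be sketched as a
small user-space state machine. This is only an illustration, not kernel
code: the sim_* names are hypothetical stand-ins for the IB_MIG_* values
and for the events in the list, and the model assumes a new alternate
path is loaded only after a migration has happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for IB_MIG_MIGRATED / IB_MIG_REARM / IB_MIG_ARMED. */
enum sim_mig_state { SIM_MIG_MIGRATED, SIM_MIG_REARM, SIM_MIG_ARMED };

/* Hypothetical event names for the steps in the list above. */
enum sim_event {
    SIM_EV_PORT_ERR,        /* steps 4-6: port on the primary path fails */
    SIM_EV_ALT_PATH_LOADED, /* steps 9-12: LAP/APR done, new path loaded */
    SIM_EV_FIRST_PACKET     /* step 13: traffic flows after the rearm    */
};

/* Returns true and updates *state if the event is valid in this state. */
bool sim_mig_step(enum sim_mig_state *state, enum sim_event ev)
{
    assert(state != NULL);

    switch (ev) {
    case SIM_EV_PORT_ERR:
        /* Failover only works from the armed state. */
        if (*state != SIM_MIG_ARMED)
            return false;
        *state = SIM_MIG_MIGRATED;
        return true;
    case SIM_EV_ALT_PATH_LOADED:
        /* Simplification: only reload after a migration. */
        if (*state != SIM_MIG_MIGRATED)
            return false;
        *state = SIM_MIG_REARM;
        return true;
    case SIM_EV_FIRST_PACKET:
        if (*state != SIM_MIG_REARM)
            return false;
        *state = SIM_MIG_ARMED;
        return true;
    }
    return false;
}
```

  Starting from the armed state, feeding the three events in order
brings the model back to the armed state, which corresponds to step 14.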

>One side calls ib_send_cm_lap() to propose a new alternate path.
>Second side responds by calling ib_send_cm_apr().
>Both sides call ib_cm_init_qp_attr(), then ib_modify_qp() to load the new path.
>
>This is intended to work if failover has occurred, or if the user detects that
>the alternate path is down and wants to replace it.
>
>There is an additional call, ib_cm_notify() which is used to let the CM know
>that the primary path has failed, and the alternate path should be used when
>sending future CM messages.  In case of failover, this needs to be called before
>calling ib_send_cm_lap() to ensure that the LAP message reaches the remote user.
>
>  
>
>>The path migration from Primary to Alternate succeeded, then reloaded
>>the alternate path.
>>    
>>
>
>How did you reload the alternate path?
>  
>
  Steps 9 through 12.

>  
>
>>failed with the IB_WC_RETRY_EXC_ERR. But I got the event IB_EVENT_PATH_MIG.
>>
>>With the ib_cm_init_rearm_attr() being called, failover/failback worked
>>fine.
>>    
>>
>
>Were you calling ib_send_cm_lap() to load a new alternate path, 
>
   Step 9

>or just assuming
>that the old path would work after failover occurred?
>  
>
   Before failover occurs, the QP's path_mig_state must be
IB_MIG_ARMED; otherwise failover doesn't work.
If it is IB_MIG_ARMED, then the alternate path is already loaded, and
just calling ib_modify_qp() to update path_mig_state to IB_MIG_MIGRATED
will drop the primary path and make the alternate path the primary path.
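
   A minimal user-space model of that behaviour is sketched below. It is
an assumption-laden illustration rather than how the HCA or ib_core does
it: struct demo_qp, its fields, and demo_force_migrate() are hypothetical
names, and the paths are reduced to plain port numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the IB_MIG_* values; not the kernel's enum. */
enum demo_mig_state { DEMO_MIG_MIGRATED, DEMO_MIG_REARM, DEMO_MIG_ARMED };

/* Hypothetical, simplified view of the path state a QP carries. */
struct demo_qp {
    enum demo_mig_state mig_state;
    int primary_port;   /* port used by the current primary path */
    int alt_port;       /* port used by the alternate path       */
    bool alt_loaded;    /* true while an alternate path is usable */
};

/*
 * Model of forcing path_mig_state to MIGRATED.
 * Returns 0 on success, -1 if the QP was not armed.
 */
int demo_force_migrate(struct demo_qp *qp)
{
    assert(qp != NULL);

    if (qp->mig_state != DEMO_MIG_ARMED || !qp->alt_loaded)
        return -1;

    /* The alternate path becomes the primary; the old primary is gone. */
    qp->primary_port = qp->alt_port;
    qp->alt_loaded = false;
    qp->mig_state = DEMO_MIG_MIGRATED;
    return 0;
}
```

   In this model a second call fails until a new alternate path has been
loaded and the QP is armed again, which matches the requirement stated
above.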

>- Sean
>  
>



