[openib-general] [RFC] [PATCH] rdma/ib_cm: fix APM support
Venkatesh Babu
venkatesh.babu at 3leafnetworks.com
Thu Nov 2 11:37:51 PST 2006
Sean Hefty wrote:
>>Are these changes to replace ib_cm_init_rearm_attr() interface ?
>>
>>
>
>Yes - you use ib_cm_init_qp_attr() to get the qp_attr after loading a new
>alternate path. The new path is loaded using ib_send_cm_lap(). So, after a
>path fails:
>
>
After the path fails, I just call ib_modify_qp() on both the active and
passive sides to switch to the alternate path by changing path_mig_state
to IB_MIG_MIGRATED.
Let me make the steps clear:
1. On Passive node register for remote port UP/DOWN event by
registering with ib_sa_serv_notice_hdlr()
2. On Passive node start the listener by calling ib_cm_listen().
3. On Active node create the RC QP and establish the connection by
calling ib_send_cm_req(). In struct ib_cm_req_param specify both primary
path (say, through Port1) and alternate path (say, through Port2).
NOTE:-Assume Port1 of Active node is connected to Port1 of Passive node;
and Port2 of Active node is connected to Port2 of Passive node.
NOTE:- After this step QP's path_mig_state will be IB_MIG_ARMED.
4. Let us say, Port1 on Active node fails
5. IB_EVENT_PORT_ERR event is generated on Active node; and remote
port error event is generated on Passive node.
6. In those event handlers, call ib_modify_qp() to set the
path_mig_state to IB_MIG_MIGRATED. This tells the HCA's firmware
to switch to the alternate path.
7. After a while, Port1 comes back up.
8. IB_EVENT_PORT_ACTIVE event is generated on Active node; and remote
port active event is generated on Passive node.
9. On the Active node from IB_EVENT_PORT_ACTIVE event handler call
the ib_send_cm_lap() to send the alternate path (through Port1) to the
Passive node.
9.1 Passive node receives the LAP message
9.2 Calls ib_cm_init_rearm_attr() to initialize the alternate path info
9.3 Calls ib_modify_qp() to update path_mig_state to IB_MIG_REARM
9.4 Sends the APR message back to the Active node.
10. Active node receives the APR message
11. Calls ib_cm_init_rearm_attr() to initialize the alternate path info
12. Calls ib_modify_qp() to update path_mig_state to IB_MIG_REARM
13. Now when the first packet is passed between the Active and Passive
nodes, the ib_core changes the path_mig_state to IB_MIG_ARMED.
14. Now it is all set for another failover.
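To make the state changes in the steps above easier to follow, here is a
rough user-space sketch of the path_mig_state cycle. It is only a toy model,
not kernel code: enum mig_state stands in for the kernel's enum ib_mig_state,
and mig_event, mig_step() and mig_full_cycle() are names made up for this
illustration. It covers only the transitions listed in the steps (it does
not model replacing an alternate path while the QP is still armed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's enum ib_mig_state. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

/* Events taken from the step list; the names are illustrative only. */
enum mig_event {
    EV_FORCE_MIGRATE,  /* steps 5-6: port error, QP set to MIGRATED    */
    EV_LOAD_ALT_REARM, /* steps 9-12: LAP/APR done, QP set to REARM    */
    EV_FIRST_PACKET    /* step 13: traffic flows, QP becomes ARMED     */
};

/*
 * Apply one event to the state. Returns true and updates *st only for
 * the transitions described in the steps; otherwise leaves *st alone.
 */
bool mig_step(enum mig_state *st, enum mig_event ev)
{
    assert(st != NULL);

    switch (ev) {
    case EV_FORCE_MIGRATE:
        if (*st != MIG_ARMED)
            return false;
        *st = MIG_MIGRATED;
        return true;
    case EV_LOAD_ALT_REARM:
        if (*st != MIG_MIGRATED)
            return false;
        *st = MIG_REARM;
        return true;
    case EV_FIRST_PACKET:
        if (*st != MIG_REARM)
            return false;
        *st = MIG_ARMED;
        return true;
    }
    return false;
}

/* Walk one failover/reload cycle; true if the QP ends up armed again. */
bool mig_full_cycle(void)
{
    enum mig_state st = MIG_ARMED; /* state after step 3 */

    if (!mig_step(&st, EV_FORCE_MIGRATE))
        return false;
    if (!mig_step(&st, EV_LOAD_ALT_REARM))
        return false;
    if (!mig_step(&st, EV_FIRST_PACKET))
        return false;
    return st == MIG_ARMED;
}
```

In this model a second EV_FORCE_MIGRATE right after the first one is
rejected, which is the reason the alternate path has to be reloaded and the
QP rearmed before another failover can happen.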
>One side calls ib_send_cm_lap() to propose a new alternate path.
>Second side responds by calling ib_send_cm_apr().
>Both sides call ib_cm_init_qp_attr(), then ib_modify_qp() to load the new path.
>
>This is intended to work if failover has occurred, or if the user detects that
>the alternate path is down and wants to replace it.
>
>There is an additional call, ib_cm_notify() which is used to let the CM know
>that the primary path has failed, and the alternate path should be used when
>sending future CM messages. In case of failover, this needs to be called before
>calling ib_send_cm_lap() to ensure that the LAP message reaches the remote user.
>
>
>
>>The path migration from Primary to Alternate succeeded, then reloaded
>>the alternate path.
>>
>>
>
>How did you reload the alternate path?
>
>
Steps 9 through 12.
>
>
>>failed with the IB_WC_RETRY_EXC_ERR. But I got the event IB_EVENT_PATH_MIG.
>>
>>With the ib_cm_init_rearm_attr() being called, failover/failback worked
>>fine.
>>
>>
>
>Were you calling ib_send_cm_lap() to load a new alternate path,
>
Step 9
>or just assuming
>that the old path would work after failover occurred?
>
>
Before failover occurs, the QP's path_mig_state must be
IB_MIG_ARMED; otherwise failover doesn't work.
If it is IB_MIG_ARMED, then the alternate path is already loaded, and just
calling ib_modify_qp() to update path_mig_state to IB_MIG_MIGRATED will
toss the primary path and make the alternate path the primary path.
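As a rough illustration of what I mean by tossing the primary path, here is
a small user-space sketch. It is not the verbs API and not what the HCA
firmware literally does: struct toy_qp, struct toy_path, enum toy_mig and
toy_force_migrate() are all hypothetical names, and the port numbers follow
the Port1/Port2 example from the steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the migration state kept in the QP. */
enum toy_mig { TOY_MIGRATED, TOY_REARM, TOY_ARMED };

/* Toy path record: just a port number and a "loaded" flag. */
struct toy_path {
    int  port;
    bool valid;
};

/* Toy QP holding a primary and an alternate path. */
struct toy_qp {
    enum toy_mig    mig;
    struct toy_path primary;
    struct toy_path alternate;
};

/*
 * Model of forcing migration: only allowed when the QP is armed and an
 * alternate path is loaded. On success the alternate path becomes the
 * primary path and the alternate slot is left empty.
 * Returns 0 on success, -1 if the QP was not ready to fail over.
 */
int toy_force_migrate(struct toy_qp *qp)
{
    assert(qp != NULL);

    if (qp->mig != TOY_ARMED || !qp->alternate.valid)
        return -1;

    qp->primary = qp->alternate;   /* old primary is discarded */
    qp->alternate.valid = false;   /* nothing to fail over to now */
    qp->mig = TOY_MIGRATED;
    return 0;
}
```

With a QP armed on Port1 (primary) and Port2 (alternate), one call moves the
primary to Port2 and a second call fails, since no alternate path is loaded
until a new one is sent with ib_send_cm_lap().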
>- Sean
>
>