[openib-general] APM: QP migration state change when failover triggered by hw

Jack Morgenstein jackm at mellanox.co.il
Wed Aug 2 00:43:16 PDT 2006


On Tuesday 01 August 2006 21:55, Sean Hefty wrote:
> > I am testing APM with a kernel module that directly interfaces with
> > ib_verbs.ko and ib_cm.ko.
> > Yes, I do receive the IB_MIG_MIGRATED event, but the QP's mig_state is not
> > actually changed to MIGRATED, so I had to do this from my module.
>
> There is a pending patch, recently posted (dispatch communication
> establish event), that can be extended to pass path migration events to
> the ib_cm.  The purpose of passing path migration events to the ib_cm would
> be limited to changing the path that future CM messages take; it would not
> be related to QP transitions.
>
> - Sean

This could be a bit complicated.  For example, say there are two possible 
paths.  After the first migration, there is no guarantee that the original 
path has become available again, so it may not be usable as the new 
alternate path.

There is also a race condition in your proposal: the new alternate path data 
must be specified between the MIGRATED event and the 
communication-established event on the migrated path (so that the LAP message 
can be sent correctly to the remote node).

Babu, regarding the migration event that you are seeing: are you sure that it 
comes from the migration transition that does not occur?  Could the 
problematic transition instead be the second one, which occurs after you 
specify a new alternate path and rearm APM?

It seems more likely to me that the first transition does occur, since you 
receive a MIG event on both sides, and since you load the alternate path data 
yourself during the initial bringup of the RC QP pair (either at init->rtr or 
at rtr->rts).  If you are receiving the MIGRATED event, the QP is already in 
the migrated state.

However, after the first migration occurs, you need to do the following:
1.  Send a LAP packet to the remote node, containing the new alternate path 
info.
2.  Load the NEW alternate path information (ib_modify_qp, rts->rts), 
including the remote LID received in the LAP packet.
3.  Rearm path migration (ib_modify_qp, rts->rts).

Are you certain that the above 3 steps have taken place?

Note that steps 1 and 2 above are a separate phase from step 3, since the IB 
spec allows changing the alternate path while the QP is still armed, not only 
after it has migrated.

- Jack



