[ofw] RE: [RFC] Remove path query from IPoIB

Fab Tillier ftillier at windows.microsoft.com
Mon Aug 11 09:42:49 PDT 2008


> >+               /*
> >+                * Handle a race between the mcast callback and a
> >receive/send.  The QP
> >+                * is joined to the MC group before the MC callback is
> >received, so it
> >+                * can receive packets, and NDIS can try to respond.
> >We need to delay
> >+                * a response until the MC callback runs and sets the
> >AV.
> >+                */
>
> Only a comment is being added here, but... why isn't the join delayed
> until after the MC callback is invoked and the AV created?  Or is the
> wrong race being described?

I initially removed the check for the QPN because I couldn't figure out why it was there.  It didn't seem like it should be.  Well, things broke pretty badly when I did this.  So I added a comment so that someone in the future someone doesn't spend a few hours scratching their head.

The join is delayed internally to IBAL.  So IPoIB asks IBAL to join the MC group, IBAL sends the SA join request, and if that succeeds joins the QP to the MC group.  Once the QP is joined to the MC group IBAL invokes the client's (IPoIB) callback.  So there's a window where the QP is joined to the MC group, but the client doesn't know about it yet.  In that window, it's possible for a receive to happen on the MC group, and for NDIS to try to respond to that receive.  Part of the race is due to the fact that the QP join verb has to be executed at PASSIVE_LEVEL, and the CQ notification is at dispatch (as is the send from NDIS).  Yes, yet another place where the blocking nature of the verb interface complicates things.

IPoIB queues send packets until and endpoint is resolved.  With this new change, it can fabricate AVs as needed, but we don't want this to happen for multicast endpoints.

> >@@ -402,38 +328,17 @@ __path_query_cb(
> >         * for which there is no match, something that doesn't work
> >when
> >         * using LIDs only.
> >         */
> >-       flow_lbl = ib_path_rec_flow_lbl( p_path );
> >        av_attr.grh_valid = TRUE;
>
> I don't think we want grh_valid to always be true here, or we always
> send the GRH.

The comment for this was cut off by the diffs.  Here it is in its entirety:

         * We always send the GRH so that we preferably lookup endpoints
         * by GID rather than by LID.  This allows certain WHQL tests
         * such as the 2c_MediaCheck test to succeed since they don't use
         * IP.  This allows endpoints to be created on the fly for requests
         * for which there is no match, something that doesn't work when
         * using LIDs only.

So sending the GRH always is by design.  It lets things work in the absence of ARP requests.

-Fab



More information about the ofw mailing list