[ofw] RE: Patch: [ipoib] Make sure that the dlid is zero if it isnot in the list.

Fab Tillier ftillier at windows.microsoft.com
Mon Nov 10 15:34:57 PST 2008


From my memory (which is getting fainter as time goes on), the receive processing will look the source up by GID if a GRH is present, and by LID if not. Only unicast traffic should ever come in LID-routed (without a GRH), and I believe all Windows IPoIB traffic goes out with the GRH set.
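The lookup rule described above can be sketched in C. This is a minimal illustration only, assuming a hypothetical flat endpoint table (`endpt_t`, `find_src_endpt()`); the driver's real endpoint manager and its lookup functions are not reproduced in this thread. An endpoint whose `dlid` is 0 is treated as absent from the LID map and is never matched by LID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical endpoint record, not the driver's own endpoint type. */
typedef struct {
    uint8_t  gid[16];  /* port GID of the remote endpoint */
    uint16_t dlid;     /* LID, or 0 when not in the LID map */
    uint32_t qpn;      /* remote UD QP number */
} endpt_t;

/* Find the source endpoint of a received packet.
 * With a GRH the source GID is known, so match on GID.
 * Without a GRH only the source LID is known (unicast only). */
static const endpt_t *find_src_endpt(const endpt_t *tbl, size_t n,
                                     int has_grh, const uint8_t sgid[16],
                                     uint16_t slid)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (has_grh) {
            if (memcmp(tbl[i].gid, sgid, 16) == 0)
                return &tbl[i];
        } else if (slid != 0 && tbl[i].dlid == slid) {
            return &tbl[i];
        }
    }
    return NULL;  /* unknown source */
}
```

Usage: with a GRH, pass the packet's source GID; without one, pass `NULL` for the GID and the source LID from the work completion.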

So multicast endpoints shouldn't need to be in the LID map, though I can't see why having them there would be a problem (unless multiple groups map to the same LID; is that even possible?).
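A minimal sketch of keeping multicast endpoints out of the LID map, assuming a hypothetical fixed-size table (`lid_map_t`, `lid_map_insert()`); this is not the driver's code. The LID ranges in the comment are those of the InfiniBand Architecture specification: 0x0001-0xBFFF are unicast LIDs and 0xC000-0xFFFE are multicast LIDs. Rejecting a duplicate LID also surfaces the "multiple endpoints with the same LID" case instead of silently overwriting an entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* InfiniBand LID ranges: 0x0000 reserved, 0x0001-0xBFFF unicast,
 * 0xC000-0xFFFE multicast, 0xFFFF permissive. */
#define LID_UNICAST_MAX 0xBFFFu

static bool lid_is_unicast(uint16_t lid)
{
    return lid != 0 && lid <= LID_UNICAST_MAX;
}

/* Hypothetical LID map: a small fixed table of (LID, endpoint id) pairs. */
#define LID_MAP_CAP 8

typedef struct {
    uint16_t lid[LID_MAP_CAP];
    int      endpt_id[LID_MAP_CAP];
    size_t   count;
} lid_map_t;

/* Adds an endpoint keyed by LID. Only unicast LIDs are stored, since only
 * receives without a GRH (unicast) are looked up by LID. Returns false if
 * the LID was skipped, is already present, or the table is full. */
static bool lid_map_insert(lid_map_t *m, uint16_t lid, int endpt_id)
{
    assert(m != NULL);
    if (!lid_is_unicast(lid))
        return false;
    for (size_t i = 0; i < m->count; i++)
        if (m->lid[i] == lid)
            return false;   /* same LID already mapped: caller decides */
    if (m->count == LID_MAP_CAP)
        return false;
    m->lid[m->count] = lid;
    m->endpt_id[m->count] = endpt_id;
    m->count++;
    return true;
}
```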

-Fab

> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org [mailto:ofw-
> bounces at lists.openfabrics.org] On Behalf Of Tzachi Dar
> Sent: Friday, November 07, 2008 6:21 AM
> To: Alex Estrin; Fab Tillier; ofw at lists.openfabrics.org
> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid is zero
> if it isnot in the list.
>
> Indeed, it seems you have a good point here.
>
> The LID endpoints are only used in the function
> __endpt_mgr_get_by_lid(). This function is only called from the function
> __recv_get_endpts().
>
> Where it is called, around line 1860, there is the following
> comment:
>
>                 /*
>                  * Lookup the remote endpoint based on LID.  Note that only
>                  * unicast traffic can be LID routed.
>                  */
>
> I'll try removing these LIDs, and we will see if multicast continues to
> work.
>
> Thanks
> Tzachi
>
>
>> -----Original Message-----
>> From: Alex Estrin [mailto:alex.estrin at qlogic.com]
>> Sent: Friday, November 07, 2008 2:19 PM
>> To: Tzachi Dar; Fab Tillier; ofw at lists.openfabrics.org
>> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid is
>> zero if it isnot in the list.
>>
>> Why should a multicast endpoint be inserted into the LID list at all?
>>
>> Thanks,
>> Alex.
>>
>>> -----Original Message-----
>>> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
>>> Sent: Thursday, November 06, 2008 7:25 PM
>>> To: Alex Estrin; Fab Tillier; ofw at lists.openfabrics.org
>>> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid is
>>> zero if it isnot in the list.
>>>
>>> You both raise good questions that I don't have answers to yet. I'll try
>>> to think about it more tomorrow.
>>>
>>> Alex, please note that, at least in the assert I received, the
>>> problem happened because of multicast_cb, which means that ARP was
>>> not involved there (you might be pointing to another problem).
>>>
>>> In any case, the more I think about it, if we can't find the cause
>>> of the corruption, we should ask NDIS to reset the device in order to
>>> make sure we are in a consistent state.
>>>
>>> Thanks
>>> Tzachi
>>>
>>>> -----Original Message-----
>>>> From: Alex Estrin [mailto:alex.estrin at qlogic.com]
>>>> Sent: Friday, November 07, 2008 1:56 AM
>>>> To: Fab Tillier; Tzachi Dar; ofw at lists.openfabrics.org
>>>> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid is
>>>> zero if it isnot in the list.
>>>>
>>>> I have some thoughts on a possible reason why a stale endpoint can be
>>>> missed:
>>>>
>>>> Looking into ipoib_port.c (rev. 1737) __recv_get_endpts() @ line 1873:
>>>>
>>>>   if( *pp_src && !ipoib_is_voltaire_router_gid( &(*pp_src)->dgid ) &&
>>>>       (*pp_src)->qpn != p_wc->recv.ud.remote_qp )
>>>>   {
>>>>       /* Update the QPN for the endpoint. */
>>>>       ..........
>>>>       (*pp_src)->qpn = p_wc->recv.ud.remote_qp;
>>>>   }
>>>> Then later, in __recv_arp() @ line 2425, the following code is supposed
>>>> to clean up the stale endpoint, but that won't happen because the QPN
>>>> was already "updated" earlier:
>>>>
>>>>           else if( (*pp_src)->qpn != p_wc->recv.ud.remote_qp )
>>>>           {
>>>>                   /* Out of date!  Destroy the endpoint and replace it. */
>>>>                   __endpt_mgr_remove( p_port, *pp_src );
>>>>                   *pp_src = NULL;
>>>>           }
>>>>
>>>>
>>>> Did I miss anything?
>>>> Any ideas why the QPN update was put there?
>>>>
>>>> Thanks,
>>>> Alex.
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: ofw-bounces at lists.openfabrics.org
>>>>> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
>>>>> Sent: Thursday, November 06, 2008 6:01 PM
>>>>> To: Tzachi Dar; ofw at lists.openfabrics.org
>>>>> Subject: [ofw] RE: Patch: [ipoib] Make sure that the dlid is zero if
>>>>> it isnot in the list.
>>>>>
>>>>>> The real issue is what else we should do. I'm afraid things will
>>>>>> not work, as this endpoint has no dlid. My ideas are:
>>>>>>
>>>>>> 1) Remove this endpoint from the list.
>>>>>> 2) Remove the other endpoint from the list (the one that has the same dlid).
>>>>>> 3) Force a reset by NDIS, to start things all over again.
>>>>> So there's already an endpoint for that multicast group? Is it
>>>>> valid or stale? How come the new and existing endpoints don't have
>>>>> the same MAC/GID?
>>>>>
>>>>> Why did the dlid change if the MAC/GID is the same?
>>>>>
>>>>> -Fab
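Options 1 and 2 from Tzachi's list can be sketched as two collision policies for a hypothetical LID map, with `dlid == 0` meaning "not in the map", as the patch subject suggests. The types and names (`lid_map_t`, `lid_map_insert()`, `DROP_NEW`, `EVICT_OLD`) are illustrative assumptions; the actual patch is not shown in this thread, and option 3 (an NDIS reset) is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical endpoint: dlid == 0 means "not in the LID map". */
typedef struct {
    int      id;
    uint16_t dlid;
} endpt_t;

typedef enum {
    DROP_NEW,   /* option 1: keep the existing entry; new endpoint gets dlid 0 */
    EVICT_OLD   /* option 2: remove the existing entry with the same dlid */
} collision_policy_t;

#define MAP_CAP 8

typedef struct {
    endpt_t *slot[MAP_CAP];  /* endpoints currently keyed by dlid */
    size_t   count;
} lid_map_t;

static void lid_map_insert(lid_map_t *m, endpt_t *e, collision_policy_t pol)
{
    assert(m != NULL && e != NULL);
    if (e->dlid == 0)
        return;                    /* nothing to key on */
    for (size_t i = 0; i < m->count; i++) {
        if (m->slot[i]->dlid != e->dlid)
            continue;
        if (pol == DROP_NEW) {
            e->dlid = 0;           /* not in the map, so dlid must be zero */
        } else {
            m->slot[i]->dlid = 0;  /* old endpoint leaves the map */
            m->slot[i] = e;
        }
        return;
    }
    if (m->count < MAP_CAP)
        m->slot[m->count++] = e;
    else
        e->dlid = 0;               /* map full: treat as not inserted */
}
```

Either way, the invariant holds that an endpoint has a non-zero dlid only while it is actually in the map.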
>>>>> _______________________________________________
>>>>> ofw mailing list
>>>>> ofw at lists.openfabrics.org
>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>>>>>
>>>>
>>>
>>


