[ofw] RE: Patch: [ipoib] Make sure that the dlid is zero if it is not in the list.
Hal Rosenstock
halr at obsidianresearch.com
Mon Nov 10 16:04:08 PST 2008
Fab Tillier wrote:
> From my memory (which is getting fainter as time goes on), the receive processing will look the source up by GID if a GRH is present, and if not by LID. Only unicast traffic should ever come in with a LID, and I believe all Windows IPoIB traffic goes out with the GRH set.
>
> So multicast endpoints shouldn't need to be in the LID map, though I can't imagine a reason why that would be problematic (unless multiple groups map to the same LID - though is that even possible?)
>
Yes, multiple MGIDs can share the same MLID (e.g. IPv6 solicited node
multicast consolidation).
-- Hal
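Hal's point can be illustrated with a toy LID-keyed map. This is a minimal sketch, not the driver's code: endpt_t, lid_map and lid_map_insert() are hypothetical names, and the MGID strings and the MLID value 0xC001 are made-up illustrative values. Because the map holds exactly one endpoint per LID, a second multicast group that the SM placed on an already-used MLID cannot get a slot of its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical endpoint record; the driver's real endpoint type differs. */
typedef struct {
    const char *mgid;  /* multicast GID, kept as text for readability */
    uint16_t    dlid;  /* MLID the subnet manager assigned to the group */
} endpt_t;

/* Hypothetical LID-keyed map: exactly one slot per 16-bit LID value. */
static const endpt_t *lid_map[0x10000];

/* Stores e under its LID. Returns 0 on success (including re-inserting the
 * same endpoint), or -1 if a different endpoint already owns that LID. */
static int lid_map_insert(const endpt_t *e)
{
    const endpt_t *owner = lid_map[e->dlid];

    if (owner != NULL && owner != e)
        return -1;
    lid_map[e->dlid] = e;
    return 0;
}
```

With two groups that share MLID 0xC001, the first insert returns 0 and the second returns -1, which is the kind of collision a LID list holding multicast endpoints would have to cope with.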
> -Fab
>
>
>> -----Original Message-----
>> From: ofw-bounces at lists.openfabrics.org [mailto:ofw-
>> bounces at lists.openfabrics.org] On Behalf Of Tzachi Dar
>> Sent: Friday, November 07, 2008 6:21 AM
>> To: Alex Estrin; Fab Tillier; ofw at lists.openfabrics.org
>> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid is zero
>> if it is not in the list.
>>
>> Indeed, it seems you have a good point here.
>>
>> The LID endpoint map is only used in __endpt_mgr_get_by_lid(),
>> and that function is only called from __recv_get_endpts().
>>
>> Where it is called, around line 1860, there is the following
>> comment:
>>
>> /*
>>  * Lookup the remote endpoint based on LID. Note that only
>>  * unicast traffic can be LID routed.
>>  */
>> I'll try removing these LIDs and see whether multicast continues to
>> work.
>>
>> Thanks
>> Tzachi
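Fab's description of the receive path and the idea of keeping multicast endpoints out of the LID map can be sketched together. The sketch relies only on the InfiniBand LID ranges (0x0000 reserved, 0x0001-0xBFFF unicast, 0xC000-0xFFFE multicast, 0xFFFF permissive); the enum and all function names are hypothetical and do not come from ipoib_port.c. Since a lookup by LID is keyed on the sender's source LID, which is always a unicast LID, an entry stored under a multicast LID could never be the one such a lookup returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* InfiniBand LID ranges: 0x0000 is reserved, 0x0001-0xBFFF are unicast,
 * 0xC000-0xFFFE are multicast and 0xFFFF is the permissive LID. */
#define IB_FIRST_MCAST_LID 0xC000u

static bool lid_is_unicast(uint16_t lid)
{
    return lid != 0 && lid < IB_FIRST_MCAST_LID;
}

/* Hypothetical names: which key the receive path would use for the source. */
typedef enum { LOOKUP_BY_GID, LOOKUP_BY_LID, LOOKUP_INVALID } lookup_kind_t;

/* With a GRH the source is looked up by GID; without one, the packet is
 * unicast LID-routed and the source is looked up by its source LID (SLID). */
static lookup_kind_t recv_lookup_kind(bool has_grh, uint16_t slid)
{
    if (has_grh)
        return LOOKUP_BY_GID;
    return lid_is_unicast(slid) ? LOOKUP_BY_LID : LOOKUP_INVALID;
}

/* Hypothetical insert policy for the LID map: lookups are keyed on a
 * unicast SLID, so an entry under a multicast LID could never be found;
 * multicast endpoints are therefore not inserted at all. */
static bool lid_map_should_insert(uint16_t dlid)
{
    return lid_is_unicast(dlid);
}
```

Under this policy an endpoint whose dlid is an MLID such as 0xC001 never enters the LID map, so two groups sharing an MLID can no longer collide there.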
>>
>>
>>
>>> -----Original Message-----
>>> From: Alex Estrin [mailto:alex.estrin at qlogic.com]
>>> Sent: Friday, November 07, 2008 2:19 PM
>>> To: Tzachi Dar; Fab Tillier; ofw at lists.openfabrics.org
>>> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid is
>>> zero if it is not in the list.
>>>
>>> Why should a multicast endpoint be inserted into the LID list at all?
>>>
>>> Thanks,
>>> Alex.
>>>
>>>
>>>> -----Original Message-----
>>>> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
>>>> Sent: Thursday, November 06, 2008 7:25 PM
>>>> To: Alex Estrin; Fab Tillier; ofw at lists.openfabrics.org
>>>> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid is zero if it
>>>> is not in the list.
>>>>
>>>> You both raise good questions that I don't have answers to yet.
>>>> I'll try to think about it more tomorrow.
>>>>
>>>> Alex, please note that, at least in the assert I received, the
>>>> problem was triggered from multicast_cb, which means ARP was not
>>>> involved there (you might be pointing to a different problem).
>>>>
>>>> In any case, the more I think about it, the more I believe that if we
>>>> can't find the cause of the corruption, we should ask NDIS to reset the
>>>> device to make sure we are back in a consistent state.
>>>>
>>>> Thanks
>>>> Tzachi
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Alex Estrin [mailto:alex.estrin at qlogic.com]
>>>>> Sent: Friday, November 07, 2008 1:56 AM
>>>>> To: Fab Tillier; Tzachi Dar; ofw at lists.openfabrics.org
>>>>> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid is
>>>>> zero if it is not in the list.
>>>>>
>>>>> I have some thoughts on a possible reason why a stale endpoint can be
>>>>> missed:
>>>>>
>>>>> Looking into ipoib_port.c (rev. 1737) __recv_get_endpts() @ line 1873:
>>>>>
>>>>> if( *pp_src && !ipoib_is_voltaire_router_gid( &(*pp_src)->dgid ) &&
>>>>>     (*pp_src)->qpn != p_wc->recv.ud.remote_qp )
>>>>> {
>>>>>     /* Update the QPN for the endpoint. */
>>>>>     ..........
>>>>>     (*pp_src)->qpn = p_wc->recv.ud.remote_qp;
>>>>> }
>>>>> Then later, in __recv_arp() @ line 2425, the following code is supposed
>>>>> to clean up the stale endpoint, but that won't happen because the QPN
>>>>> was already "updated" earlier:
>>>>>
>>>>> else if( (*pp_src)->qpn != p_wc->recv.ud.remote_qp )
>>>>> {
>>>>>     /* Out of date! Destroy the endpoint and replace it. */
>>>>>     __endpt_mgr_remove( p_port, *pp_src );
>>>>>     *pp_src = NULL;
>>>>> }
>>>>>
>>>>>
>>>>> Did I miss anything?
>>>>> Any ideas why the QPN update was put there?
>>>>>
>>>>> Thanks,
>>>>> Alex.
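Alex's ordering argument can be checked with a small model. This is a hypothetical sketch, not driver code: endpt_t keeps only a QPN, the two step functions paraphrase the quoted snippets (the Voltaire router GID test is left out), and cleanup_fires_if_checked_first() is one illustrative reordering, not a fix proposed in this thread. With the order described above, the QPN has already been overwritten by the time the ARP check runs, so the check never sees a mismatch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down endpoint: only the field this model needs. */
typedef struct {
    uint32_t qpn;
} endpt_t;

/* Models the __recv_get_endpts() snippet: adopt the sender's QPN. */
static void get_endpts_update_qpn(endpt_t *src, uint32_t remote_qp)
{
    if (src->qpn != remote_qp)
        src->qpn = remote_qp;
}

/* Models the __recv_arp() check: true means "out of date, replace it". */
static bool arp_sees_stale(const endpt_t *src, uint32_t remote_qp)
{
    return src->qpn != remote_qp;
}

/* Order described in the message: update first, check afterwards. */
static bool cleanup_fires_current_order(endpt_t *src, uint32_t remote_qp)
{
    get_endpts_update_qpn(src, remote_qp);
    return arp_sees_stale(src, remote_qp); /* always false at this point */
}

/* Illustrative alternative only: remember the mismatch before updating. */
static bool cleanup_fires_if_checked_first(endpt_t *src, uint32_t remote_qp)
{
    bool stale = arp_sees_stale(src, remote_qp);

    get_endpts_update_qpn(src, remote_qp);
    return stale;
}
```

For an endpoint recorded with QPN 0x48 that now receives from QPN 0x49, the current order reports no stale endpoint while the QPN silently becomes 0x49; checking first reports the mismatch.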
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ofw-bounces at lists.openfabrics.org
>>>>>> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
>>>>>> Sent: Thursday, November 06, 2008 6:01 PM
>>>>>> To: Tzachi Dar; ofw at lists.openfabrics.org
>>>>>> Subject: [ofw] RE: Patch: [ipoib] Make sure that the dlid is zero if it
>>>>>> is not in the list.
>>>>>>
>>>>>>
>>>>>>> The real issue is what else should be done. I'm afraid that things
>>>>>>> will not work, since this endpoint has no dlid. My ideas are:
>>>>>>>
>>>>>>> 1) Remove this endpoint from the list.
>>>>>>> 2) Remove the other endpoint from the list (the one that has the
>>>>>>>    same dlid).
>>>>>>> 3) Force a reset by NDIS, to start things all over again.
>>>>>>>
>>>>>> So there's already an endpoint for that multicast group? Is it
>>>>>> valid or stale? How come the new and existing endpoints don't have
>>>>>> the same MAC/GID?
>>>>>>
>>>>>> Why did the dlid change if the MAC/GID is the same?
>>>>>>
>>>>>> -Fab
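Fab's questions amount to classifying what a second endpoint on an already-used LID means. A minimal sketch, assuming endpoints are compared by GID alone (the MAC is left out); endpt_t, lid_check_t and classify_lid_insert() are hypothetical names. In this model, a matching GID is just a refresh of the same endpoint, and only the conflict case would call for one of the three options listed above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical endpoint record: the GID and the LID it resolved to. */
typedef struct {
    uint8_t  gid[16];
    uint16_t dlid;
} endpt_t;

typedef enum {
    LID_FREE,          /* no endpoint is stored under this LID yet */
    LID_SAME_ENDPOINT, /* same GID: a refresh of the existing entry */
    LID_CONFLICT       /* different GID: stale entry or a shared LID */
} lid_check_t;

/* existing is whatever the LID map already holds for incoming->dlid
 * (NULL if nothing); only the GIDs are compared. */
static lid_check_t classify_lid_insert(const endpt_t *existing,
                                       const endpt_t *incoming)
{
    if (existing == NULL)
        return LID_FREE;
    if (memcmp(existing->gid, incoming->gid, sizeof existing->gid) == 0)
        return LID_SAME_ENDPOINT;
    return LID_CONFLICT;
}
```

Separating the "same GID" case from the "different GID" case mirrors Fab's question of whether the existing endpoint is valid or stale before anything is removed or reset.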
>>>>>> _______________________________________________
>>>>>> ofw mailing list
>>>>>> ofw at lists.openfabrics.org
>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>>>>>>
>>>>>>