[ofw] RE: Patch: [ipoib] Make sure that the dlid is zero if it isnot in the list.

Tzachi Dar tzachid at mellanox.co.il
Fri Nov 7 06:20:42 PST 2008


Indeed it seems that you have a good point here.

The LID endpoint list is only used by the function
__endpt_mgr_get_by_lid(), which in turn is only called from
__recv_get_endpts().

At the place where it is called, around line 1860, there is the
following comment:

		/*
		 * Lookup the remote endpoint based on LID.  Note that only
		 * unicast traffic can be LID routed.
		 */

I'll try removing the multicast endpoints from the LID list, and we will
see whether multicast continues to work.
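
As a rough sketch of the idea, and not the actual ipoib_port.c code: the
endpoint manager would keep a LID-keyed lookup for unicast endpoints only
and leave multicast endpoints out of it. Every name below (endpt_t,
endpt_mgr_t, endpt_mgr_insert, endpt_mgr_get_by_lid, MAX_ENDPTS) is a
hypothetical stand-in, and the fixed-size array replaces whatever map
structure the driver really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENDPTS 16   /* arbitrary capacity for this sketch */

/* Hypothetical, simplified endpoint; the real driver struct differs. */
typedef struct endpt {
    uint16_t dlid;          /* destination LID, 0 if none */
    uint32_t qpn;
    bool     is_multicast;
    bool     in_lid_list;   /* true only while present in by_lid[] */
} endpt_t;

/* Hypothetical endpoint manager holding only the LID-keyed list. */
typedef struct endpt_mgr {
    endpt_t *by_lid[MAX_ENDPTS];
    size_t   lid_count;
} endpt_mgr_t;

/* Adds ep to the LID list only if it is unicast and has a non-zero dlid.
 * Returns true if the endpoint was placed on the LID list. */
static bool endpt_mgr_insert(endpt_mgr_t *mgr, endpt_t *ep)
{
    assert(mgr != NULL && ep != NULL);
    ep->in_lid_list = false;
    if (ep->is_multicast || ep->dlid == 0)
        return false;               /* multicast is never LID routed */
    if (mgr->lid_count >= MAX_ENDPTS)
        return false;               /* sketch only: list is full */
    mgr->by_lid[mgr->lid_count++] = ep;
    ep->in_lid_list = true;
    return true;
}

/* Linear lookup by LID; returns NULL if no unicast endpoint matches. */
static endpt_t *endpt_mgr_get_by_lid(const endpt_mgr_t *mgr, uint16_t lid)
{
    assert(mgr != NULL);
    for (size_t i = 0; i < mgr->lid_count; i++) {
        if (mgr->by_lid[i]->dlid == lid)
            return mgr->by_lid[i];
    }
    return NULL;
}
```

With this shape a multicast endpoint can never collide with another entry
on the LID list, because it is never placed there in the first place.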

Thanks
Tzachi


> -----Original Message-----
> From: Alex Estrin [mailto:alex.estrin at qlogic.com] 
> Sent: Friday, November 07, 2008 2:19 PM
> To: Tzachi Dar; Fab Tillier; ofw at lists.openfabrics.org
> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid 
> is zero if it isnot in the list.
> 
> Why should a multicast endpoint be inserted into the LID list at all?
> 
> Thanks,
> Alex.
> 
> > -----Original Message-----
> > From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> > Sent: Thursday, November 06, 2008 7:25 PM
> > To: Alex Estrin; Fab Tillier; ofw at lists.openfabrics.org
> > Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid is
> > zero if it isnot in the list.
> > 
> > You both have good questions that I don't have answers to yet.
> > I'll try to think about it more tomorrow.
> > 
> > Alex, please note that, at least in the assert that I received, the
> > problem happened because of multicast_cb, which means that ARP was
> > not involved there (you might be pointing to another problem).
> > 
> > In any case, the more I think about it, if we are not able to find
> > the reason for the corruption, we should ask NDIS to reset the device
> > in order to make sure we are in a consistent state.
> > 
> > Thanks
> > Tzachi
> > 
> > > -----Original Message-----
> > > From: Alex Estrin [mailto:alex.estrin at qlogic.com]
> > > Sent: Friday, November 07, 2008 1:56 AM
> > > To: Fab Tillier; Tzachi Dar; ofw at lists.openfabrics.org
> > > Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid is 
> > > zero if it isnot in the list.
> > > 
> > > I have some thoughts on a possible reason why a stale endpoint can
> > > be missed:
> > > 
> > > Looking into ipoib_port.c (rev. 1737), __recv_get_endpts() @ line
> > > 1873:
> > > 
> > > 	if( *pp_src && !ipoib_is_voltaire_router_gid( &(*pp_src)->dgid ) &&
> > > 		(*pp_src)->qpn != p_wc->recv.ud.remote_qp )
> > > 	{
> > > 		/* Update the QPN for the endpoint. */
> > > 		..........
> > > 		(*pp_src)->qpn = p_wc->recv.ud.remote_qp; 
> > > 	}
> > > 
> > > Then later, in __recv_arp() @ line 2425, the following code is
> > > supposed to clean up the stale endpoint, but that won't happen
> > > because the QPN was already "updated" earlier:
> > > 
> > > 		else if( (*pp_src)->qpn != p_wc->recv.ud.remote_qp )
> > > 		{
> > > 			/* Out of date!  Destroy the endpoint and replace it. */
> > > 			__endpt_mgr_remove( p_port, *pp_src );
> > > 			*pp_src = NULL;
> > > 		}
> > > 
> > > 
> > > Did I miss anything?
> > > Any ideas why the QPN update was put there?
> > > 
> > > Thanks,
> > > Alex.
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: ofw-bounces at lists.openfabrics.org
> > > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of
> > > > Fab Tillier
> > > > Sent: Thursday, November 06, 2008 6:01 PM
> > > > To: Tzachi Dar; ofw at lists.openfabrics.org
> > > > Subject: [ofw] RE: Patch: [ipoib] Make sure that the dlid is
> > > > zero if it isnot in the list.
> > > > 
> > > > >The real issue is what else we should do. I'm afraid that things
> > > > >will not work, as this endpoint has no dlid.
> > > > >My ideas are:
> > > > >
> > > > >1) Remove this endpoint from the list.
> > > > >2) Remove the other endpoint from the list (the one that has the
> > > > >same dlid).
> > > > >3) Force a reset by NDIS, to start things all over again.
> > > > 
> > > > So there's already an endpoint for that multicast group?  Is it
> > > > valid or stale?  How come the new and existing endpoints don't
> > > > have the same MAC/GID?
> > > > 
> > > > Why did the dlid change if the MAC/GID is the same?
> > > > 
> > > > -Fab
> > > > 
> > > 
> > 
> 


