[ofw] RE: Patch: [ipoib] Make sure that the dlid is zero if it isnot in the list.

Tzachi Dar tzachid at mellanox.co.il
Thu Nov 6 16:25:11 PST 2008


You both have good questions that I don't have answers to yet. I'll try
to think about them more tomorrow.

Alex, please note that, at least in the assert that I received, the
problem happened because of multicast_cb, which means that ARP was
not involved there (you might be pointing to a different problem).
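
For reference, the ARP-path interaction that Alex describes in the quoted
message below can be modelled in isolation. This is only a rough sketch,
not the driver code: endpt_t and the model_* functions are hypothetical
stand-ins, the Voltaire router GID exception is left out, and the call
order (QPN refresh first, stale check second) is assumed from Alex's
description.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for the driver's endpoint object. */
typedef struct endpt {
    uint32_t qpn;   /* remote QPN cached for this endpoint */
} endpt_t;

/* Models the QPN refresh quoted from __recv_get_endpts(): if the cached
 * QPN differs from the one in the work completion, overwrite it.
 * (The Voltaire router GID exception is not modelled.) */
static void model_recv_get_endpts(endpt_t *p_src, uint32_t remote_qp)
{
    if (p_src != NULL && p_src->qpn != remote_qp)
        p_src->qpn = remote_qp;
}

/* Models the stale check quoted from __recv_arp(): returns true when the
 * endpoint would be destroyed and replaced because its QPN is out of date. */
static bool model_recv_arp_is_stale(const endpt_t *p_src, uint32_t remote_qp)
{
    return p_src != NULL && p_src->qpn != remote_qp;
}

/* Runs both steps in the assumed receive-path order and reports whether
 * the stale endpoint was detected. */
static bool model_receive_arp(endpt_t *p_src, uint32_t remote_qp)
{
    model_recv_get_endpts(p_src, remote_qp);
    return model_recv_arp_is_stale(p_src, remote_qp);
}
```

With an endpoint cached at one QPN and an ARP arriving from a different
QPN, the stale check on its own reports a mismatch, but
model_receive_arp() never does, because the refresh has already made the
two values equal.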

In any case, the more I think about it, if we can't find the reason for
the corruption, we should ask NDIS to reset the device in order to make
sure we are in a consistent state.

Thanks
Tzachi

> -----Original Message-----
> From: Alex Estrin [mailto:alex.estrin at qlogic.com] 
> Sent: Friday, November 07, 2008 1:56 AM
> To: Fab Tillier; Tzachi Dar; ofw at lists.openfabrics.org
> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid 
> is zero if it isnot in the list.
> 
> I have some thoughts on a possible reason why a stale endpoint can be
> missed:
> 
> Looking into ipoib_port.c (rev. 1737) __recv_get_endpts() @ line 1873:
> 
> 	if( *pp_src && !ipoib_is_voltaire_router_gid( &(*pp_src)->dgid ) &&
> 		(*pp_src)->qpn != p_wc->recv.ud.remote_qp )
> 	{
> 		/* Update the QPN for the endpoint. */
> 		..........
> 		(*pp_src)->qpn = p_wc->recv.ud.remote_qp; 
> 	} 
> 
> Then later, in __recv_arp() @ line 2425, the following code is
> supposed to clean up the stale endpoint, but that won't happen
> because the QPN was already "updated" earlier:
> 
> 		else if( (*pp_src)->qpn != p_wc->recv.ud.remote_qp )
> 		{
> 			/* Out of date!  Destroy the endpoint and replace it. */
> 			__endpt_mgr_remove( p_port, *pp_src );
> 			*pp_src = NULL;
> 		}
> 
> 
> Did I miss anything?
> Ideas why QPN update was put there?
> 
> Thanks,
> Alex.
> 
> 
> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org 
> > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> > Sent: Thursday, November 06, 2008 6:01 PM
> > To: Tzachi Dar; ofw at lists.openfabrics.org
> > Subject: [ofw] RE: Patch: [ipoib] Make sure that the dlid 
> is zero if 
> > it isnot in the list.
> > 
> > >The real issue is what else we should do. I'm afraid that things
> > >will not work as this endpoint has no dlid.
> > >My ideas are:
> > >
> > >1) Remove this endpoint from the list.
> > >2) Remove the other endpoint from the list (the one that has the
> > >same dlid)
> > >3) Force a reset by NDIS, to start things all over again.
> > 
> > So there's already an endpoint for that multicast group?  Is it valid
> > or stale?  How come the new and existing endpoints don't have the same
> > MAC/GID?
> > 
> > Why did the dlid change if the MAC/GID is the same?
> > 
> > -Fab
> > _______________________________________________
> > ofw mailing list
> > ofw at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> > 
> 


