[ofw] RE: Patch: [ipoib] Make sure that the dlid is zero if it isnot in the list.
Alex Estrin
alex.estrin at qlogic.com
Fri Nov 7 04:19:05 PST 2008
Why should a multicast endpoint be inserted into the LID list at all?
Thanks,
Alex.
> -----Original Message-----
> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> Sent: Thursday, November 06, 2008 7:25 PM
> To: Alex Estrin; Fab Tillier; ofw at lists.openfabrics.org
> Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid
> is zero if it isnot in the list.
>
> You both have good questions that I don't have answers to yet. I'll try to
> think about it more tomorrow.
>
> Alex, please note that, at least in the assert that I received, the
> problem happened because of multicast_cb, which means that ARP was
> not involved there (you might be pointing to a different problem).
>
> In any case, the more I think about it, if we are not able to find the
> reason for the corruption, we should ask NDIS to reset the device in order
> to make sure we are in a consistent state.
>
> Thanks
> Tzachi
>
> > -----Original Message-----
> > From: Alex Estrin [mailto:alex.estrin at qlogic.com]
> > Sent: Friday, November 07, 2008 1:56 AM
> > To: Fab Tillier; Tzachi Dar; ofw at lists.openfabrics.org
> > Subject: RE: [ofw] RE: Patch: [ipoib] Make sure that the dlid
> > is zero if it isnot in the list.
> >
> > I have some thoughts on a possible reason why a stale endpoint can be
> > missed.
> >
> > Looking into ipoib_port.c (rev. 1737), __recv_get_endpts() @ line 1873:
> >
> > if( *pp_src && !ipoib_is_voltaire_router_gid( &(*pp_src)->dgid ) &&
> >     (*pp_src)->qpn != p_wc->recv.ud.remote_qp )
> > {
> >     /* Update the QPN for the endpoint. */
> >     ..........
> >     (*pp_src)->qpn = p_wc->recv.ud.remote_qp;
> > }
> >
> > Then later, in __recv_arp() @ line 2425, the following code is
> > supposed to clean up the stale endpoint, but that won't happen
> > because the QPN was already "updated" earlier:
> >
> > else if( (*pp_src)->qpn != p_wc->recv.ud.remote_qp )
> > {
> >     /* Out of date!  Destroy the endpoint and replace it. */
> >     __endpt_mgr_remove( p_port, *pp_src );
> >     *pp_src = NULL;
> > }
> >
> >
> > Did I miss anything?
> > Any ideas why the QPN update was put there?
> >
> > Thanks,
> > Alex.
> >
> >
> > > -----Original Message-----
> > > From: ofw-bounces at lists.openfabrics.org
> > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> > > Sent: Thursday, November 06, 2008 6:01 PM
> > > To: Tzachi Dar; ofw at lists.openfabrics.org
> > > Subject: [ofw] RE: Patch: [ipoib] Make sure that the dlid is zero if
> > > it isnot in the list.
> > >
> > > >The real issue is what else we should do. I'm afraid that things will not
> > > >work, as this endpoint has no dlid.
> > > >My ideas are:
> > > >
> > > >1) Remove this endpoint from the list.
> > > >2) Remove the other endpoint from the list (the one that has the same
> > > >dlid)
> > > >3) Force a reset by NDIS, to start things all over again.
> > >
> > > So there's already an endpoint for that multicast group? Is it valid
> > > or stale? How come the new and existing endpoints don't have the same
> > > MAC/GID?
> > >
> > > Why did the dlid change if the MAC/GID is the same?
> > >
> > > -Fab
> > > _______________________________________________
> > > ofw mailing list
> > > ofw at lists.openfabrics.org
> > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> > >
> >
>
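The interaction described in the quoted message (the QPN refresh in __recv_get_endpts() running before the stale-endpoint check in __recv_arp()) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: endpt_t, get_endpts_refresh_qpn(), arp_sees_stale_endpt() and stale_detected() are hypothetical names made up for illustration, not the driver's real types or functions, and the Voltaire router GID test from the original condition is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an IPoIB endpoint record. */
typedef struct endpt {
    uint32_t qpn;   /* cached remote QP number */
} endpt_t;

/* Models the quoted check in __recv_get_endpts(): on a QPN mismatch,
 * overwrite the cached QPN with the one from the work completion.
 * (The Voltaire router GID test is omitted in this sketch.) */
static void get_endpts_refresh_qpn(endpt_t *src, uint32_t remote_qp)
{
    if (src != NULL && src->qpn != remote_qp)
        src->qpn = remote_qp;
}

/* Models the quoted check in __recv_arp(): the endpoint is treated as
 * stale only if its cached QPN still differs from the sender's QPN. */
static bool arp_sees_stale_endpt(const endpt_t *src, uint32_t remote_qp)
{
    return src != NULL && src->qpn != remote_qp;
}

/* Returns 1 if the ARP-side check would flag the endpoint as stale,
 * 0 otherwise. refresh_first selects whether the QPN refresh runs
 * before the ARP-side check, as it does in the quoted code path. */
static int stale_detected(bool refresh_first)
{
    endpt_t ep = { .qpn = 0x100 };   /* QPN cached from an older peer */
    uint32_t remote_qp = 0x200;      /* QPN seen on the incoming ARP  */

    if (refresh_first)
        get_endpts_refresh_qpn(&ep, remote_qp);

    return arp_sees_stale_endpt(&ep, remote_qp) ? 1 : 0;
}
```

In this model, stale_detected(false) flags the endpoint as stale, while stale_detected(true) does not, because the refresh has already made the two QPNs equal by the time the ARP-side comparison runs. Whether the real driver behaves this way depends on the code elided in the quoted snippet.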