[Openib-windows] LID change event

Yossi Leybovich sleybo at mellanox.co.il
Fri Jul 7 00:05:14 PDT 2006


 

> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] 
> On Behalf Of Fabian Tillier
> Sent: Thursday, July 06, 2006 9:57 PM
> To: Yossi Leybovich
> Cc: openib-windows at openib.org
> Subject: Re: [Openib-windows] LID change event
> 
> Hi Yossi,
> 
> On 7/4/06, Yossi Leybovich <sleybo at mellanox.co.il> wrote:
> > Hi
> >
> > We found more cases that IPoIB discover duplicate LID in its endptlist
> > (even after we clean the LID list in ipoib_reset_all) This can be 
> > cause from old packets in the network (recv packets create p_src 
> > endpnt if it does not exist and the packet can carry the old LID) I 
> > think that this patch reduce the possibility of getting duplicate 
> > entries in the LID.
> > It insert to the LIDs list only when the path record query is back 
> > (with the av).
> 
> Not inserting into the LID map until the AV is created means 
> that we won't ever report unicast packets until we've tried 
> to send to that node.  I don't know how big of an issue this 
> is, since most communication start with an ARP exchange.
> 
> However, there are cases where discarding unicast traffic 
> like this is the wrong thing to do.  Think of two systems, A 
> and B.  B resolves A's IP address via ARP (A responded, so 
> all is well).  A now loses its link, but B doesn't - this 
> flushes all of A's endpoint entries since the port went down 
> - all endpoints lose their LID assignment.  B now tries to 
> send unicast packets to A - it doesn't need to ARP again 
> since it just did.  The packets, when received by A, fail any 
> lookup by LID, and are discarded.
> 

Isn't this what will happen if the SM changes A's LID?
If A's LID is changed by the SM after the link is up (I am not really sure
that the SM is allowed to do that), and B tries to send to the old LID,
the packets will still be discarded.

> > More over same as we create endpt entry in recv_arp (with LID 0 
> > because source LID may not be the original initiator) we should do 
> > that in recv_get_endpt function as well and wait to the LID from the
> > path record query.
> 
> Looking at it, I think recv_arp is wrong, and should include the LID.
> Otherwise further unicast traffic will be discarded.
> 
> > I also add assert to check for duplication in the path_record_cb
> >
> > Another option is:
> > To check in each insertion to the LIDs list if the LID already exist
> > in the list , if yes remove the entry from the LIDs list and zero the
> > LID field of the endpt struct.
> 
> I think the right thing to do is to remove the old entry, and 
> replace it with the new anytime the LID changes.  We can't 
> require every packet to include the GRH, as the IPoIB draft 
> states that implementations must handle receiving packets 
> without a GRH.
> 

This will also cover the cleanup I made for when the SM changes (first part
of the patch).

> I have to think about this a little more - I don't know what 
> to do with the "old" endpoint if a new one is being inserted 
> with a duplicate LID.  Do we just set its LID to zero, or do 
> we remove it all together?

I think we should set the LID to 0, clear its AV (if one exists), and remove
it from the LID list.
We should keep it in the MAC/GID list so that new sends to that
destination will issue a path record query to resolve the LID and send the
packet.
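
To make the proposal concrete, here is a rough sketch of the insert path.
It is not the driver code: endpt_t, lid_map_t, lid_map_insert,
lid_map_remove, the has_av flag and the small fixed-size table are all
made up for illustration, and the real endpoint structure and LID map in
IPoIB look different. It only shows the intended behavior: when a LID is
inserted and another endpoint already owns it, the old endpoint is taken out
of the LID map, its LID is set to 0 and its AV is dropped, but the endpoint
itself is not freed (so it can stay in the MAC/GID lists).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LID 16  /* tiny table, for illustration only */

/* Hypothetical endpoint record; NOT the driver's real endpoint struct. */
typedef struct endpt {
    uint16_t lid;     /* 0 means "unknown, needs a path record query" */
    int      has_av;  /* stands in for an address vector handle */
} endpt_t;

/* Hypothetical LID map: one slot per LID, NULL when unused. */
typedef struct lid_map {
    endpt_t *by_lid[MAX_LID];
} lid_map_t;

/* Take ep out of the LID map, zero its LID and drop its AV.
 * The endpoint itself is not freed, so the caller can keep it
 * in its MAC/GID lists. */
static void lid_map_remove(lid_map_t *map, endpt_t *ep)
{
    if (ep->lid != 0 && ep->lid < MAX_LID && map->by_lid[ep->lid] == ep)
        map->by_lid[ep->lid] = NULL;
    ep->lid = 0;
    ep->has_av = 0;
}

/* Insert ep under lid. If another endpoint already owns that LID,
 * evict it first. Returns 0 on success, -1 for an invalid LID. */
static int lid_map_insert(lid_map_t *map, endpt_t *ep, uint16_t lid)
{
    endpt_t *old;

    if (lid == 0 || lid >= MAX_LID)
        return -1;

    old = map->by_lid[lid];
    if (old == ep)
        return 0;                 /* already mapped, nothing to do */

    if (old != NULL)
        lid_map_remove(map, old); /* duplicate LID: evict stale owner */

    if (ep->lid != 0)
        lid_map_remove(map, ep);  /* ep is moving from another LID */

    /* Same idea as the assert in the patch: no duplicate may remain. */
    assert(map->by_lid[lid] == NULL);

    ep->lid = lid;
    map->by_lid[lid] = ep;
    return 0;
}
```

With this, a later send to the evicted endpoint sees LID 0 and no AV, and
so goes through the path record query again before the packet is sent.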

Anyway, we need to come up with something, because running on a Windows
CCP 8-node cluster gets us into scenarios where LIDs change, and that hangs
IPoIB.

> 
> - Fab
> 
> 
> 



