[Openib-windows] LID change event

Fabian Tillier ftillier at silverstorm.com
Thu Jul 6 11:56:44 PDT 2006


Hi Yossi,

On 7/4/06, Yossi Leybovich <sleybo at mellanox.co.il> wrote:
> Hi
>
> We found more cases that IPoIB discover duplicate LID in its endptlist
> (even after we clean the LID list in ipoib_reset_all)
> This can be cause from old packets in the network (recv packets create
> p_src endpnt if it does not exist and the packet can carry the old LID)
> I think that this patch reduce the possibility of getting duplicate
> entries in the LID.
> It insert to the LIDs list only when the path record query is back (with
> the av).

Not inserting into the LID map until the AV is created means that we
won't ever report unicast packets until we've tried to send to that
node.  I don't know how big of an issue this is, since most
communication starts with an ARP exchange.

However, there are cases where discarding unicast traffic like this is
the wrong thing to do.  Think of two systems, A and B.  B resolves A's
IP address via ARP (A responded, so all is well).  A now loses its
link, but B doesn't - this flushes all of A's endpoint entries since
the port went down - all endpoints lose their LID assignment.  B now
tries to send unicast packets to A - it doesn't need to ARP again
since it just did.  The packets, when received by A, fail any lookup
by LID, and are discarded.
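
A minimal sketch of the A/B scenario above, not the driver's code.  The
table type and function names (lid_table_t, on_path_record_done,
recv_unicast) are hypothetical, and the learn_on_recv flag only stands
in for "the receive path may insert the source LID" versus "only a
completed path record query may insert it".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LID_SPACE 16  /* tiny LID range, for illustration only */

/* Hypothetical stand-in for the LID map: bound[lid] != 0 means that
 * some endpoint currently owns that LID. */
typedef struct lid_table {
    unsigned char bound[LID_SPACE];
} lid_table_t;

/* Port went down: every endpoint loses its LID assignment. */
static void lid_table_flush(lid_table_t *t)
{
    memset(t, 0, sizeof *t);
}

/* A path record query for this LID completed (we tried to send). */
static void on_path_record_done(lid_table_t *t, uint16_t lid)
{
    assert(lid != 0 && lid < LID_SPACE);
    t->bound[lid] = 1;
}

/* Receive a unicast packet identified only by its source LID.
 * Returns 1 if the packet is delivered, 0 if it is discarded. */
static int recv_unicast(lid_table_t *t, uint16_t slid, int learn_on_recv)
{
    if (slid == 0 || slid >= LID_SPACE)
        return 0;
    if (t->bound[slid])
        return 1;               /* lookup by LID succeeded */
    if (learn_on_recv) {
        t->bound[slid] = 1;     /* receive path inserts the LID */
        return 1;
    }
    return 0;                   /* no entry and no learning: discard */
}
```

With learn_on_recv set to 0, a packet from B's LID is dropped after
lid_table_flush() until on_path_record_done() runs for that LID, which
only happens once A itself sends to B.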

> More over same as we create endpt entry in recv_arp (with LID 0 because
> source LID may not be the original initiator) we should do that  in
> recv_get_endpt function as well and wait to the LID from the path record
> query.

Looking at it, I think recv_arp is wrong, and should include the LID.
Otherwise further unicast traffic will be discarded.

> I also add assert to check for duplication in the path_record_cb
>
> Another option is:
> To check in each insertion to the LIDs list if the LID already exist in
> the list , if yes remove the entry from the LIDs list and zero the LID
> field of the endpt struct.

I think the right thing to do is to remove the old entry, and replace
it with the new anytime the LID changes.  We can't require every
packet to include the GRH, as the IPoIB draft states that
implementations must handle receiving packets without a GRH.

I have to think about this a little more - I don't know what to do
with the "old" endpoint if a new one is being inserted with a
duplicate LID.  Do we just set its LID to zero, or do we remove it
altogether?
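
A rough sketch of the "replace the old entry" idea, under assumptions:
the types and names here (endpt_t, lid_map_t, lid_map_set) are made up
for illustration and are not the IPoIB driver's structures.  It zeroes
the old owner's LID and hands the old endpoint back to the caller, so
either choice for the old endpoint remains possible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LID_MAP_SIZE 16  /* tiny LID range, for illustration only */

/* Hypothetical endpoint: lid == 0 means "no LID assigned". */
typedef struct endpt {
    uint16_t lid;
    int      id;
} endpt_t;

/* Hypothetical LID map: at most one endpoint per LID. */
typedef struct lid_map {
    endpt_t *by_lid[LID_MAP_SIZE];
} lid_map_t;

static void lid_map_init(lid_map_t *m)
{
    memset(m, 0, sizeof *m);
}

/* Bind endpoint e to new_lid.  If another endpoint already owns
 * new_lid, it is unlinked from the map, its LID is set to zero, and
 * it is returned so the caller can keep it or destroy it.  Returns
 * NULL when nothing was displaced. */
static endpt_t *lid_map_set(lid_map_t *m, endpt_t *e, uint16_t new_lid)
{
    endpt_t *old = NULL;

    assert(new_lid < LID_MAP_SIZE);

    /* Drop e's previous entry, if it has one. */
    if (e->lid != 0 && m->by_lid[e->lid] == e)
        m->by_lid[e->lid] = NULL;

    if (new_lid != 0) {
        old = m->by_lid[new_lid];
        if (old != NULL)
            old->lid = 0;       /* old owner no longer has a LID */
        m->by_lid[new_lid] = e;
    }
    e->lid = new_lid;
    return old;
}
```

A caller that prefers removing the old endpoint altogether can destroy
whatever lid_map_set() returns; one that prefers keeping it can ignore
the return value and let a later path record query assign a fresh LID.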

- Fab



