[ofa-general] Re: IPoIB path caching

Sean Hefty sean.hefty at intel.com
Mon Jul 23 12:39:09 PDT 2007


>If kernel caching is used it must be viewed as authoritative and kept
>current through some kind of external mechanism. Something like
>re-doing the big GetTable query prior to starting a job is a fine
>interim way to do this. Ideally updating the kernel sa cache would also
>push updated data into the neighbor AH structures as appropriate. Then
>there is one single source of PR data, one source of IP -> GID
>mappings, etc.

IB does not define anything to standardize distributing SA data.  The proposed
solution works with any SM, and supports SA events to the degree that they've
been defined.

The local SA does not define vendor-specific SA extensions, though such
extensions are supported at a high level by manual cache refreshes.  Someone
could provide finer control to refresh specific cache entries if needed and
used.

>These problems are just an unavoidable part of trying to use caching -
>build the mechanism to support coherent replication and just deal with
>these downsides <shrug>. Sean is basically doing non-coherent
>replication today with his big GetTable query and that sounds like
>what is speeding things up, not caching individual PRs.

The caching provides the speedup on the client side.  The GetTable provides the
scalability on the SA side.

Whether we cache PRs or PR data in the form of AHs, caching is in use and
required by the software today.  No one would suggest that IPoIB issue a PR
query per packet.

I feel that we're trying to come up with the ideal solution at the start.  Let's
start with what we have today and expand.  Currently IPoIB caches PR data, does
not share it, and doesn't update it.  The local SA collects data more
efficiently*, shares the data, and provides ways for updating it.  It is
refreshed in response to specific events, and scripts could be used to refresh
the cache periodically.

* If communication is only to a few nodes in a large cluster, then multiple Get
queries may be more efficient than using GetTable.  The local SA could be
expanded to cache query responses, rather than issuing its own in this case.
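
As a rough illustration of the behavior described above (one bulk refresh,
refresh on specific events, and the footnote's idea of caching individual Get
responses), here is a minimal Python sketch.  It is not the local SA code: the
class, the event names, and the reduced (dlid, sl) "path" are all hypothetical
stand-ins chosen for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical, heavily reduced path record: (dlid, sl).
Path = Tuple[int, int]

# Made-up event names standing in for SA-related changes that invalidate paths.
REFRESH_EVENTS = {"sm_change", "lid_change", "client_reregister", "manual"}


class LocalPathCache:
    """Toy shared cache: bulk refresh plus on-demand caching of single queries."""

    def __init__(self,
                 get_table: Callable[[], Dict[str, Path]],
                 get_one: Callable[[str], Optional[Path]]):
        self._get_table = get_table  # stand-in for one GetTable query to the SA
        self._get_one = get_one      # stand-in for one Get query to the SA
        self._paths: Dict[str, Path] = {}
        self.sa_queries = 0          # number of queries that reached the SA

    def refresh(self) -> None:
        """Replace the whole cache with the result of a single bulk query."""
        self._paths = dict(self._get_table())
        self.sa_queries += 1

    def on_event(self, event: str) -> None:
        """Refresh only for events that may have changed path data."""
        if event in REFRESH_EVENTS:
            self.refresh()

    def lookup(self, dgid: str) -> Optional[Path]:
        """Serve from the cache; on a miss, query once and cache the response."""
        path = self._paths.get(dgid)
        if path is None:
            path = self._get_one(dgid)
            self.sa_queries += 1
            if path is not None:
                self._paths[dgid] = path
        return path
```

Every consumer that shares one LocalPathCache sees the same data, and a
script calling refresh() periodically (or on_event("manual")) is enough to
bound how stale an entry can get.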

>Well, any time you renumber your network you will get burned and have
>to restart ipoib on every node with the way things are
>today. Something like a SM upgrade or changing to a new vendor SM, or
>increasing LMC could do this to you.

The local SA responds to these types of SA changes by refreshing the cache.

>Erm, any SA that returns a PR that is invalid in the network outside
>the time the network is being updated is seriously busted, IMHO.

The SA isn't guaranteed to know all links that are down at the time it returns a
PR.  There's a delay between when a path becomes unusable and when the SA
detects it.  In fact, an end node could detect that a path is unusable before
the SA does, which could be the reason for it requesting a new path.  The SA
cannot sweep the fabric looking for changes before responding to every PR query.
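
The timing gap can be made concrete with a small Python sketch.  This is a toy
model under stated assumptions, not real SA or IPoIB behavior: ToySA, try_send,
and the path names are invented for the example, and a "send" is reduced to
checking whether the chosen path is actually down.

```python
from typing import Dict, List, Optional, Set


class ToySA:
    """Toy SA that learns about dead paths only when sweep() runs."""

    def __init__(self, candidates: Dict[str, List[str]]):
        self._candidates = candidates        # dest -> paths in preference order
        self._known_down: Set[str] = set()   # what the SA believes is down
        self.actually_down: Set[str] = set() # ground truth in the fabric

    def sweep(self) -> None:
        """Bring the SA's view in line with the fabric."""
        self._known_down = set(self.actually_down)

    def get_path(self, dest: str) -> Optional[str]:
        """Return the first path the SA still believes is usable."""
        for path in self._candidates[dest]:
            if path not in self._known_down:
                return path
        return None


def try_send(sa: ToySA, cache: Dict[str, str], dest: str,
             max_tries: int = 3) -> Optional[str]:
    """Return the path that worked, or None if every attempt failed."""
    for _ in range(max_tries):
        path = cache.get(dest) or sa.get_path(dest)
        if path is None:
            return None
        if path not in sa.actually_down:     # stand-in for a successful send
            cache[dest] = path
            return path
        # The end node noticed the failure first: drop the entry and re-query.
        cache.pop(dest, None)
    return None
```

Until sweep() runs, every re-query hands back the same dead path, so the end
node's retries fail even though the SA is behaving correctly; after the sweep,
the next query returns the alternate path.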

- Sean


