[ofa-general] Re: IPoIB path caching

Sean Hefty mshefty at ichips.intel.com
Tue Jul 24 09:29:12 PDT 2007


> Linux has a quite sophisticated mechanism to maintain / cache / probe / 
> invalidate / update the network stack L2 neighbour info.

Path records are not just L2 info.  They contain L4, L3, and L2 info 
together.

> For example, in the Voltaire gen1 stack we had an ib arp module which 
> was used by both IPoIB and native IB ULPs (SDP, iSER, Lustre, etc). This 
> module managed some sort of path cache, where IPoIB was always asking for 
> non-cached path and other ULPs were willing to get cached path.

IMO, using a cached AH is no different than using a cached path.  You're 
simply mapping the PR data into another structure.
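
To make that concrete, here is a rough sketch, not code from any stack. 
The two structures and the function name are hypothetical and only loosely 
modeled on the fields a path record and a set of AH attributes carry. The 
point is that every AH field is copied from the PR, so a cached AH is 
exactly as fresh or as stale as the PR it was built from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathRecord:
    """Hypothetical subset of the fields an SA path record carries."""
    sgid: str
    dgid: str
    dlid: int
    sl: int
    rate: int
    hop_limit: int
    traffic_class: int
    flow_label: int
    pkey: int
    mtu: int


@dataclass(frozen=True)
class AddressHandleAttr:
    """Hypothetical subset of the attributes used to create an AH."""
    dgid: str
    dlid: int
    sl: int
    static_rate: int
    hop_limit: int
    traffic_class: int
    flow_label: int


def ah_attr_from_path(pr: PathRecord) -> AddressHandleAttr:
    """Copy the addressing fields of a path record into AH attributes.

    Nothing here is computed or re-queried; it is a straight field mapping.
    """
    return AddressHandleAttr(
        dgid=pr.dgid,
        dlid=pr.dlid,
        sl=pr.sl,
        static_rate=pr.rate,
        hop_limit=pr.hop_limit,
        traffic_class=pr.traffic_class,
        flow_label=pr.flow_label,
    )
```

If the destination LID changes after the PR was fetched, both the cached 
PR and any AH built from it point at the old LID until they are refreshed.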

We're ignoring the real problem here, which is that a centralized SA 
doesn't scale.  MPI stacks have largely ignored this problem by simply 
not doing path record queries.  Path information is often hard-coded, 
with QPN data exchanged out of band over sockets (often over Ethernet).
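
A back-of-the-envelope sketch of the scaling issue, under assumptions that 
are mine and not measured: every process connects to every process on 
every other node, each connection needs one path lookup, and a cache (if 
present) is shared per node and keyed on (SGID, DGID, P_Key). The names 
`PathCache`, `fake_sa_query` and `count_sa_queries` are made up for 
illustration.

```python
from typing import Callable, Dict, Tuple

PathKey = Tuple[str, str, int]  # (sgid, dgid, pkey), an assumed cache key


def fake_sa_query(key: PathKey) -> dict:
    """Stand-in for a round trip to the SA; returns a dummy path."""
    return {"sgid": key[0], "dgid": key[1], "pkey": key[2]}


class PathCache:
    """Per-node cache that only goes to the SA on a miss."""

    def __init__(self, query_sa: Callable[[PathKey], dict]) -> None:
        self._query_sa = query_sa
        self._entries: Dict[PathKey, dict] = {}
        self.sa_queries = 0  # number of misses, i.e. real SA queries

    def lookup(self, key: PathKey) -> dict:
        if key not in self._entries:
            self.sa_queries += 1
            self._entries[key] = self._query_sa(key)
        return self._entries[key]

    def invalidate(self, key: PathKey) -> None:
        """Drop an entry so the next lookup queries the SA again."""
        self._entries.pop(key, None)


def count_sa_queries(nodes: int, procs_per_node: int, use_cache: bool) -> int:
    """Count SA queries for all-to-all connection setup."""
    total = 0
    for src in range(nodes):
        cache = PathCache(fake_sa_query)  # shared by all procs on this node
        for _proc in range(procs_per_node):
            for dst in range(nodes):
                if dst == src:
                    continue
                for _peer in range(procs_per_node):
                    key = (f"gid-{src}", f"gid-{dst}", 0xFFFF)
                    if use_cache:
                        cache.lookup(key)
                    else:
                        fake_sa_query(key)
                        total += 1
        total += cache.sa_queries
    return total
```

Under this model the uncached count grows as 
nodes * (nodes - 1) * procs_per_node^2, while the cached count grows as 
nodes * (nodes - 1).  For 4 nodes with 2 processes each that is 48 
queries versus 12.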

We've seen problems running large MPI jobs without PR caching.  I know 
that Silverstorm/QLogic did as well.  And apparently Voltaire hit the 
same type of problem, since you added a caching module.  (Did Mellanox 
and Topspin/Cisco create PR caches as well?)  At least three companies 
working on IB came up with the same solution.  What is the objection to 
the current patch set?

- Sean


