[ofa-general] Re: IPoIB path caching

Sean Hefty mshefty at ichips.intel.com
Thu Jul 26 10:21:16 PDT 2007


> I think that we are making progress and starting to converge.
> 
> My suggestion is that if you put the PR caching code within the ib_sa 
> module, add a parameter for the ib_sa_path_rec_get() where the caller 
> specifies whether it is willing to get a cached PR. I also suggest that
> rdma_resolve_route() be enhanced with a similar parameter, so that
> even native IB based ULPs can ask for non-cached information if they
> want to.

I still believe that these should be separate policies.  Consider that 
the cache could have been updated immediately before a PR lookup from 
IPoIB - perhaps in response to an SA event.

Administrators can enable or disable the cache.  I don't believe that 
individual applications should be able to override the administrator, 
nor do I think we gain anything by having per-application settings. 
It would be like letting applications choose whether to use cached ARP 
information every time they connect.

> For example, I think it would be correct for IB block and file I/O ULPs 
> (iSER, SRP, Lustre, rNFS, etc) to request a non-cached PR, as their 
> connection model is not all-to-all but rather n-to-m (n clients to m 
> servers with m << n), the connections are long-lived (hours, days, 
> weeks, more) and a connection failure caused by PR caching does not 
> seem acceptable.

I believe a better solution is for everyone to use cached records, if 
they exist, with a feedback mechanism from the CM that removes paths on 
a connection failure or path migration event.
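
To make the idea concrete, here is a minimal sketch in Python.  It is 
not the ib_sa code: PathRecordCache, on_cm_event(), fake_sa_query() and 
the event names are all hypothetical, chosen only to illustrate a cache 
that an administrator enables globally and that the CM prunes on 
failure.

```python
class PathRecordCache:
    """Toy path record (PR) cache keyed by (source GID, destination GID)."""

    def __init__(self, sa_query, enabled=True):
        self._sa_query = sa_query  # callable standing in for a real SA query
        self._enabled = enabled    # one administrator-wide switch, not per caller
        self._records = {}
        self.sa_queries = 0        # how many lookups actually reached the SA

    def lookup(self, sgid, dgid):
        """Return a cached PR if one exists, otherwise query the SA."""
        key = (sgid, dgid)
        if self._enabled and key in self._records:
            return self._records[key]
        self.sa_queries += 1
        record = self._sa_query(sgid, dgid)
        if self._enabled:
            self._records[key] = record
        return record

    def on_cm_event(self, sgid, dgid, event):
        """Feedback from the CM: drop a path that failed or migrated."""
        # Event names are assumptions made for this sketch.
        if event in ("connect_failed", "path_migrated"):
            self._records.pop((sgid, dgid), None)


def fake_sa_query(sgid, dgid):
    # Hypothetical stand-in for the SA; a real PR carries many more fields.
    return {"sgid": sgid, "dgid": dgid}


cache = PathRecordCache(fake_sa_query)
cache.lookup("gid-a", "gid-b")  # miss: goes to the (fake) SA
cache.lookup("gid-a", "gid-b")  # hit: served from the cache
cache.on_cm_event("gid-a", "gid-b", "connect_failed")
cache.lookup("gid-a", "gid-b")  # miss again: the stale path was dropped
```

Callers never say whether they want cached data.  A stale entry costs 
one failed connection attempt, after which the next lookup goes back to 
the SA.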

With all-to-all connections over the rdma cm, the first thing that needs 
to be done is to resolve the remote addresses to GIDs.  This causes an 
ARP storm, followed by an SA storm caused by IPoIB, followed by a second 
SA storm caused by the rdma cm.  For scalability, we need to remove both 
of these SA storms, not just the second.  We don't see the first SA 
storm today because IPoIB caches PRs.  Let's not add it.  Restricting 
caching to the rdma cm while removing it from IPoIB leaves us with the 
same issues that we have today.
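
A rough back-of-the-envelope model of the two SA storms, as a Python 
sketch.  The assumptions are mine and deliberately crude: every node 
resolves every other node once, each uncached layer (IPoIB, rdma cm) 
issues one SA query per ordered pair, and a cached layer is served 
entirely from a warm cache.  The function name is hypothetical.

```python
def sa_queries_all_to_all(n_nodes, ipoib_cached, rdma_cm_cached):
    """Estimate SA queries for an all-to-all connect under the assumptions above."""
    ordered_pairs = n_nodes * (n_nodes - 1)  # each node resolves every peer
    ipoib = 0 if ipoib_cached else ordered_pairs
    rdma_cm = 0 if rdma_cm_cached else ordered_pairs
    return ipoib + rdma_cm


# Compare the three policies for a hypothetical 1024-node fabric.
for label, ipoib_cached, rdma_cm_cached in [
    ("IPoIB cached, rdma cm not", True, False),
    ("rdma cm cached, IPoIB not", False, True),
    ("both cached", True, True),
]:
    print(label, sa_queries_all_to_all(1024, ipoib_cached, rdma_cm_cached))
```

Under this model, moving the cache from IPoIB to the rdma cm only 
swaps one storm for the other; the total drops only when both layers 
use cached records.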

- Sean


