[ofa-general] Re: IPoIB path caching

Or Gerlitz ogerlitz at voltaire.com
Sun Jul 29 01:27:32 PDT 2007


Sean Hefty wrote:
> Administrators can enable or disable the cache.  I don't believe that 
> individual applications should be able to override the administrator, 
> nor do I think we gain anything by having per application settings. This 
> is similar to exposing to applications whether they want to use cached 
> ARP information every time they connect.

Applications --can-- delete the network stack's neighbour entry before 
doing this or that action.

>> For example, I think it would be correct for IB block and file I/O 
>> ULPs (iSER, SRP, Lustre, rNFS, etc) to request non-cached PRs, as their 
>> connection model is not all-to-all but rather n-to-m (n clients to m 
>> servers with m << n), the connections are long-lived (hours, days, 
>> weeks, more) and a connection failure caused by PR caching does not 
>> seem acceptable.

> I believe a better solution is for everyone to use cached records, if 
> they exist, with a feedback mechanism from the CM that removes paths on 
> a connection failure or path migration event.

That's an interesting point. What's the conceptual difference between a 
CM connection failure caused by a "wrong" PR and the failure of a 
--unicast-- ARP probe initiated by the network stack? CM feedback to the 
local SA seems a correct approach to me; however, I don't see the 
equivalent for UD communication.
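
To make the feedback idea concrete, here is a minimal sketch of a PR 
cache that drops an entry when the CM reports a failure on that path. It 
is an illustration only: PathCache, PathRecord, on_cm_failure and 
fake_sa are hypothetical names, not the ib_sa or rdma cm API, and the SA 
query is stubbed out. Note there is no hook in it that UD traffic could 
call, which is the gap I mean.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical types for illustration; not a real kernel or library API.
PathKey = Tuple[str, str]  # (source GID, destination GID)


@dataclass(frozen=True)
class PathRecord:
    dlid: int  # destination LID
    sl: int    # service level


class PathCache:
    """Cache PRs and drop them when the CM reports a failure."""

    def __init__(self, query_sa: Callable[[PathKey], PathRecord]) -> None:
        self._query_sa = query_sa
        self._records: Dict[PathKey, PathRecord] = {}
        self.sa_queries = 0  # number of queries that actually reached the SA

    def resolve(self, key: PathKey) -> PathRecord:
        """Return the cached PR, querying the SA only on a miss."""
        rec = self._records.get(key)
        if rec is None:
            self.sa_queries += 1
            rec = self._query_sa(key)
            self._records[key] = rec
        return rec

    def on_cm_failure(self, key: PathKey) -> None:
        """CM feedback: the connection failed, so forget this path."""
        self._records.pop(key, None)


def fake_sa(key: PathKey) -> PathRecord:
    # Stand-in for a real SA query; the values are made up.
    return PathRecord(dlid=7, sl=0)


cache = PathCache(fake_sa)
key = ("fe80::1", "fe80::2")
cache.resolve(key)        # miss: goes to the SA
cache.resolve(key)        # hit: served from the cache
cache.on_cm_failure(key)  # CM feedback invalidates the entry
cache.resolve(key)        # miss again: second SA query
print(cache.sa_queries)   # 2
```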

> With all to all connections over the rdma cm, the first thing that needs 
> to be done is resolve the remote addresses to GIDs.  This causes an ARP 
> storm, followed by an SA storm caused by IPoIB, followed by a second SA 
> storm caused by the rdma cm.  For scalability, we need to remove both of 
> these SA storms, not just the second.  We don't see the first SA storm 
> today because IPoIB caches PRs.  Let's not add it.  Restricting caching 
> to the rdma cm, but removing it from IPoIB leaves us with the same 
> issues that we have today.

Again, the typical I/O client-server scheme is n-to-m where m is small 
(1, 2, say up to a few tens). The PRs are needed by IPoIB only at the 
--passive-- side, which sends the unicast ARP reply. So when n=1024 and 
m=4 the SA would need to serve 4096 PRs, which is about one fourth of the 
queries/second rate you have reported in earlier threads on the matter.
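
The arithmetic, as a small sketch. The n-to-m count is the one above; 
the all-to-all figure is only a rough contrast under my own assumption 
of one PR per ordered pair of nodes, and the function names are made up 
for the example.

```python
def n_to_m_pr_queries(n_clients: int, m_servers: int) -> int:
    """PRs served when only the passive side needs one PR per client."""
    return n_clients * m_servers


def all_to_all_pr_queries(n_nodes: int) -> int:
    """Rough contrast, assuming one PR per ordered pair of nodes."""
    return n_nodes * (n_nodes - 1)


n_to_m = n_to_m_pr_queries(1024, 4)
all_to_all = all_to_all_pr_queries(1024)
print(n_to_m)      # 4096
print(all_to_all)  # 1047552
```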

Or.



