[ofa-general] Re: IPoIB path caching
Or Gerlitz
ogerlitz at voltaire.com
Sun Jul 29 01:27:32 PDT 2007
Sean Hefty wrote:
> Administrators can enable or disable the cache. I don't believe that
> individual applications should be able to override the administrator,
> nor do I think we gain anything by having per application settings. This
> is similar to exposing to applications whether they want to use cached
> ARP information every time they connect.
Applications --can-- delete the network stack's neighbour entry before
taking a given action.
>> For example, I think it would be correct for IB block and file I/O
>> ULPs (iSER, SRP, Lustre, rNFS, etc) to request non cached PR, as their
>> connecting model is not all-to-all but rather n-to-m (n clients to m
>> servers with m << n), the connections are long-lived (hours, days,
>> weeks, more) and a connection failure as of PR caching does not seem
>> acceptable.
> I believe a better solution is for everyone to use cached records, if
> they exist, with a feedback mechanism from the CM that removes paths on
> a connection failure or path migration event.
That's an interesting point. What's the conceptual difference between a CM
connection failure caused by a "wrong" PR and the failure of a --unicast--
ARP probe initiated by the network stack? CM feedback to the local SA seems
a correct approach to me; however, I don't see the equivalent for UD
communication.
> With all to all connections over the rdma cm, the first thing that needs
> to be done is resolve the remote addresses to GIDs. This causes an ARP
> storm, followed by an SA storm caused by IPoIB, followed by a second SA
> storm caused by the rdma cm. For scalability, we need to remove both of
> these SA storms, not just the second. We don't see the first SA storm
> today because IPoIB caches PRs. Let's not add it. Restricting caching
> to the rdma cm, but removing it from IPoIB leaves us with the same
> issues that we have today.
Again, the typical I/O client-server scheme is n-to-m, where m is small (1
or 2, say up to a few tens). IPoIB needs the PRs only at the
--passive-- side, which sends the unicast ARP reply. So when n=1024 and
m=4 the SA would need to serve 4096 PRs, which is about one fourth of the
queries/second rate you reported in earlier threads on the matter.
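
As a rough sketch of the arithmetic above: the n-to-m count is simply one PR
per client/server pair. The function names below are hypothetical, and the
all-to-all figure assumes one PR query per ordered pair of distinct nodes,
which is a simplifying assumption for comparison and not a measured number.

```python
def pr_queries_n_to_m(n_clients: int, m_servers: int) -> int:
    """PR queries the SA serves when every client connects to every server.

    Assumes one PR query per client/server pair (served at the passive side).
    """
    return n_clients * m_servers


def pr_queries_all_to_all(nodes: int) -> int:
    """PR queries under an all-to-all pattern.

    Assumes one PR query per ordered pair of distinct nodes.
    """
    return nodes * (nodes - 1)


if __name__ == "__main__":
    # The case from the text: n=1024 clients, m=4 servers.
    print(pr_queries_n_to_m(1024, 4))  # 4096
```

Under these assumptions the n-to-m load grows linearly in n for a fixed small
m, while the all-to-all load grows quadratically in the node count.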
Or.