[ofa-general] Re: IPoIB path caching
Or Gerlitz
ogerlitz at voltaire.com
Thu Jul 26 00:02:20 PDT 2007
Sean Hefty wrote:
>> I am willing to go with the local SA being used to serve large MPI jobs,
>> so you load it as a prerequisite to spawning a large all-to-all job.
>> But I think the default for IPoIB needs to be the use of non-cached PRs.
> I think this ties together two things that aren't directly related. We
> have two network stacks running on top of each other here. Their
> policies should be separate.
The rationale behind my argument is that IPoIB is an L2 packet service
for the network stack. When the network stack decides to renew its L2
info for a neighbour (e.g. because the neighbour does not reply to direct
probes), an IPoIB that answers with cached IB info is doing the opposite
of what it was asked to do.
> As an example, let's reverse this. Imagine instead that you implement
> IB over IP. Should an IB path refresh policy dictate that IP update its
> ARP tables?
In this setting (IB above IP), yes.
> Or, looking at it differently, do you prevent IP from
> updating the ARP table unless the IB stack asks for it?
No. If the lower stack wants to update its L2 info, that's perfectly fine.
For example, the current IPoIB implementation flushes all of its IB L2
info (address handles and PRs) when it gets an IB event on the port
(up/down/lid-change/sm-lid-change/client-re-register/etc.), and this is
very much the correct design.
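To make the flush-on-event behaviour concrete, here is a minimal sketch assuming a toy fixed-size path cache keyed by a 64-bit destination GID. The names (`path_cache`, `cache_insert`, `cache_lookup`, `cache_handle_event`, the `EV_*` events) are hypothetical illustrations and are not the actual IPoIB code; the point is only that any port event invalidates every cached entry, so the next lookup must go back to the SA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port events, loosely mirroring the ones listed above. */
enum port_event {
    EV_PORT_UP,
    EV_PORT_DOWN,
    EV_LID_CHANGE,
    EV_SM_LID_CHANGE,
    EV_CLIENT_REREGISTER
};

#define CACHE_SIZE 8

/* One cached path: destination GID (simplified to 64 bits) -> destination LID. */
struct path_entry {
    uint64_t dgid;
    uint16_t dlid;
    bool valid;
};

struct path_cache {
    struct path_entry entries[CACHE_SIZE];
};

/* Store or update a path; returns false if the cache is full. */
static bool cache_insert(struct path_cache *c, uint64_t dgid, uint16_t dlid)
{
    struct path_entry *free_slot = NULL;
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        struct path_entry *e = &c->entries[i];
        if (e->valid && e->dgid == dgid) {
            e->dlid = dlid;         /* refresh an existing entry */
            return true;
        }
        if (!e->valid && free_slot == NULL)
            free_slot = e;          /* remember the first free slot */
    }
    if (free_slot == NULL)
        return false;
    free_slot->dgid = dgid;
    free_slot->dlid = dlid;
    free_slot->valid = true;
    return true;
}

/* Returns true and fills *dlid on a cache hit. */
static bool cache_lookup(const struct path_cache *c, uint64_t dgid, uint16_t *dlid)
{
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        if (c->entries[i].valid && c->entries[i].dgid == dgid) {
            *dlid = c->entries[i].dlid;
            return true;
        }
    }
    return false;
}

/* Every port event flushes the whole cache, since LIDs and routes may have changed. */
static void cache_handle_event(struct path_cache *c, enum port_event ev)
{
    (void)ev; /* in this sketch all events are treated the same way */
    for (size_t i = 0; i < CACHE_SIZE; i++)
        c->entries[i].valid = false;
}
```

The design choice is deliberately coarse: flushing everything on any event is simple and always safe, at the cost of re-querying paths that may still be valid.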
> The policy for local PR caching should be set by an administrator. Now,
> we could provide a policy setting that ties it to the ARP cache, which
> sounds like a good idea. This will be less efficient in some use
> models, more efficient in others. But not all PRs belong to IPoIB, so
> we need a way to handle this. However, I don't believe that we have to
> always enforce such a policy, especially since the current stack doesn't
> have this behavior today.
I think that we are making progress and starting to converge.
My suggestion is that if you put the PR caching code within the ib_sa
module, you add a parameter to ib_sa_path_rec_get() with which the caller
specifies whether it is willing to get a cached PR or not. I also suggest
that rdma_resolve_route() be enhanced with a similar parameter, so that
even native IB based ULPs can ask for non-cached info if they want to.
For example, I think it would be correct for IB block and file I/O ULPs
(iSER, SRP, Lustre, rNFS, etc.) to request non-cached PRs: their
connection model is not all-to-all but rather n-to-m (n clients to m
servers, with m << n), the connections are long-lived (hours, days,
weeks or more), and a connection failure caused by PR caching does not
seem acceptable.
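The proposed per-call choice could look roughly like the sketch below. It assumes a toy single-entry cache and a fake SA query; `get_path`, `sa_query_path` and the `allow_cached` flag are hypothetical names standing in for the suggested extra parameter, not the real signature of ib_sa_path_rec_get() or rdma_resolve_route().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-entry path cache; a real one would be keyed by GID. */
struct path_cache {
    uint64_t dgid;
    uint16_t dlid;
    bool valid;
};

/* Counts how many times the (fake) SA was actually queried. */
static unsigned sa_queries;

/* Hypothetical stand-in for an SA PathRecord query. */
static uint16_t sa_query_path(uint64_t dgid)
{
    sa_queries++;
    return (uint16_t)(dgid & 0xffff); /* fake LID derived from the GID */
}

/*
 * Hypothetical resolver: allow_cached plays the role of the proposed
 * parameter. When false, the SA is always queried and the cache refreshed.
 */
static uint16_t get_path(struct path_cache *c, uint64_t dgid, bool allow_cached)
{
    if (allow_cached && c->valid && c->dgid == dgid)
        return c->dlid;                   /* cache hit, no SA traffic */

    uint16_t dlid = sa_query_path(dgid);  /* fresh PathRecord from the SA */
    c->dgid = dgid;
    c->dlid = dlid;
    c->valid = true;
    return dlid;
}
```

Under this scheme a large MPI all-to-all job would pass `allow_cached = true` to keep SA load down, while IPoIB renewing a neighbour, or a long-lived storage ULP connection, would pass `false` and always get a fresh PR.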
Or.