[ofa-general] Re: [Query] ib add path record cache
Devesh Sharma
devesh28 at gmail.com
Thu May 24 05:22:16 PDT 2007
On 23 May 2007 10:35:13 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> On Wed, 2007-05-23 at 10:27, Devesh Sharma wrote:
> > On 21 May 2007 13:52:11 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > > On Mon, 2007-05-21 at 01:58, Devesh Sharma wrote:
> > > > On 18 May 2007 06:21:05 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > > > > On Thu, 2007-05-17 at 08:28, Devesh Sharma wrote:
> > > > > > On 17 May 2007 06:42:16 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > > > > > > On Thu, 2007-05-17 at 01:21, Devesh Sharma wrote:
> > > > > > > > On 5/17/07, Sean Hefty <mshefty at ichips.intel.com> wrote:
> > > > > > > > > > But initially this will generate a packet for each path, while the
> > > > > > > > > > sysadmin knows that the path is there and can hard-code the entries
> > > > > > > > > > for it. The other thing is: why would the admin care about creating
> > > > > > > > > > such records when the SA itself takes care of that, right?
> > > > > > > > >
> > > > > > > > > In your original message you asked about adding 'dummy entries' to the
> > > > > > > > > cache. I agree that pre-loading the cache can be useful. What I still
> > > > > > > > > am not understanding is the reasoning for adding 'dummy entries'. By
> > > > > > > > > 'dummy entries', I've been assuming that these are invalid path records,
> > > > > > > > > but maybe that's not what you meant.
> > > > > > > > OK, if the term "dummy entries" has created confusion, then I am
> > > > > > > > sorry for that. What I mean are valid path records which the
> > > > > > > > administrator knows in advance and, while loading the
> > > > > > > > module,
> > > > > > >
> > > > > > > How does the admin know they are valid ?
> > > > > > Depending on the initial application runs, some trusted PRs can be generated.
> > > > >
> > > > > What do initial application runs have to do with this ?
> > > > My understanding is that once the cluster is up, if there is only one
> > > > path between node A and node B,
> > >
> > > So this is a feature for such one path subnets. I wonder what percentage
> > > of deployed subnets fits this case.
> > You never know; it may also be useful for debugging.
>
> I still don't have a good feel for how common/generally useful this will
> really be.
>
> > > > then the SA query is always going to return the same values in the PR.
> > >
> > > If subnet topology is changed, these PRs might change. There are other
> > > cases where they change too.
> > Not sure about that... any suggestions?
> > >
> > > > On this basis, initial application runs will generate PRs,
> > >
> > > That's what confused me before (Applications don't generate PRs but
> > > rather request them.) but I think I see what you mean now.
> > Ok
> > >
> > > > these PRs can be saved in some file and loaded when cache_module
> > > > is inserted.
> > > > >
> > > > > > > Are they somehow preconfigured at the SM?
> > > > > > I am not sure whether the SM has any such provision.
> > > > >
> > > > > Not that I'm aware of.
> > > > OK, so currently there is no such support in the SM?
> > >
> > > I can speak definitively for OpenSM and there is no such support. As to
> > > the vendor SMs, I don't think so but don't know for absolute certainty.
> > > Someone can correct me if I'm wrong but I wouldn't assume no response
> > > means correctness as some may not be listening nor want to respond as to
> > > "value added" vendor specific features.
> > What is the issue if OpenSM provides this?
>
> I'm not following you. What does/should OpenSM provide ? OpenIB works in
> configurations with other SMs.
I am talking about pre-configuring PRs in the OpenSM DB.
>
> > >
> > > > > > Also, I am not sure about the role of the SM in path resolution. I
> > > > > > mean, once a node has initiated an SA query, does the SM have some
> > > > > > database from which the SA replies, or is the destination node
> > > > > > contacted on the fly to get the requested path record?
> > > > >
> > > > > SMs can either calculate the SA PRs on the fly based on the routing
> > > > > algorithm in use and some other things or put them in a local database.
> > > > > This is up to that SM.
> > > > Ok
> > > > >
> > > > > Destination node is not contacted in the SA PR query process.
> > > > >
> > > > > > > Doesn't each SM have its own policy for generating valid PRs?
> > > > > > Ultimately a path record is in Path_Record object format, and the SA
> > > > > > cache is going to store it in a fixed manner. How does the generation
> > > > > > policy matter?
> > > > >
> > > > > What if the local policy loaded does not agree with what the SM would
> > > > > generate for a particular PR ? One then gets a local error which will
> > > > > need to be tracked down. Not so easy IMO.
> > > > Do the SM policies for generating PRs in a subnet change dynamically, at
> > > > run time?
> > >
> > > The policy doesn't change dynamically but the data to be returned in the
> > > SA PR response might.
> > >
> > > > If not, then static PRs can be generated according to the local SM
> > > > policy and loaded initially.
> > >
> > > Just as one question related to this, how would link failures be
> > > handled? There are others.
> > It's just a matter of avoiding the initial PR query packets by loading
> > the cache with static PRs. Later on, the cache module will function in
> > the normal fashion. I expect that, initially, everything will come up
> > in a trusted cluster.
>
> So you're saying the cache would still react to GIDs out and in service,
> right ?
I am not sure about GIDs going in/out of service... but what I mean to
say is: once sa_cache is programmed with some static PRs, it will skip
the initial cache_update step, and after the first timeout the normal
update_cache() will be initiated using SA MADs.
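A rough user-space model of the behaviour just described may help pin it down. It assumes a hypothetical `PathCache` class whose `preload()` step stands in for the admin's user command and whose `on_timeout()` step stands in for the normal update_cache() refresh; none of these names are the sa_cache module's real interface, and the `PathRecord` fields are a simplified, made-up subset.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class PathRecord:
    # Hypothetical subset of PathRecord fields, for illustration only.
    dgid: str
    dlid: int
    sl: int
    mtu: int


class PathCache:
    """Toy model of a path record cache that can be preloaded with static PRs."""

    def __init__(self, sa_query: Callable[[], List[PathRecord]]):
        self._sa_query = sa_query  # stands in for the SA query MADs
        self._table: Dict[str, PathRecord] = {}
        self.sa_queries_sent = 0

    def preload(self, records: List[PathRecord]) -> None:
        # Static PRs take the place of the initial forced update,
        # so loading them generates no SA traffic.
        for pr in records:
            self._table[pr.dgid] = pr

    def on_timeout(self) -> None:
        # Normal refresh: rebuild the table from the SA answer, so entries
        # the SA no longer returns (e.g. an unreachable node) are dropped.
        self.sa_queries_sent += 1
        self._table = {pr.dgid: pr for pr in self._sa_query()}

    def lookup(self, dgid: str) -> Optional[PathRecord]:
        return self._table.get(dgid)


# Usage: preload A and B, but by the first timeout the SA only reports A.
a = PathRecord("fe80::a", 0x10, 0, 4)
b = PathRecord("fe80::b", 0x11, 0, 4)
cache = PathCache(sa_query=lambda: [a])
cache.preload([a, b])
print(cache.lookup("fe80::b") is not None, cache.sa_queries_sent)  # True 0
cache.on_timeout()
print(cache.lookup("fe80::b") is not None, cache.sa_queries_sent)  # False 1
```

Because the refresh rebuilds the table from what the SA returns, a stale preloaded entry (node B here) only survives until the first timeout, which is exactly the window in which a preloaded record can disagree with the SM.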
>
> If the cache is loaded from a file, does it bypass querying the SA
> initially for PRs ?
Yes, it will, and hence it reduces the initial SA traffic generated on a
big cluster. Just imagine: the cluster is quite big and every node is
trying to build its cache initially. That will create a large burst of
SA packets.
> If that is the case, then the file is required to be
> the full set of PRs for this node otherwise there would be incomplete
> connectivity.
Yes, correct. Generating these PRs is the next issue I want to discuss.
Maybe this can be done by the admin on every node using the read()
entry point provided by a char_dev interface of the sa_cache module.
The read entry point will simply extract PRs from the cache itself.
Connectivity will be incomplete only until the first PR is requested
for that destination, because on a cache miss the application is going
to call ib_sa_get_path_rec() anyway, and the resolved PR will be added
to the cache for future reference.
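The export/import round trip and the cache-miss fallback described above could be modelled like this. The one-record-per-line text format, `dump()`, `load()` and `PreloadedCache` are all hypothetical, and the `resolve` callback merely stands in for ib_sa_get_path_rec(); the char_dev read() entry point itself is still only a proposal in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PathRecord:
    # Hypothetical subset of PathRecord fields, for illustration only.
    dgid: str
    dlid: int
    sl: int
    mtu: int


def dump(records: List[PathRecord]) -> str:
    # What a read() of the proposed char device might hand to the admin:
    # one whitespace-separated record per line (the format is an assumption).
    return "".join(f"{r.dgid} {r.dlid:#x} {r.sl} {r.mtu}\n" for r in records)


def load(text: str) -> List[PathRecord]:
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        dgid, dlid, sl, mtu = line.split()
        records.append(PathRecord(dgid, int(dlid, 16), int(sl), int(mtu)))
    return records


class PreloadedCache:
    def __init__(self, records: List[PathRecord],
                 resolve: Callable[[str], PathRecord]):
        self._table: Dict[str, PathRecord] = {r.dgid: r for r in records}
        self._resolve = resolve  # stands in for ib_sa_get_path_rec()

    def get(self, dgid: str) -> PathRecord:
        pr = self._table.get(dgid)
        if pr is None:
            # Cache miss: query the SA once and keep the answer.
            pr = self._resolve(dgid)
            self._table[dgid] = pr
        return pr


# Usage: export one record, reload it, then hit and miss the cache.
queried: List[str] = []


def fake_sa(dgid: str) -> PathRecord:
    # Pretend SA answer; remembers each query so misses are visible.
    queried.append(dgid)
    return PathRecord(dgid, 0x20, 0, 4)


exported = dump([PathRecord("fe80::a", 0x10, 0, 4)])
cache = PreloadedCache(load(exported), resolve=fake_sa)
cache.get("fe80::a")  # hit: served from the preloaded file
cache.get("fe80::b")  # miss: one SA query, result cached
cache.get("fe80::b")  # hit
print(queried)  # ['fe80::b']
```

Only destinations missing from the admin's file cost an SA query, and each costs it once, which is the "incomplete connectivity only until the first request" behaviour argued for above.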
>
> -- Hal
>
> > > > > > CMIIW. Also, I am assuming a homogeneous cluster where certain
> > > > > > parameters can be assumed to always be the same.
> > > > >
> > > > > and always in agreement with what the SM would return ? For example,
> > > > yes
> > > > > what happens when a link goes down and the end node is no longer
> > > > > reachable ?
> > > > If the node is not reachable, then after the first sa_cache timeout
> > > > that entry will be removed from the cache.
> > >
> > > OK; that's another aspect to add into this feature. I don't think that
> > > is currently done. I think there would need to be an API added to do
> > > this.
> > Yes, this has been discussed with Sean: we can add a char_dev
> > interface to the existing sa_cache module implementation. The write
> > entry point will generate an SA_PR_response packet, and this packet
> > will be passed to the update_cache() function.
> >
> > Also, we need to remove the initial schedule_update() call in the
> > add_one() function.
> > A user command is also required to read from the user's file and
> > write onto this device.
> > >
> > > -- Hal
> > >
> > > > > > > Are these from a live SM and just loaded "out of band" to
> > > > > > > bypass/preclude the SA PR mechanism?
> > > > > > Maybe.
> > > > >
> > > > > Even if they are, there is still the changes in the subnet issue.
> > > > >
> > > > > -- Hal
> > > > >
> > > > > > > -- Hal
> > > > > > >
> > > > > > > > the admin loads this info into the cache with a user command.
> > > > > > > > >
> > > > > > > > > > Another point I want to know is: when will the local_sa_cache
> > > > > > > > > > module be inserted? After the SM comes up or before the SM
> > > > > > > > > > comes up?
> > > > > > > > >
> > > > > > > > > It can occur either way. There is no restriction. The cache responds
> > > > > > > > > to port up and GID in/out of service events to update itself.
> > > > > > > > Do you mean the cache module will start building the cache only
> > > > > > > > after the port is up?
> > > > > > > > >
> > > > > > > > > > If it's inserted before the SM comes up (I am assuming the SM is
> > > > > > > > > > running on some node, not on a switch), then the first forced
> > > > > > > > > > schedule_update() is wasted, and for the first application the
> > > > > > > > > > presence of the cache is meaningless. Why not keep the cache
> > > > > > > > > > effective right from the start?
> > > > > > > > >
> > > > > > > > > Pre-loading the cache with path records doesn't guarantee that those
> > > > > > > > > paths are usable. If the SM has not come up, then the path records will
> > > > > > > > > be unusable until the SM configures the subnet, plus there's no
> > > > > > > > > guarantee that the remote endpoints specified by the paths are running.
> > > > > > > > You mean that even if the SM is up and we have some hard-coded path
> > > > > > > > record entries for some node X, there is no guarantee that node X
> > > > > > > > has actually come up? In that case path resolution should actually
> > > > > > > > fail if the node has not come up, but with the hard-coded entries
> > > > > > > > the path will still be resolved?
> > > > > > > > >
> > > > > > > > > The main benefit I see to pre-loading the cache is to avoid SA storms
> > > > > > > > > when booting a large cluster.
> > > > > > > > That's true. Also, the cache will get valid entries only if the
> > > > > > > > network has been configured by the SM; otherwise SA packets from
> > > > > > > > every node may be dropped.
> > > > > > > > >
> > > > > > > > > - Sean
> > > > > > > > >
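Pulling together the write-side proposal from the quoted exchange (a char_dev write entry point whose data is turned into an SA_PR_response and handed to update_cache(), the forced schedule_update() removed from add_one(), and a user command that writes the admin's file to the device), a rough user-space model might look like this. Everything here is an assumption: `SaCacheDev`, `pack_records()` and the 20-byte record layout are invented for illustration and are neither the PathRecord attribute layout from the InfiniBand specification nor any real sa_cache interface.

```python
import ipaddress
import struct
from typing import Dict, List, Tuple

# Hypothetical fixed-size record written by the user command:
# 16-byte DGID, 16-bit DLID, 8-bit SL, 8-bit MTU code, big-endian.
# This is NOT the PathRecord attribute layout from the InfiniBand spec.
REC = struct.Struct(">16sHBB")
Record = Tuple[str, int, int, int]  # (dgid, dlid, sl, mtu)


def pack_records(records: List[Record]) -> bytes:
    """User-space side: serialise the admin's records for write()."""
    return b"".join(
        REC.pack(ipaddress.IPv6Address(dgid).packed, dlid, sl, mtu)
        for dgid, dlid, sl, mtu in records)


class SaCacheDev:
    """Toy model of the proposed char_dev write path (names are hypothetical)."""

    def __init__(self, schedule_initial_update: bool):
        self.table: Dict[str, Record] = {}
        self.pending_updates = 0
        # The proposal drops the forced schedule_update() from add_one().
        if schedule_initial_update:
            self.schedule_update()

    def schedule_update(self) -> None:
        self.pending_updates += 1  # in the kernel this would queue SA queries

    def update_cache(self, records: List[Record]) -> None:
        for rec in records:
            self.table[rec[0]] = rec

    def write(self, buf: bytes) -> int:
        if len(buf) % REC.size:
            raise ValueError("buffer holds a partial record")
        # Decode the buffer as if it were an SA PR response, then reuse
        # the normal update path.
        recs = [(str(ipaddress.IPv6Address(gid)), dlid, sl, mtu)
                for gid, dlid, sl, mtu in REC.iter_unpack(buf)]
        self.update_cache(recs)
        return len(buf)


# Usage: add_one() without the forced update, then one write() from the tool.
dev = SaCacheDev(schedule_initial_update=False)
n = dev.write(pack_records([("fe80::2c9:100:0:1", 0x1a, 0, 4)]))
print(n, dev.pending_updates)           # 20 0
print(dev.table["fe80::2c9:100:0:1"])   # ('fe80::2c9:100:0:1', 26, 0, 4)
```

Routing the written data through the same update_cache() path used for real SA responses keeps a single code path for populating the cache, which seems to be the point of generating an SA_PR_response from the write entry point rather than poking the table directly.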