[ofa-general] Re: [Query] ib add path record cache
Devesh Sharma
devesh28 at gmail.com
Fri May 25 06:52:59 PDT 2007
On 24 May 2007 11:30:24 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> On Thu, 2007-05-24 at 08:22, Devesh Sharma wrote:
> > On 23 May 2007 10:35:13 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > > On Wed, 2007-05-23 at 10:27, Devesh Sharma wrote:
> > > > On 21 May 2007 13:52:11 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > > > > On Mon, 2007-05-21 at 01:58, Devesh Sharma wrote:
> > > > > > On 18 May 2007 06:21:05 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > > > > > > On Thu, 2007-05-17 at 08:28, Devesh Sharma wrote:
> > > > > > > > On 17 May 2007 06:42:16 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > > > > > > > > On Thu, 2007-05-17 at 01:21, Devesh Sharma wrote:
> > > > > > > > > > On 5/17/07, Sean Hefty <mshefty at ichips.intel.com> wrote:
> > > > > > > > > > > > But initially this will generate a packet for each path, while the sys
> > > > > > > > > > > > admin knows that the path is there and can hard-code the entries for
> > > > > > > > > > > > it. The other question is why the admin would care about creating such
> > > > > > > > > > > > records when the SA itself is taking care of it, right?
> > > > > > > > > > >
> > > > > > > > > > > In your original message you asked about adding 'dummy entries' to the
> > > > > > > > > > > cache. I agree that pre-loading the cache can be useful. What I still
> > > > > > > > > > > am not understanding is the reasoning for adding 'dummy entries'. By
> > > > > > > > > > > 'dummy entries', I've been assuming that these are invalid path records,
> > > > > > > > > > > but maybe that's not what you meant.
> > > > > > > > > > OK, if the term "dummy entries" has caused confusion then I am
> > > > > > > > > > sorry for that. What I mean is that those are valid path
> > > > > > > > > > records which the administrator knows in advance, and while loading the
> > > > > > > > > > module,
> > > > > > > > >
> > > > > > > > > How does the admin know they are valid ?
> > > > > > > > Depending on the initial application runs, some trusted PRs can be generated.
> > > > > > >
> > > > > > > What do initial application runs have to do with this ?
> > > > > > My understanding is that, once the cluster is UP, and if between Node
> > > > > > A and Node B there is only one path,
> > > > >
> > > > > So this is a feature for such one path subnets. I wonder what percentage
> > > > > of deployed subnets fits this case.
> > > > You never know. It may also be used for debugging.
> > >
> > > I still don't have a good feel for how common/generally useful this will
> > > really be.
> > >
> > > > > > then an SA query is always going to return the same values in the PR.
> > > > >
> > > > > If subnet topology is changed, these PRs might change. There are other
> > > > > cases where they change too.
> > > > Not sure about it... any suggestions?
> > > > >
> > > > > > On this basis Initial application runs will generate PRs,
> > > > >
> > > > > That's what confused me before (Applications don't generate PRs but
> > > > > rather request them.) but I think I see what you mean now.
> > > > Ok
> > > > >
> > > > > > these PRs can be saved in some file, and can be loaded
> > > > > > when cache_module comes in.
> > > > > > >
> > > > > > > > > Are they somehow preconfigured at the SM ?
> > > > > > > > I am not sure whether the SM has any such provision.
> > > > > > >
> > > > > > > Not that I'm aware of.
> > > > > > Ok, So, currently no such support is there in SM?
> > > > >
> > > > > I can speak definitively for OpenSM and there is no such support. As to
> > > > > the vendor SMs, I don't think so but don't know for absolute certainty.
> > > > > Someone can correct me if I'm wrong but I wouldn't assume no response
> > > > > means correctness as some may not be listening nor want to respond as to
> > > > > "value added" vendor specific features.
> > > > What is the issue if OpenSM provides this?
> > >
> > > I'm not following you. What does/should OpenSM provide ? OpenIB works in
> > > configurations with other SMs.
> > I am talking about pre-configuring PRs in OpenSM DB.
>
> How does that help ? Why would PRs need to be preconfigured at the SM ?
> Do you mean preconfigure the routing tables (and generate the PRs from
> that) ? What problem is being solved on the SM side ?
I just queried out of curiosity......nothing special.:)
>
> > > > > > > > Also not sure about the
> > > > > > > > role of SM in path resolving. I mean once node has initiated SA query,
> > > > > > > > does the SM have some database from which to answer the SA query, or is
> > > > > > > > the destination node contacted on the fly to get the requested path record?
> > > > > > >
> > > > > > > SMs can either calculate the SA PRs on the fly based on the routing
> > > > > > > algorithm in use and some other things or put them in a local database.
> > > > > > > This is up to that SM.
> > > > > > Ok
> > > > > > >
> > > > > > > Destination node is not contacted in the SA PR query process.
> > > > > > >
> > > > > > > > > Doesn't each SM have its own policy for generating valid PRs ?
> > > > > > > > Ultimately a path record is in Path_Record object format, and the SA cache
> > > > > > > > is going to store it in a fixed manner. How does the generation policy matter?
> > > > > > >
> > > > > > > What if the local policy loaded does not agree with what the SM would
> > > > > > > generate for a particular PR ? One then gets a local error which will
> > > > > > > need to be tracked down. Not so easy IMO.
> > > > > > Do the SM policies for generating PRs in a subnet change dynamically, at run time?
> > > > >
> > > > > The policy doesn't change dynamically but the data to be returned in the
> > > > > SA PR response might.
> > > > >
> > > > > > If not, then static PRs can be generated according to the local SM
> > > > > > policy and loaded initially.
> > > > >
> > > > > Just as one question related to this, how would link failures be handled
> > > > > ? There are others.
> > > > It's just a matter of avoiding the initial PR query packets by loading the
> > > > cache with static PRs. Later on, the cache module will function in the
> > > > normal fashion. I expect that initially everything will come up in a
> > > > trusted cluster.
> > >
> > > So you're saying the cache would still react to GIDs out and in service,
> > > right ?
> > I am not about what GIDs in out service....
>
> Why not ?
Actually, that was a typing mistake. I meant to say that I am not
sure what GIDs going in and out of service means.
>
> > but what I mean to say is,
> > Once sa_cache is programmed with some static PRs....it will avoid
> > initial cache_update step and after first time out normal
> > update_cache() will be initiated using SA MADs.
>
> How would the client know what PRs to request when that timeout first
> occurs ? There's no get all except these semantics. If it is all PRs,
> what does that save ?
I think my statement has confused you again... sorry, my fault. By "and
after first time out normal update_cache() will be initiated using SA
MADs." I mean that after the first timeout only the requested PR
will be resolved, not all of them.
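To make the intended timeout behaviour concrete, here is a rough sketch
in Python. It is not the sa_cache code; PathCache, preload, lookup and
resolve_via_sa are names made up for illustration, and the timeout value
is arbitrary. Preloaded entries are served until they time out; after
that, only the PR that is actually requested is resolved again.

```python
import time


class PathCache:
    """Toy model of a path record cache with per-entry timeouts."""

    def __init__(self, resolve_via_sa, timeout=300.0, clock=time.monotonic):
        self._resolve = resolve_via_sa  # hypothetical stand-in for an SA PR query
        self._timeout = timeout         # arbitrary illustrative value, in seconds
        self._clock = clock
        self._entries = {}              # dgid -> (path_record, expiry_time)
        self.sa_queries = 0             # how many times the SA was asked

    def preload(self, records):
        """Load static PRs (e.g. read from a file) without querying the SA."""
        expiry = self._clock() + self._timeout
        for dgid, pr in records.items():
            self._entries[dgid] = (pr, expiry)

    def lookup(self, dgid):
        """Return a PR, querying the SA only for a missing or expired entry."""
        entry = self._entries.get(dgid)
        now = self._clock()
        if entry is not None and now < entry[1]:
            return entry[0]             # fresh entry: no SA traffic
        # Missing or timed out: resolve this one destination only.
        self.sa_queries += 1
        pr = self._resolve(dgid)
        self._entries[dgid] = (pr, now + self._timeout)
        return pr
```

With two preloaded entries, a lookup of one destination after the
timeout costs one SA query and leaves the other entry untouched.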
>
> > > If the cache is loaded from a file, does it bypass querying the SA
> > > initially for PRs ?
> > Yes, it will, and hence it reduces the initial SA traffic generated on a
> > big cluster. Just imagine: the cluster is quite big and every node is
> > trying to build its cache initially. That will create a large burst of SA
> > packets.
> > > If that is the case, then the file is required to be
> > > the full set of PRs for this node otherwise there would be incomplete
> > > connectivity.
> > Yes, correct. Generating these PRs is the next issue which I want to
> > discuss. Maybe this can be done by the admin on every node using the
> > read() entry point provided by the char_dev interface of the sa_cache module.
> > The read entry point will simply extract the PRs from the cache itself.
> >
> > Connectivity will be incomplete only until the first PR is requested for that
> > destination, because on a cache miss the application is anyhow going
> > to initiate an ib_sa_get_path_rec() and the resolved PR will be added to the
> > cache for future reference.
>
> OK then this becomes an on demand model for those destinations (at least
> initially).
By "on demand", do you mean a normal cluster without a cache? If yes,
then it will be an on-demand PR resolution model for those incomplete paths.
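A back-of-the-envelope sketch in Python of the traffic argument. The
function names are made up for illustration, and it assumes one SA query
per destination and that a cache built at start-up resolves a PR to
every other node; neither assumption is taken from the sa_cache code.

```python
def eager_queries(num_nodes):
    """SA queries if every node resolves a PR to every other node at start-up."""
    return num_nodes * (num_nodes - 1)


def on_demand_queries(preloaded, requested):
    """SA queries issued by one node whose cache was preloaded from a file.

    preloaded: destinations whose PRs came from the file
    requested: destinations the applications ask for, in order
    """
    cached = set(preloaded)
    queries = 0
    for dest in requested:
        if dest not in cached:
            queries += 1      # cache miss: one SA query for this destination
            cached.add(dest)  # the resolved PR is kept for future reference
    return queries


print(eager_queries(1000))                                   # 999000
print(on_demand_queries({"B", "C"}, ["B", "D", "D", "C"]))   # 1
```

In the second call only destination D is missing from the preloaded
set, so it is resolved once and then served from the cache.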
>
> -- Hal
>
> > > -- Hal
> > >
> > > > > > > > CMIIW. Also I am assuming a homogeneous cluster where certain
> > > > > > > > parameters can be assumed to be same always.
> > > > > > >
> > > > > > > and always in agreement with what the SM would return ? For example,
> > > > > > yes
> > > > > > > what happens when a link goes down and the end node is no longer
> > > > > > > reachable ?
> > > > > > If node is not reachable then, after first timeout of sa_cache, that
> > > > > > entry will be removed from cache.
> > > > >
> > > > > OK; that's another aspect to add into this feature. I don't think that
> > > > > is currently done. I think there would need to be an API added to do
> > > > > this.
> > > > Yes, this has been discussed with Sean, we can add one char_dev
> > > > interface to the existing sa_cache module implementation, Write entry
> > > > point will generate a SA_PR_response packet and this packet will be
> > > > passed to update_cache() function.
> > > >
> > > > Also we need to remove the initial schedule_update() call in the
> > > > add_one() function.
> > > > One user command is also required to read from user file and write
> > > > onto this device.
> > > > >
> > > > > -- Hal
> > > > >
> > > > > > > > > are these from a live SM and just loaded "out of band" to
> > > > > > > > > bypass/preclude the SA PR mechanism ?
> > > > > > > > may be
> > > > > > >
> > > > > > > Even if they are, there is still the changes in the subnet issue.
> > > > > > >
> > > > > > > -- Hal
> > > > > > >
> > > > > > > > > -- Hal
> > > > > > > > >
> > > > > > > > > > Admin is loading this info in the cache with user command.
> > > > > > > > > > >
> > > > > > > > > > > > Another point I want to know is,
> > > > > > > > > > > > When local_sa_cache module will be inserted? After SM comes up or
> > > > > > > > > > > > Before SM comes up?
> > > > > > > > > > >
> > > > > > > > > > > It can occur either way. There is no restriction. The cache responds
> > > > > > > > > > > to port up and GID in/out of service events to update itself.
> > > > > > > > > > Do you mean cache module will start building cache only after Port is UP?
> > > > > > > > > > >
> > > > > > > > > > > > If it is inserted before the SM comes up (I am assuming the SM is running on
> > > > > > > > > > > > some node, not on a switch) then the first forced schedule_update() is
> > > > > > > > > > > > wasted, and for the first application the presence of the cache is
> > > > > > > > > > > > meaningless. Why not keep the cache effective right from the start?
> > > > > > > > > > >
> > > > > > > > > > > Pre-loading the cache with path records doesn't guarantee that those
> > > > > > > > > > > paths are usable. If the SM has not come up, then the path records will
> > > > > > > > > > > be unusable until the SM configures the subnet, plus there's no
> > > > > > > > > > > guarantee that the remote endpoints specified by the paths are running.
> > > > > > > > > > You mean that even if the SM is up and we have some hard-coded
> > > > > > > > > > path record entries for some node X, there is no guarantee
> > > > > > > > > > that node X has actually come up? In that case
> > > > > > > > > > path resolution should fail if the node has not come up, but
> > > > > > > > > > with the hard-coded entries the path will still be resolved?
> > > > > > > > > > >
> > > > > > > > > > > The main benefit I see to pre-loading the cache is to avoid SA storms
> > > > > > > > > > > when booting a large cluster.
> > > > > > > > > > that's true. Also cache will get valid entries only if network is
> > > > > > > > > > configured by SM otherwise every node SA will, possibly, drop SA
> > > > > > > > > > packets.
> > > > > > > > > > >
> > > > > > > > > > > - Sean
> > > > > > > > > > >