Re: [ofa-general] Re: [PATCHv2] opensm/PerfMgr: Better redirection support
Hal Rosenstock
hal.rosenstock at gmail.com
Mon Apr 20 07:19:30 PDT 2009
On Fri, Apr 17, 2009 at 10:57 AM, Sasha Khapyorsky <sashak at voltaire.com> wrote:
> On 09:34 Thu 16 Apr , Hal Rosenstock wrote:
>> >
>> > Yes, and you can use lid value as such flag - just simpler.
>>
>> When GID redirection is specified by client, LID must be 0 so I don't see this.
>
> 1. GID redirection is not implemented in this patch.
> 2. In any case you will need to resolve LID value (using GID) in order
> to send MAD.
> So LID = 0 can be used as invalid redirection data flag. But this was
> a minor comment.
Yes, this is a minor tradeoff. One loses some information by
overloading. One example, for the GID redirection case: an SA PR query
in progress vs. bad redirection.
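A minimal sketch of what the overloading loses, assuming hypothetical
names throughout (the enum, struct, and functions below are illustrative
and are not taken from the patch or from OpenSM):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from the patch or OpenSM. */
enum redir_state {
	REDIR_NONE,		/* no redirection requested by the PMA */
	REDIR_RESOLVING,	/* GID redirection: SA PR query in progress */
	REDIR_VALID,		/* LID resolved; MADs can be sent */
	REDIR_INVALID		/* PMA supplied bad redirection data */
};

struct redir_info {
	enum redir_state state;
	uint16_t lid;		/* stays 0 until a LID is resolved */
};

/* Overloaded variant: LID == 0 is the only "not usable" signal. */
static bool redir_usable_by_lid(const struct redir_info *r)
{
	return r->lid != 0;
}

/* Explicit variant: retry later only while resolution is pending. */
static bool redir_should_retry(const struct redir_info *r)
{
	return r->state == REDIR_RESOLVING;
}
```

With only the LID to go on, a port whose PR query is still pending and a
port with bad redirection data look identical (both have LID 0); the
explicit state is what lets the caller tell them apart.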
>> > My point was different - to separate redirection related data from main
>> > flow.
>>
>> I'm still not sure what you mean by this. Encapsulate the redirection
>> data better so it is obtained by some potentially common routine ?
>
> Yes. And also to not use "fake" redirection fields (specifically pkey_ix)
> in non-redirected flow - this is why I think you need 'port' structure.
The pkey index was needed before; it was just assumed to be 0 as
neither redirection nor any real pkey support was supported.
>> >> > PerfMgr is always running over discovered fabric so maybe local port
>> >> > number should be detected later at start of PerfMgr process cycle just
>> >> > using OpenSM DB.
>> >>
>> >> Why is that better than doing this at bind time of PerfMgr ?
>> >
>> > At least two reasons: faster and less code.
>>
>> Are you sure the OpenSM DB accesses will be faster than the vendor calls here ?
>
> Yes, it is direct memory read against opening and parsing many files
> (+ memory allocations, etc.).
Yes, it's orders of magnitude faster.
>> Is bind performance sensitive anyhow ?
>
> Not at all, but all what you need here is just local port number - and 40
> (or so) lines of the code (which is 80% duplicated with pkey validation)
> for doing this looks like overkill for me (not in sense of performance).
>
>> The performance comment is
>> clearly relevant to the main flow though.
>
> Sure, but there you just need to read a value.
>
>> >> > Also what about letting "chance" for port to refresh redirection info?
>> >>
>> >> What do you mean ?
>> >
>> > When port has invalid redirection data, should you care about attempting
>> > to refresh this?
>>
>> If the PMA gives bad redirection data (which BTW is noncompliant), it
>> seems likely to do this again so I'm not sure about the value of this.
>> Do you think that's a better thing to do ?
>
> I don't have a clear opinion (and so asked). Actually if I understood
> your code correctly this means that if some port once gets bad
> redirection data it will dropped from PerfMgr cycle forever, right?
This is an implementation decision and I chose not to query again. The
invalid info can be cleared via the console which will allow this port
to be retried.
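Roughly, the behavior described above could be sketched as follows. All
names here are hypothetical and the polling itself is reduced to a
counter; this is not the code from the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port monitoring record; not the patch's structure. */
struct mon_port {
	bool redir_invalid;	/* set when the PMA returned bad redirection data */
	unsigned polls;		/* number of times this port was queried */
};

/* One cycle: query every port not flagged invalid; return how many. */
static size_t poll_cycle(struct mon_port *ports, size_t n)
{
	size_t i, polled = 0;

	for (i = 0; i < n; i++) {
		if (ports[i].redir_invalid)
			continue;	/* skipped until explicitly cleared */
		ports[i].polls++;
		polled++;
	}
	return polled;
}

/* Stand-in for the console action that clears the invalid info. */
static void clear_invalid(struct mon_port *p)
{
	p->redir_invalid = false;
}
```

A flagged port stays out of every cycle until clear_invalid() is called
on it, after which the next cycle picks it up again.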
>> >> Redirection does not occur frequently.
>> >
>> > How could we know:)
>>
>> It's the current use case for PerfMgt.
>
> Let's suppose it happens just three times per one PerfMgr cycle -
> 3 > 1 anyway.
> Another important advantage is that in case when pkey tables are
> prepared *before* actual PerfMgr cycle and will not slow down querying
> itself.
>
> Another thought - could p_physp->pkeys be used for index
> detection/validation?
Yes, I was thinking that too when you said to switch the local port
determination over to the OpenSM DB.
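A minimal sketch of index detection against an in-memory pkey table.
The flat host-order array and the function name are assumptions made for
illustration; this is not how OpenSM actually stores p_physp->pkeys:

```c
#include <stddef.h>
#include <stdint.h>

#define PKEY_BASE_MASK 0x7fff	/* low 15 bits; bit 15 is the membership bit */

/*
 * Return the index of the first table entry whose base P_Key matches,
 * or -1 if none does. The flat host-order array is an assumption made
 * for illustration; it is not how OpenSM stores p_physp->pkeys.
 */
static int pkey_find_index(const uint16_t *tbl, size_t n, uint16_t pkey)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* A zero base value is treated here as an empty entry. */
		if ((tbl[i] & PKEY_BASE_MASK) == 0)
			continue;
		if ((tbl[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK))
			return (int)i;
	}
	return -1;
}
```

A return of -1 would correspond to the case where the redirected P_Key
is not present in the local port's table, i.e. invalid redirection info.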
>> > When OpenSM is in master mode it cannot change (PerfMgr is synchronized
>> > with heavy sweep).
>> >
>> > It is possible with standby OpenSM, so what - this single request will
>> > fail once.
>>
>> Some recovery for such failure would be needed.
>
> Not really - next PerfMgr cycle will fetch valid data.
>
>> Also, what about not active ?
>
> Same as standby (let's call it "non-master" modes).
>
>> >> > All above are not OpenSM errors, but wrong external data. I think it
>> >> > should be logged as VERBOSE messages.
>> >>
>> >> I agree it's wrong external data but it seems serious enough to me to
>> >> treat as an error.
>> >
>> > And some stupid port will be able to put OpenSM in endless error
>> > printing. I don't think it is a good idea.
>>
>> It would be a non compliant PMA which I would think we'd want to know
>> about sooner rather than later.
>
> If an admin want to care about this (and also about other such sort of
> things) he/she will turn verbosity "on".
But how does the admin even know that redirection is being used and so
needs to be enabled? That assumes the admin knows which devices
require redirection.
>> >> Seems like some sort of configuration error to me if this is disabled
>> >> at the manager but the PMA wants to use it.
>> >
>> > PMA shouldn't dictate here.
>>
>> PMA does dictate redirection. Manager has no way to shut it off.
>
> But it should be able to ignore this (including "noisy" logging).
That's the tradeoff. Your choice leads to silent failures.
-- Hal
>> If
>> manager turns off it's handling of redirection, then it just doesn't
>> work (that port is inaccessible by the manager). This argues for the
>> default to be enabled. The current default is disabled since this code
>> was deemed experimental.
>
> Right, and it should be consistent with this (now default) setting.
>
>> >> > BTW, why to bother with verifying redirection info when redirection
>> >> > support is disabled anyway?
>> >>
>> >> I thought it was useful to know the redirection info was invalid
>> >> rather than getting the disabled notification and then enabling and
>> >> finding out.
>> >
>> > For PMAs debug purposes redirection support should be switched "on"
>> > obviously.
>>
>> Why do you say debug purposes ? Isn't it any purpose ?
>
> I meant PMA support + PMA debug.
> Sasha
>