***SPAM*** Re: [ofa-general] ***SPAM*** Re: [PATCHv2] opensm/PerfMgr: Better redirection support

Hal Rosenstock hal.rosenstock at gmail.com
Thu Apr 16 06:34:56 PDT 2009


On Wed, Apr 15, 2009 at 8:54 PM, Sasha Khapyorsky <sashak at voltaire.com> wrote:
> On 16:29 Wed 15 Apr     , Hal Rosenstock wrote:
>> >>
>> >> ??/* Redirection information */
>> >> ??typedef struct redir {
>> >> + ?? ?? boolean_t redirection;
>> >> + ?? ?? boolean_t invalid;
>> >
>> > Why using lid value != 0 is/was bad for redirection invalidation?
>>
>> b/c there are other fields supplied in the redirection which also
>> could be invalid so this one flag summarizes that instead of
>> overloading one of the fields.
>
> Yes, and you can use lid value as such flag - just simpler.

When GID redirection is specified by client, LID must be 0 so I don't see this.

>> >> - ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].redir_lid = 0;
>> >> - ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].redir_qp = 0;
>> >> + ?? ?? ?? ?? ?? ?? /* Clear redirection info for this port except orig_lid */
>> >> + ?? ?? ?? ?? ?? ?? orig_lid = p_mon_node->redir_port[port].orig_lid;
>> >> + ?? ?? ?? ?? ?? ?? memset(&p_mon_node->redir_port[port], 0, sizeof(redir_t));
>> >> + ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].orig_lid = orig_lid;
>> >
>> > Hmm, why should 'orig_lid' be part of redirection structure and not
>> > placed on original node/port (below I see that it is used in
>> > non-redirected paths)?
>>
>> What are you referring to here ?
>
> Actually I was wrong - I don't where it is used at all.
> The comment about using in non-redirected path was about pkey_ix.
>
>> > I think it would be better to use structures like:
>> >
>> > struct node {
>> > ?? ?? ?? ??....
>> > ?? ?? ?? ??uint16_t lid;
>> > ?? ?? ?? ??uint16_t pkey_ix;
>>
>> Why would lid and pkey_ix be part of node ?
>
> You are using this with perfmgr_send_pc_mad() in main flow.
>
>> It's like this with redir_tbl_size r.t. num_ports and redir_t
>> redir_port[1] instead of your struct port {} ports[0]. Why would what
>> you propose be better ?
>
> My point was different - to separate redirection related data from main
> flow.

I'm still not sure what you mean by this. Encapsulate the redirection
data better so it is obtained by some potentially common routine ?

>> > PerfMgr is always running over discovered fabric so maybe local port
>> > number should be detected later at start of PerfMgr process cycle just
>> > using OpenSM DB.
>>
>> Why is that better than doing this at bind time of PerfMgr ?
>
> At least two reasons: faster and less code.

Are you sure the OpenSM DB accesses will be faster than the vendor calls here ?

Is bind performance sensitive anyhow ? The performance comment is
clearly relevant to the main flow though.

>> >> ?? ?? ?? }
>> >> @@ -511,6 +556,10 @@ __osm_perfmgr_query_counters(cl_map_item_t * const p_map_item, void *context)
>> >> ?? ?? ?? ?? ?? ?? ?? if (!osm_node_get_physp_ptr(node, port))
>> >> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? continue;
>> >>
>> >> + ?? ?? ?? ?? ?? ?? if (mon_node->redir_port[port].redirection &&
>> >> + ?? ?? ?? ?? ?? ?? ?? ?? mon_node->redir_port[port].invalid)
>> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? continue;
>> >> +
>> >
>> > Are two flags really needed? Couldn't this be stripped down?
>>
>> Just invalid should be sufficient. I'll change this in the next version.
>>
>> > Also what about letting "chance" for port to refresh redirection info?
>>
>> What do you mean ?
>
> When port has invalid redirection data, should you care about attempting
> to refresh this?

If the PMA gives bad redirection data (which BTW is noncompliant), it
seems likely to do this again so I'm not sure about the value of this.
Do you think that's a better thing to do ?

>> >> + ?? ?? /* call the transport layer for a list of local port pkeys */
>> >> + ?? ?? status = osm_vendor_get_all_port_attr(pm->subn->p_osm->p_vendor,
>> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? attr_array, &num_ports);
>> >
>> > This heavy stuff is performed per redirection request, but it actually
>> > uses same data (local port's pkey table). This looks very inefficient.
>> > Instead of doing this you can get local port's pkey table only once at
>> > PerfMgr process cycle start and do all checks against this already
>> > initialized table. Of course only in case when redirection is enabled at
>> > all.
>>
>> Redirection does not occur frequently.
>
> How could we know:)

It's the current use case for PerfMgt.

>> Also, the pkey table could
>> change in between
>
> When OpenSM is in master mode it cannot change (PerfMgr is synchronized
> with heavy sweep).
>
> It is possible with standby OpenSM, so what - this single request will
> fail once.

Some recovery for such failure would be needed.

Also, what about not active ?

>> and there's no local event support in OpenSM so I
>> don't see a way around this other than polling.
>
> Polling is not needed there.

As opposed to being event driven based on change; it's timer driven
(e.g. polled) regardless of whether vendor layer or OpenSM DB is
queried.

>> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? OSM_LOG(pm->log, OSM_LOG_ERROR,
>> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? "ERR 4C1B: Index for Pkey 0x%x not found\n",
>> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? cl_ntoh16(cpi->redir_pkey));
>> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? invalid = TRUE;
>> >> ?? ?? ?? ?? ?? ?? ?? }
>> >
>> > All above are not OpenSM errors, but wrong external data. I think it
>> > should be logged as VERBOSE messages.
>>
>> I agree it's wrong external data but it seems serious enough to me to
>> treat as an error.
>
> And some stupid port will be able to put OpenSM in endless error
> printing. I don't think it is a good idea.

It would be a non compliant PMA which I would think we'd want to know
about sooner rather than later.

>> If not, at least INFO rather than VERBOSE so
>> nothing special needs to be done to see these.
>
> IMO VERBOSE level is most appropriate for such things, INFO is something
> else and it is on by default.

Yes, that's why I suggested INFO so nothing special would need to be
done to see that there was a misbehaving PMA.

I think this log level is the tradeoff in making it easier to debug
problems in the field v. spamming the OpenSM log until the node is
found/removed/repaired.

Anyhow, it's clear to me that you won't accept it at even INFO, so
I'll change these to VERBOSE.

>> >> - ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??"redirection requested but disabled\n");
>> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? OSM_LOG(pm->log, OSM_LOG_ERROR, "ERR 4C16: "
>> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? "redirection requested but disabled\n");
>> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? invalid = TRUE;
>> >> + ?? ?? ?? ?? ?? ?? }
>> >
>> > This is not an error.
>>
>> Seems like some sort of configuration error to me if this is disabled
>> at the manager but the PMA wants to use it.
>
> PMA shouldn't dictate here.

PMA does dictate redirection. Manager has no way to shut it off. If
manager turns off it's handling of redirection, then it just doesn't
work (that port is inaccessible by the manager). This argues for the
default to be enabled. The current default is disabled since this code
was deemed experimental.

>> Other local configuration
>> errors are treated as errors.
>
> This is not configuration error - if admin decided to not bother with
> redirection support that is fine.

Not really. See above comment.

>> > BTW, why to bother with verifying redirection info when redirection
>> > support is disabled anyway?
>>
>> I thought it was useful to know the redirection info was invalid
>> rather than getting the disabled notification and then enabling and
>> finding out.
>
> For PMAs debug purposes redirection support should be switched "on"
> obviously.

Why do you say debug purposes ? Isn't it any purpose ? See above.

>> It can easily be moved up earlier in the flow if that's
>> better.
>
> Would be better IMO.

I'll do that in the next version.

-- Hal

>> > In general I would suggest to not mix redirection case with main flow
>> > (by using better data structures). The Redirection is not something
>> > PerfMgr specific and ideally we could have separate redirection handling
>> > module.  I'm not requesting to do this now (in this implementation), but
>> > at least flow separation is very desirable.
>>
>> I've done some more of this in subsequent patches not yet submitted.
>> However, there is no other GS manager (and if it did exist would it
>> use redirection ? there are known cases for PerfMgr currently).
>>
>> In any case, this is "poor man's" redirection support given what is
>> currently available in the OpenFabrics IB stack.
>
> Right. So I'm not requesting "generic redirection module", just to not
> mix the main and redirected flows.
>
> Sasha
>



More information about the general mailing list