***SPAM*** Re: [ofa-general] ***SPAM*** Re: [PATCHv2] opensm/PerfMgr: Better redirection support

Sasha Khapyorsky sashak at voltaire.com
Wed Apr 15 17:54:22 PDT 2009


On 16:29 Wed 15 Apr     , Hal Rosenstock wrote:
> >>
> >> ??/* Redirection information */
> >> ??typedef struct redir {
> >> + ?? ?? boolean_t redirection;
> >> + ?? ?? boolean_t invalid;
> >
> > Why using lid value != 0 is/was bad for redirection invalidation?
> 
> b/c there are other fields supplied in the redirection which also
> could be invalid so this one flag summarizes that instead of
> overloading one of the fields.

Yes, and you can use lid value as such flag - just simpler.

> >> - ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].redir_lid = 0;
> >> - ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].redir_qp = 0;
> >> + ?? ?? ?? ?? ?? ?? /* Clear redirection info for this port except orig_lid */
> >> + ?? ?? ?? ?? ?? ?? orig_lid = p_mon_node->redir_port[port].orig_lid;
> >> + ?? ?? ?? ?? ?? ?? memset(&p_mon_node->redir_port[port], 0, sizeof(redir_t));
> >> + ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].orig_lid = orig_lid;
> >
> > Hmm, why should 'orig_lid' be part of redirection structure and not
> > placed on original node/port (below I see that it is used in
> > non-redirected paths)?
> 
> What are you referring to here ?

Actually I was wrong - I don't where it is used at all.
The comment about using in non-redirected path was about pkey_ix.

> > I think it would be better to use structures like:
> >
> > struct node {
> > ?? ?? ?? ??....
> > ?? ?? ?? ??uint16_t lid;
> > ?? ?? ?? ??uint16_t pkey_ix;
> 
> Why would lid and pkey_ix be part of node ?

You are using this with perfmgr_send_pc_mad() in main flow.

> It's like this with redir_tbl_size r.t. num_ports and redir_t
> redir_port[1] instead of your struct port {} ports[0]. Why would what
> you propose be better ?

My point was different - to separate redirection related data from main
flow.

> > PerfMgr is always running over discovered fabric so maybe local port
> > number should be detected later at start of PerfMgr process cycle just
> > using OpenSM DB.
> 
> Why is that better than doing this at bind time of PerfMgr ?

At least two reasons: faster and less code.

> >> ?? ?? ?? }
> >> @@ -511,6 +556,10 @@ __osm_perfmgr_query_counters(cl_map_item_t * const p_map_item, void *context)
> >> ?? ?? ?? ?? ?? ?? ?? if (!osm_node_get_physp_ptr(node, port))
> >> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? continue;
> >>
> >> + ?? ?? ?? ?? ?? ?? if (mon_node->redir_port[port].redirection &&
> >> + ?? ?? ?? ?? ?? ?? ?? ?? mon_node->redir_port[port].invalid)
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? continue;
> >> +
> >
> > Are two flags really needed? Couldn't this be stripped down?
> 
> Just invalid should be sufficient. I'll change this in the next version.
> 
> > Also what about letting "chance" for port to refresh redirection info?
> 
> What do you mean ?

When port has invalid redirection data, should you care about attempting
to refresh this?

> >> + ?? ?? /* call the transport layer for a list of local port pkeys */
> >> + ?? ?? status = osm_vendor_get_all_port_attr(pm->subn->p_osm->p_vendor,
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? attr_array, &num_ports);
> >
> > This heavy stuff is performed per redirection request, but it actually
> > uses same data (local port's pkey table). This looks very inefficient.
> > Instead of doing this you can get local port's pkey table only once at
> > PerfMgr process cycle start and do all checks against this already
> > initialized table. Of course only in case when redirection is enabled at
> > all.
> 
> Redirection does not occur frequently.

How could we know:)

> Also, the pkey table could
> change in between

When OpenSM is in master mode it cannot change (PerfMgr is synchronized
with heavy sweep).

It is possible with standby OpenSM, so what - this single request will
fail once.

> and there's no local event support in OpenSM so I
> don't see a way around this other than polling.

Polling is not needed there.

> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? OSM_LOG(pm->log, OSM_LOG_ERROR,
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? "ERR 4C1B: Index for Pkey 0x%x not found\n",
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? cl_ntoh16(cpi->redir_pkey));
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? invalid = TRUE;
> >> ?? ?? ?? ?? ?? ?? ?? }
> >
> > All above are not OpenSM errors, but wrong external data. I think it
> > should be logged as VERBOSE messages.
> 
> I agree it's wrong external data but it seems serious enough to me to
> treat as an error.

And some stupid port will be able to put OpenSM in endless error
printing. I don't think it is a good idea.

> If not, at least INFO rather than VERBOSE so
> nothing special needs to be done to see these.

IMO VERBOSE level is most appropriate for such things, INFO is something
else and it is on by default.

> >> - ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??"redirection requested but disabled\n");
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? OSM_LOG(pm->log, OSM_LOG_ERROR, "ERR 4C16: "
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? "redirection requested but disabled\n");
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? invalid = TRUE;
> >> + ?? ?? ?? ?? ?? ?? }
> >
> > This is not an error.
> 
> Seems like some sort of configuration error to me if this is disabled
> at the manager but the PMA wants to use it.

PMA shouldn't dictate here.

> Other local configuration
> errors are treated as errors.

This is not configuration error - if admin decided to not bother with
redirection support that is fine.

> > BTW, why to bother with verifying redirection info when redirection
> > support is disabled anyway?
> 
> I thought it was useful to know the redirection info was invalid
> rather than getting the disabled notification and then enabling and
> finding out.

For PMAs debug purposes redirection support should be switched "on"
obviously.

> It can easily be moved up earlier in the flow if that's
> better.

Would be better IMO.

> > In general I would suggest to not mix redirection case with main flow
> > (by using better data structures). The Redirection is not something
> > PerfMgr specific and ideally we could have separate redirection handling
> > module.  I'm not requesting to do this now (in this implementation), but
> > at least flow separation is very desirable.
> 
> I've done some more of this in subsequent patches not yet submitted.
> However, there is no other GS manager (and if it did exist would it
> use redirection ? there are known cases for PerfMgr currently).
> 
> In any case, this is "poor man's" redirection support given what is
> currently available in the OpenFabrics IB stack.

Right. So I'm not requesting "generic redirection module", just to not
mix the main and redirected flows.

Sasha



More information about the general mailing list