[ewg] Re: [ofa-general] [PATCH] infiniband-diags/ibcheckerrors: for CAs query only single ports

Hal Rosenstock hrosenstock at xsigo.com
Tue Dec 11 10:34:37 PST 2007


On Tue, 2007-12-11 at 16:46 +0000, Sasha Khapyorsky wrote:
> On 07:25 Tue 11 Dec     , Hal Rosenstock wrote:
> > On Tue, 2007-12-11 at 15:27 +0000, Sasha Khapyorsky wrote:
> > > On 06:57 Tue 11 Dec     , Hal Rosenstock wrote:
> > > > On Tue, 2007-12-11 at 13:46 +0000, Sasha Khapyorsky wrote:
> > > > > For CAs, query performance counters only for single ports by LID and
> > > > > port number, and not the whole node with the 'all ports' option.
> > > > 
> > > > Should the description also reference the bug # ?
> > > 
> > > I will add.
> > > 
> > > > Will a similar thing be done to the other diag scripts which have this
> > > > same issue (but haven't been reported yet) ?
> > > 
> > > It is reasonable. I will try to check other scripts too.
> > > 
> > > > Would it be better to fix this in the underlying tool used (perfquery)
> > > > and in that way address it for all the diag scripts ?
> > > 
> > > I think perfquery could/should be improved as well, but it is not the
> > > same issue. 
> > 
> > Why not ?
> > 
> > If perfquery paved over the lack of support for all ports, then all the
> > scripts would be fine as is, right ?
> 
> Yes, but I think that it is more accurate to query CA ports and not just
> nodes (even if the 'all ports' option is supported).

Router ports need the same handling as CA ports.
 
> > > I think that in general, when the whole fabric is checked, it is more
> > > accurate to query endports by port and not by node - a multiport CA can
> > > have disconnected ports and/or ports which are connected to another
> > > subnet, and the counters of such ports are irrelevant to the check. Right?
> > 
> > Yes, but doing it on a node basis cuts down on the number of queries.
> 
> True, but doing the right thing is more important here than the number of
> queries IMO (BTW in practice the difference in the number of queries is
> not so significant - it is a matter of percent, not of multiples).

As long as switch ports aren't individually queried.

> > One can always go back and dive down to the port level after seeing
> > which nodes are of interest.
> 
> The problem is that one can get an invalid error report with such a script -
> for example when a CA has a "bad" port which is connected to another subnet.

It is a bad port; just not in this particular subnet.
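
The querying approach discussed above (one 'all ports' query per switch, and
individual queries by LID and port number for CA and router ports) could be
sketched roughly as follows. This is only an illustration, not code from
ibcheckerrors or perfquery: the plan_queries function, the ALL_PORTS marker
and the layout of the fabric records are all hypothetical.

```python
# Hypothetical marker meaning "one query covering every port of the node".
ALL_PORTS = None


def plan_queries(nodes):
    """Return a list of (lid, port) pairs to query.

    Switches get a single all-ports query. CAs and routers are queried
    port by port, using only the ports that were discovered in this subnet,
    so disconnected ports or ports on another subnet are never included.
    """
    plan = []
    for node in nodes:
        if node["type"] == "switch":
            plan.append((node["lid"], ALL_PORTS))
        else:
            # CA or router: each endport has its own LID and port number.
            for port in node["ports"]:
                plan.append((port["lid"], port["num"]))
    return plan


# Assumed example fabric: one switch, a two-port CA and a router.
fabric = [
    {"type": "switch", "lid": 1},
    {"type": "ca", "ports": [{"lid": 4, "num": 1}, {"lid": 5, "num": 2}]},
    {"type": "router", "ports": [{"lid": 7, "num": 1}]},
]

print(plan_queries(fabric))
```

With this layout a CA port that belongs to another subnet simply does not
appear in the node's port list, so its counters never reach the error report.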

-- Hal

> Sasha


