[ewg] Re: [ofa-general] [PATCH] infiniband-diags/ibcheckerrors: for CAs query only single ports

Sasha Khapyorsky sashak at voltaire.com
Tue Dec 11 08:46:57 PST 2007


On 07:25 Tue 11 Dec     , Hal Rosenstock wrote:
> On Tue, 2007-12-11 at 15:27 +0000, Sasha Khapyorsky wrote:
> > On 06:57 Tue 11 Dec     , Hal Rosenstock wrote:
> > > On Tue, 2007-12-11 at 13:46 +0000, Sasha Khapyorsky wrote:
> > > > For CAs query performance counters only for single ports by lid and port
> > > > number, and not whole node with 'all ports' option.
> > > 
> > > Should the description also reference the bug # ?
> > 
> > I will add.
> > 
> > > Will a similar thing be done to the other diag scripts which have this
> > > same issue (but haven't been reported yet) ?
> > 
> > It is reasonable. I will try to check other scripts too.
> > 
> > > Would it be better to fix this in the underlying tool used (perfquery)
> > > and in that way address it for all the diag scripts ?
> > 
> > I think perfquery could/should be improved as well, but it is not the
> > same issue. 
> 
> Why not ?
> 
> If perfquery paved over the lack of support for all ports, then all the
> scripts would be fine as is, right ?

Yes, but I think that it is more accurate to query CA ports and not just
nodes (even if the 'all ports' option is supported).

> 
> > I think that in general it is more accurate, when the whole
> > fabric is checked, to query endports by port and not by node - a multiport
> > CA can have disconnected ports and/or ports which are connected to another
> > subnet - in which case their counters are irrelevant to the check. Right?
> 
> Yes, but doing it on a node basis cuts down on the number of queries.

True, but doing the right thing is more important here than the number of
queries IMO (BTW in practice the difference in the number of queries is not
so significant - it is a matter of percent, not of multiples).

> One can always go back and dive down to the port level after seeing
> which nodes are of interest.

The problem is that one can get an invalid error report with such a script -
for example when a CA has a "bad" port which is connected to another subnet.
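
To illustrate the point, here is a minimal sketch in Python. It is not the
actual ibcheckerrors logic and does not call perfquery; the CaPort type, the
node names and the error values are all made up for the example. It only
models why a per-node check can report a CA whose errors come from a port
outside the checked subnet, while a per-port check does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaPort:
    node: str        # hypothetical CA name
    port: int        # port number on that CA
    in_subnet: bool  # True if this port belongs to the subnet being checked
    errors: int      # hypothetical sum of error counters on this port


def check_by_node(ports, threshold=0):
    """Per-node ('all ports') style: flag a CA if any of its ports has errors."""
    return sorted({p.node for p in ports if p.errors > threshold})


def check_by_port(ports, threshold=0):
    """Per-port style: only ports in the checked subnet can be flagged."""
    return sorted(
        (p.node, p.port)
        for p in ports
        if p.in_subnet and p.errors > threshold
    )


# Made-up fabric: ca1 port 2 is cabled to another subnet and shows errors.
ports = [
    CaPort("ca1", 1, in_subnet=True, errors=0),
    CaPort("ca1", 2, in_subnet=False, errors=57),
    CaPort("ca2", 1, in_subnet=True, errors=3),
]

by_node = check_by_node(ports)   # ca1 is reported because of the foreign port
by_port = check_by_port(ports)   # only ca2 port 1 is reported
```

With this made-up input the per-node check flags both ca1 and ca2, while the
per-port check flags only ca2 port 1, which is the only problem that really
belongs to the checked subnet.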

Sasha


