[Users] interpreting ibdiagnet output

Ira Weiny weiny2 at llnl.gov
Mon Sep 17 14:45:51 PDT 2012


On Mon, 17 Sep 2012 16:31:12 -0500
Narayan Desai <narayan.desai at gmail.com> wrote:

> On Mon, Sep 17, 2012 at 3:02 PM, Ira Weiny <weiny2 at llnl.gov> wrote:
> > On Mon, 17 Sep 2012 12:48:18 -0500
> > Narayan Desai <narayan.desai at gmail.com> wrote:
> >
> >> Is there a canonical place that describes the errors reflected in
> >> ibdiagnet output, and potential resolutions? I'm trying to fix up a
> >> qdr fabric, and am seeing a combination of:
> >>  - symbol errors (these are pretty clear; i'm assuming that cable
> >> replacement is the solution in a lot of these cases)
> >
> > Fix these first.
> >
> >>  - ipoib speed errors (the group is at 10 gbit, but the network is
> >> capable of 40; is this a cosmetic error, or will it cause actual
> >> performance problems?)
> >
> > This is likely because the errors above caused some ports to come up slower (10 Gbit), and those ports joined/created the mcast group first.  Thus the whole group runs slower than the fabric is capable of.
> >
> > I think these will clean up once you fix the errors above.
> >
> > Not sure how ibdiagnet reports these errors, but to check for slow ports you could run the following:
> >
> > iblinkinfo -l | grep -i could
> >
> > All ports which "could be" faster (either link width or speed) will be listed.
> 
> Oddly enough, there aren't any ports listed that could be at a faster
> speed. My next step is to go reseat some cables.

My apologies, I think I was mistaken.

> 
> I found some reference to ipoib at speeds greater than 10 gbits
> requiring some manual setup in opensm; is this still the case?

Could you point me to the reference?  I just want to see what it is referring to.

The only place to preconfigure Mcast groups is in the "partitions.conf" file.  (I know, don't ask.)

I think you may want something like this line.

Default=0x7fff,ipoib,rate=7 : ALL = full;

This should set all the IPoIB Mcast groups to 40 Gb/sec.
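
For reference, here is a rough sketch of how the "rate" code in that line maps to a speed. The table reflects the rate encodings as I understand them from the opensm partition documentation; the names IB_RATE_GBPS and partition_line are made up for illustration and are not part of opensm, so please check the values against the man page for your version.

```python
# Rate codes as understood from the opensm partition documentation
# (values in Gb/sec). Treat this table as an assumption and verify it
# against the documentation shipped with your opensm.
IB_RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}


def partition_line(name, pkey, rate_code, members="ALL = full"):
    """Build one partitions.conf entry for an IPoIB partition.

    Hypothetical helper: it only formats text in the same shape as the
    example line above and does not talk to opensm.
    """
    if rate_code not in IB_RATE_GBPS:
        raise ValueError(f"unknown rate code: {rate_code}")
    return f"{name}={pkey:#06x},ipoib,rate={rate_code} : {members};"


if __name__ == "__main__":
    # Rebuild the example entry and show the speed its rate code stands for.
    print(partition_line("Default", 0x7FFF, 7))
    print(IB_RATE_GBPS[7], "Gb/sec")
```

So rate=3 would correspond to the 10 Gb/sec group you are seeing now, and rate=7 to the 40 Gb/sec you want.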

> 
> Also, is there a document describing how to configure opensm properly?
> (yes, i realize properly is a loaded term in this case)

Not really, other than the man page.

Ira

> thanks again
>  -nld


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
weiny2 at llnl.gov
