[Users] interpreting ibdiagnet output
Ira Weiny
weiny2 at llnl.gov
Mon Sep 17 14:45:51 PDT 2012
On Mon, 17 Sep 2012 16:31:12 -0500
Narayan Desai <narayan.desai at gmail.com> wrote:
> On Mon, Sep 17, 2012 at 3:02 PM, Ira Weiny <weiny2 at llnl.gov> wrote:
> > On Mon, 17 Sep 2012 12:48:18 -0500
> > Narayan Desai <narayan.desai at gmail.com> wrote:
> >
> >> Is there a canonical place that describes the errors reflected in
> >> ibdiagnet output, and potential resolutions? I'm trying to fix up a
> >> QDR fabric, and am seeing a combination of:
> >> - symbol errors (these are pretty clear; I'm assuming that cable
> >> replacement is the solution in a lot of these cases)
> >
> > Fix these first.
> >
> >> - IPoIB speed errors (the group is at 10 Gbit/s, but the network is
> >> capable of 40; is this a cosmetic error, or will it cause actual
> >> performance problems?)
> >
> > This is likely due to the above errors causing some ports to come up slower (10 Gbit/s), and those ports joined/created the mcast group first. Thus the whole group runs slower than it is capable of.
> >
> > I think these will clean up once you fix the errors above.
> >
> > Not sure how ibdiagnet reports these errors, but to check for slow ports you could do the following:
> >
> > iblinkinfo -l | grep -i could
> >
> > All ports which "could be" faster (either link width or speed) will be listed.
>
> Oddly enough, there aren't any ports listed that could be at a faster
> speed. My next step is to go reseat some cables.
My apologies; I think I was mistaken.
>
> I found some reference to IPoIB at speeds greater than 10 Gbit/s
> requiring some manual setup in OpenSM; is this still the case?
Could you point me to the reference? I just want to see what it is referring to.
The only place to preconfigure multicast groups is the "partitions.conf" file. (I know, don't ask.)
I think you may want something like this line:
Default=0x7fff,ipoib,rate=7 : ALL = full;
This should set all the IPoIB multicast groups in the default partition to 40 Gb/s.
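The `rate` value is an encoded InfiniBand rate enumeration, not a speed in Gb/s. That is why 7, rather than 40, appears in the line above. Below is a minimal Python sketch of that mapping. It covers only the SDR/DDR/QDR codes. `RATE_GBPS`, `rate_code_for`, and `ipoib_partition_line` are hypothetical helpers for illustration and are not part of OpenSM. Check the table against the partitions.conf section of the opensm man page for your version.

```python
# Hypothetical illustration: IB rate enumeration (as used by the "rate="
# flag in partitions.conf) mapped to link rate in Gb/s. SDR/DDR/QDR only;
# verify against the opensm man page for your release.
RATE_GBPS = {
    2: 2.5,   # 1X SDR
    3: 10,    # 4X SDR
    4: 30,    # 12X SDR
    5: 5,     # 1X DDR
    6: 20,    # 4X DDR
    7: 40,    # 4X QDR
    8: 60,    # 12X DDR
    9: 80,    # 8X QDR
    10: 120,  # 12X QDR
}


def rate_code_for(gbps: float) -> int:
    """Return the rate code whose link rate equals gbps exactly."""
    for code, rate in RATE_GBPS.items():
        if rate == gbps:
            return code
    raise ValueError(f"no IB rate code for {gbps} Gb/s in this table")


def ipoib_partition_line(gbps: float, pkey: str = "0x7fff") -> str:
    """Build a Default-partition IPoIB line in the same form as the example."""
    return f"Default={pkey},ipoib,rate={rate_code_for(gbps)} : ALL = full;"


if __name__ == "__main__":
    print(ipoib_partition_line(40))  # Default=0x7fff,ipoib,rate=7 : ALL = full;
```

Under this mapping, a group created with code 3 runs at 10 Gb/s. To my understanding, 3 is the partitions.conf default for IPoIB groups, which would be consistent with the 10 Gbit/s group reported by ibdiagnet.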
>
> Also, is there a document describing how to configure opensm properly?
> (yes, I realize "properly" is a loaded term in this case)
Not really, other than the man page.
Ira
> thanks again
> -nld
--
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
weiny2 at llnl.gov