[Users] interpreting ibdiagnet output

Narayan Desai narayan.desai at gmail.com
Mon Sep 17 14:31:12 PDT 2012


On Mon, Sep 17, 2012 at 3:02 PM, Ira Weiny <weiny2 at llnl.gov> wrote:
> On Mon, 17 Sep 2012 12:48:18 -0500
> Narayan Desai <narayan.desai at gmail.com> wrote:
>
>> Is there a canonical place that describes the errors reflected in
>> ibdiagnet output, and potential resolutions? I'm trying to fix up a
>> QDR fabric, and am seeing a combination of:
>>  - symbol errors (these are pretty clear; I'm assuming that cable
>> replacement is the solution in a lot of these cases)
>
> Fix these first.
>
>>  - IPoIB speed errors (the group is at 10 Gbit/s, but the network is
>> capable of 40; is this a cosmetic error, or will it cause actual
>> performance problems?)
>
> This is likely because the errors above are causing some ports to run slower (10 Gbit/s), and those ports joined or created the mcast group first.  The whole group therefore runs slower than it is capable of.
>
> I think these will clean up once you fix the errors above.
>
> Not sure how ibdiagnet reports these errors, but to check for slow ports you could do the following:
>
> iblinkinfo -l | grep -i could
>
> All ports which "could be" faster (either link width or speed) will be listed.

Oddly enough, there aren't any ports listed that could be at a faster
speed. My next step is to go reseat some cables.
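
For anyone who wants to script the check suggested above rather than
eyeball it, here is a minimal sketch in Python. It does the same thing
as the grep: it keeps any line of `iblinkinfo -l` output that contains
"could", ignoring case. The helper names (slow_links, run_iblinkinfo)
are my own, and I am assuming iblinkinfo is on the PATH and that it
flags under-performing links with a "Could be ..." note, as described
above. I have not checked the exact output format against every
version.

```python
import re
import subprocess

# Same match as `grep -i could`: any line containing "could", any case.
COULD = re.compile(r"could", re.IGNORECASE)


def slow_links(lines):
    """Return the lines that iblinkinfo has flagged as 'Could be' faster."""
    return [line.rstrip("\n") for line in lines if COULD.search(line)]


def run_iblinkinfo():
    """Run `iblinkinfo -l` and return the flagged lines.

    Assumes iblinkinfo is installed and the caller has permission to
    query the fabric; raises CalledProcessError if the command fails.
    """
    result = subprocess.run(
        ["iblinkinfo", "-l"],
        capture_output=True,
        text=True,
        check=True,
    )
    return slow_links(result.stdout.splitlines())
```

An empty list from run_iblinkinfo() matches what I am seeing here: no
link is reported as running below what it could negotiate.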

I found some references to IPoIB at speeds greater than 10 Gbit/s
requiring some manual setup in OpenSM; is this still the case?

Also, is there a document describing how to configure OpenSM properly?
(Yes, I realize "properly" is a loaded term in this case.)
Thanks again,
 -nld


