[Users] interpreting ibdiagnet output

Ira Weiny weiny2 at llnl.gov
Mon Sep 17 13:02:56 PDT 2012


On Mon, 17 Sep 2012 12:48:18 -0500
Narayan Desai <narayan.desai at gmail.com> wrote:

> Is there a canonical place that describes the errors reflected in
> ibdiagnet output, and potential resolutions? I'm trying to fix up a
> qdr fabric, and am seeing a combination of:
>  - symbol errors (these are pretty clear; i'm assuming that cable
> replacement is the solution in a lot of these cases)

Fix these first.

>  - ipoib speed errors (the group is at 10 gbit, but the network is
> capable of 40; is this a cosmetic error, or will it cause actual
> performance problems?)

This is likely due to the errors above: some ports came up slower (10 Gbit) and were the first to join or create the mcast group, so the whole group runs at that rate instead of the 40 Gbit the fabric is capable of.

I think these will clean up once you fix the errors above.

I'm not sure how ibdiagnet reports these errors, but to check for slow ports you could run the following:

iblinkinfo -l | grep -i could

All ports which "could be" faster (either link width or speed) will be listed.
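
If you want to keep a record of the slow ports while you replace cables, the same filter can be wrapped in a small shell function. This is only a sketch: the name slow_links is made up for illustration, and it assumes nothing about the output beyond the "could be" wording mentioned above. It reads from stdin so it also works on output saved to a file.

```shell
#!/bin/sh
# slow_links: hypothetical helper, not part of infiniband-diags.
# Reads `iblinkinfo -l` output on stdin and prints only the links
# that report they "could be" faster, sorted so that runs taken
# before and after a cable swap can be compared with diff.
slow_links() {
    grep -i 'could be' | sort
}

# Example use on a live fabric (assumes iblinkinfo is installed
# and you have permission to query the fabric):
#   iblinkinfo -l | slow_links > slow-before.txt
#   ... replace cables ...
#   iblinkinfo -l | slow_links > slow-after.txt
#   diff slow-before.txt slow-after.txt
```

An empty result (grep exits non-zero) means no link reported that it could be faster.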

Ira

>  - pages of multicast errors (not really sure what to do here)
>  - nagging issues with arping (which i suspect might have to do with
> the previous multicast errors)
> 
> Various vendor docs describe how to run ibdiagnet, but don't do any
> more than reproduce some output. I'm happy to RTFM, but a fair bit of
> googling has not revealed which FM to R. ;)
>  -nld
> _______________________________________________
> Users mailing list
> Users at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
weiny2 at llnl.gov


