[Users] OpenSM error message rosetta stone?

Ira Weiny weiny2 at llnl.gov
Tue Feb 19 14:17:25 PST 2013


On Tue, 19 Feb 2013 14:38:47 -0600
Narayan Desai <narayan.desai at gmail.com> wrote:

> On Tue, Feb 19, 2013 at 1:03 PM, Ira Weiny <weiny2 at llnl.gov> wrote:
> 
> >> It looks like some lines are being mixed; is this just a lack of a
> >> newline, or are the messages interspersed?
> >
> > Yes, there is a bug here. I submitted a patch, but it was rejected because the newline had already been added as part of another patch. So I believe this is fixed in 3.3.16.
> 
> This is just cosmetic, right?

Yes.
Ira

> 
> >>
> >> Does the initial path information identify the remote node having
> >> troubles? How can I turn that into usable coordinates?
> >
> > The DR path in this case is the path to the node that the SM _can_ talk to (0,1,19,13, GUID 0x0002c902004158b0). The remote node that is not responding is on port 6 of that node, so whatever is connected to port 6 is the problem node.
> >
> > The easiest way to trace this using the diags would be:
> >
> > iblinkinfo -D 0,1,19,13
> > or
> > iblinkinfo -G 0x0002c902004158b0
> >
> > It too will fail to query port 6, but by looking at the nodes connected to the other ports it should give you a better idea of where in the fabric you are.
> 
> Thanks.
>  -nld
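A minimal Python sketch of the DR path reasoning above. It assumes the usual directed-route string convention used by the diags (comma-separated integers, where each entry after the leading 0 is an egress port), so appending port 6 to 0,1,19,13 names the unresponsive neighbor. The helper names `neighbor_dr_path` and `iblinkinfo_commands` are hypothetical; only the `iblinkinfo -D` and `-G` options come from the message itself. Querying the extended path directly would presumably fail just as the SM did, so it mainly serves as a label for the suspect link.

```python
import shlex


def neighbor_dr_path(dr_path: str, port: int) -> str:
    """Return the DR path to whatever is cabled to `port` of the node at `dr_path`.

    Assumption: the diags' directed-route string convention, i.e. comma-separated
    integers where each entry after the leading 0 is an egress port.
    """
    hops = [int(h) for h in dr_path.split(",")]  # raises ValueError on malformed input
    if port < 1:
        raise ValueError("port must be a physical port number (1 or higher)")
    return ",".join(str(h) for h in hops + [port])


def iblinkinfo_commands(dr_path: str, guid: str) -> list[list[str]]:
    """Build the two iblinkinfo invocations suggested in the thread."""
    return [
        ["iblinkinfo", "-D", dr_path],  # look up the responding node by DR path
        ["iblinkinfo", "-G", guid],     # or by its node GUID
    ]


if __name__ == "__main__":
    responding_path = "0,1,19,13"
    responding_guid = "0x0002c902004158b0"
    print("suspect node DR path:", neighbor_dr_path(responding_path, 6))
    for cmd in iblinkinfo_commands(responding_path, responding_guid):
        print(shlex.join(cmd))
```

Building the commands as argument lists (rather than strings) keeps them ready for `subprocess.run` if you want to script the lookup across several log entries.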


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
weiny2 at llnl.gov


