[ofa-general] RcvSwRelayErrors
Hal Rosenstock
hrosenstock at xsigo.com
Thu Mar 20 07:12:03 PDT 2008
On Thu, 2008-03-20 at 13:54 +0100, Bernd Schubert wrote:
> On Thursday 20 March 2008 13:27:36 Hal Rosenstock wrote:
> > On Thu, 2008-03-20 at 12:30 +0100, Bernd Schubert wrote:
> > > Hello,
> > >
> > > on one of our systems we get a rather huge number of RcvSwRelayErrors.
> > > All I find about RcvSwRelayErrors is
> > >
> > > "This counter can increase due to a valid network event"
> > >
> > > But what might cause?
>
> Ooops. This should have been "But what might cause it?"
>
> >
> > Are you running IB multicast (e.g. IPoIB)? That's the most common
> > cause.
>
> IPoIB is up, but so far it is only used by Lustre for the initial lnet o2ib
> setup and, AFAIK, not after that. I think some MPI stacks/applications also
> do their initial connection setup using IPoIB.
>
> But in general, once these connections are established, IPoIB is not much used
> anymore.
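RcvSwRelayErrors counts packets received on a port that the switch relay
discarded because it could not forward them. The causes are: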
1. DLID mapping
2. VL mapping
3. looping (out port = in port)
Is your subnet unstable in some way? Are you using QoS?
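
As a rough illustration of the three causes listed above, here is a toy
model of a switch relay decision in Python. It is only a sketch: the
function name relay(), the dict-based tables, and the LID/port/SL values
are all made up for the example, and a real switch uses its forwarding
and SL-to-VL mapping tables rather than Python dicts.

```python
def relay(lft, sl2vl, in_port, dlid, sl):
    """Toy relay decision: forward a packet or name the discard cause.

    lft   -- hypothetical forwarding table, DLID -> output port
    sl2vl -- hypothetical mapping, (in_port, out_port, SL) -> VL
    """
    out_port = lft.get(dlid)
    if out_port is None:
        # Cause 1: no forwarding entry for the destination LID.
        return ("drop", "DLID mapping")
    if out_port == in_port:
        # Cause 3: the packet would leave on the port it arrived on.
        return ("drop", "looping")
    vl = sl2vl.get((in_port, out_port, sl))
    if vl is None:
        # Cause 2: no usable VL for this SL on this port pair.
        return ("drop", "VL mapping")
    return ("forward", (out_port, vl))


# Made-up tables: LID 5 is reached via port 2, LID 7 via port 1.
lft = {5: 2, 7: 1}
sl2vl = {(1, 2, 0): 0}

results = [
    relay(lft, sl2vl, in_port=1, dlid=5, sl=0),  # routable
    relay(lft, sl2vl, in_port=1, dlid=9, sl=0),  # unknown DLID
    relay(lft, sl2vl, in_port=1, dlid=5, sl=3),  # SL with no VL
    relay(lft, sl2vl, in_port=1, dlid=7, sl=0),  # out port = in port
]
for r in results:
    print(r)
```

In this model every "drop" result would bump RcvSwRelayErrors on the
receiving port.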
-- Hal
>
> Thanks,
> Bernd
>
>
> >
> > -- Hal
> >
> > > Thanks in advance for any help,
> > > Bernd
> > >
> > >
> > > [...]
> > > 11: [RcvSwRelayErrors == 189]
> > > 12: [RcvSwRelayErrors == 196]
> > > 16: [RcvSwRelayErrors == 34655]
> > > Errors for 0x000b8cffff002b33 "MT47396 Infiniscale-III Mellanox
> > > Technologies ()"
> > > 1: [RcvSwRelayErrors == 190]
> > > 2: [RcvSwRelayErrors == 188]
> > > 3: [RcvSwRelayErrors == 195]
> > > 4: [RcvSwRelayErrors == 207]
> > > 5: [RcvSwRelayErrors == 194]
> > > 6: [RcvSwRelayErrors == 189]
> > > 8: [RcvSwRelayErrors == 198]
> > > 9: [RcvSwRelayErrors == 197]
> > > 10: [RcvSwRelayErrors == 190]
> > > 11: [RcvSwRelayErrors == 198]
> > > 12: [RcvSwRelayErrors == 190]
> > > 16: [RcvSwRelayErrors == 34711]
> > > Errors for 0x000b8cffff002b43 "MT47396 Infiniscale-III Mellanox
> > > Technologies ()"
> > > 1: [RcvSwRelayErrors == 196]
> > > 3: [RcvSwRelayErrors == 242]
> > > [...]
>
>
>
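
The per-port counters quoted above can be tabulated with a small Python
script to see which ports stand out. This is a minimal sketch that
assumes the report has exactly the line shapes shown in the quoted
excerpt; parse_report(), outlier_ports() and the factor-of-10 threshold
are hypothetical choices for the example, not part of any diagnostic tool.

```python
import re
from statistics import median

HEADER = re.compile(r"^Errors for (0x[0-9a-fA-F]+)")
COUNTER = re.compile(r"^(\d+): \[RcvSwRelayErrors == (\d+)\]")


def parse_report(text):
    """Map switch GUID -> {port: RcvSwRelayErrors} from the report text."""
    per_switch = {}
    guid = "(unknown)"  # counters seen before any "Errors for" header
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # drop mail quoting, if any
        m = HEADER.match(line)
        if m:
            guid = m.group(1)
            continue
        m = COUNTER.match(line)
        if m:
            per_switch.setdefault(guid, {})[int(m.group(1))] = int(m.group(2))
    return per_switch


def outlier_ports(counters, factor=10):
    """Ports whose count exceeds factor times the switch's median count."""
    if not counters:
        return []
    typical = median(counters.values())
    return sorted(p for p, v in counters.items() if v > factor * typical)


# A few lines copied from the quoted report.
SAMPLE = '''
> > > Errors for 0x000b8cffff002b33 "MT47396 Infiniscale-III Mellanox
> > > Technologies ()"
> > > 1: [RcvSwRelayErrors == 190]
> > > 2: [RcvSwRelayErrors == 188]
> > > 3: [RcvSwRelayErrors == 195]
> > > 16: [RcvSwRelayErrors == 34711]
'''

report = parse_report(SAMPLE)
for guid, counters in report.items():
    print(guid, outlier_ports(counters))
```

On the sample lines, port 16 of switch 0x000b8cffff002b33 is the only
port flagged; the other ports all sit around 190.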