[ofa-general] RcvSwRelayErrors
Bernd Schubert
bs at q-leap.de
Thu Mar 20 05:54:53 PDT 2008
On Thursday 20 March 2008 13:27:36 Hal Rosenstock wrote:
> On Thu, 2008-03-20 at 12:30 +0100, Bernd Schubert wrote:
> > Hello,
> >
> > on one of our systems we get a rather large number of RcvSwRelayErrors.
> > All I can find about RcvSwRelayErrors is
> >
> > "This counter can increase due to a valid network event"
> >
> > But what might cause?
Ooops. This should have been "But what might cause it?"
>
> Are you running IB multicast (e.g. IPoIB)? That's the most common
> cause.
IPoIB is up, but so far it is only used by Lustre for the initial LNET o2ib
setup and, AFAIK, not after that. I think some MPI stacks/applications also
set up their initial connections over IPoIB.
But in general, once these connections are established, IPoIB is not used much
anymore.
Thanks,
Bernd
>
> -- Hal
>
> > Thanks in advance for any help,
> > Bernd
> >
> >
> > [...]
> > 11: [RcvSwRelayErrors == 189]
> > 12: [RcvSwRelayErrors == 196]
> > 16: [RcvSwRelayErrors == 34655]
> > Errors for 0x000b8cffff002b33 "MT47396 Infiniscale-III Mellanox Technologies ()"
> > 1: [RcvSwRelayErrors == 190]
> > 2: [RcvSwRelayErrors == 188]
> > 3: [RcvSwRelayErrors == 195]
> > 4: [RcvSwRelayErrors == 207]
> > 5: [RcvSwRelayErrors == 194]
> > 6: [RcvSwRelayErrors == 189]
> > 8: [RcvSwRelayErrors == 198]
> > 9: [RcvSwRelayErrors == 197]
> > 10: [RcvSwRelayErrors == 190]
> > 11: [RcvSwRelayErrors == 198]
> > 12: [RcvSwRelayErrors == 190]
> > 16: [RcvSwRelayErrors == 34711]
> > Errors for 0x000b8cffff002b43 "MT47396 Infiniscale-III Mellanox Technologies ()"
> > 1: [RcvSwRelayErrors == 196]
> > 3: [RcvSwRelayErrors == 242]
> > [...]
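
In the report quoted above, port 16 on each switch stands out by roughly two
orders of magnitude. To pick out such ports automatically, a minimal Python
sketch could look like the one below. The function names, the regular
expressions, and the "10 times the median" threshold are assumptions made for
illustration only and are not part of any IB tool. Since the counter can
increase due to a valid network event, a flagged port is only a hint about
where to look, not proof of a fault.

```python
import re
from statistics import median

# Patterns for the report format quoted in this mail (assumed, not a tool spec).
HEADER = re.compile(r'Errors for (0x[0-9a-fA-F]+)')
COUNTER = re.compile(r'(\d+):\s*\[RcvSwRelayErrors == (\d+)\]')

# One complete switch block, copied from the report above.
SAMPLE = """\
Errors for 0x000b8cffff002b33 "MT47396 Infiniscale-III Mellanox
Technologies ()"
1: [RcvSwRelayErrors == 190]
2: [RcvSwRelayErrors == 188]
3: [RcvSwRelayErrors == 195]
4: [RcvSwRelayErrors == 207]
5: [RcvSwRelayErrors == 194]
6: [RcvSwRelayErrors == 189]
8: [RcvSwRelayErrors == 198]
9: [RcvSwRelayErrors == 197]
10: [RcvSwRelayErrors == 190]
11: [RcvSwRelayErrors == 198]
12: [RcvSwRelayErrors == 190]
16: [RcvSwRelayErrors == 34711]
"""


def parse_counters(text):
    """Return {switch_guid: {port: count}}; '> ' quote prefixes are tolerated."""
    switches = {}
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            current = switches.setdefault(header.group(1), {})
            continue
        counter = COUNTER.search(line)
        # Counter lines seen before any "Errors for" header are skipped.
        if counter and current is not None:
            current[int(counter.group(1))] = int(counter.group(2))
    return switches


def outlier_ports(ports, factor=10):
    """Return the ports whose counter exceeds factor * median of that switch."""
    if not ports:
        return {}
    threshold = factor * median(ports.values())
    return {port: count for port, count in ports.items() if count > threshold}


for guid, ports in parse_counters(SAMPLE).items():
    print(guid, outlier_ports(ports))
```

For the block above, this flags only port 16; the other ports all sit close to
the per-switch median.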
--
Bernd Schubert
Q-Leap Networks GmbH