[ofa-general] RcvSwRelayErrors

Bernd Schubert bs at q-leap.de
Thu Mar 20 05:54:53 PDT 2008


On Thursday 20 March 2008 13:27:36 Hal Rosenstock wrote:
> On Thu, 2008-03-20 at 12:30 +0100, Bernd Schubert wrote:
> > Hello,
> >
> > on one of our systems we get a rather large number of RcvSwRelayErrors.
> > All I find about RcvSwRelayErrors is
> >
> > "This counter can increase due to a valid network event"
> >
> > But what might cause?

Ooops. This should have been "But what might cause it?"

>
> Are you running IB multicast (e.g. IPoIB) ? That's the most common
> cause.

IPoIB is up, but so far it is only used by Lustre for the initial LNET o2ib 
setup and, AFAIK, not afterwards. I think some MPI stacks/applications also 
set up their initial connections over IPoIB.

But in general, once these connections are established, IPoIB is not much used 
anymore.
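
To make the per-port counters quoted below easier to compare, here is a
minimal Python sketch that parses lines in that format and flags ports whose
RcvSwRelayErrors value lies far above the median of the same switch. The
function names and the factor of 10 are assumptions chosen for illustration;
they are not part of any OFED tool.

```python
import re
from statistics import median

# Matches counter lines such as "> >    16: [RcvSwRelayErrors == 34655]".
COUNTER_RE = re.compile(r"(\d+):\s*\[(\w+) == (\d+)\]")
# Matches device headers such as 'Errors for 0x000b8cffff002b33 "MT47396 ...'.
DEVICE_RE = re.compile(r"Errors for (0x[0-9a-fA-F]+)")


def parse_counters(text, counter="RcvSwRelayErrors"):
    """Return {guid: {port: value}} for the given counter name."""
    result = {}
    guid = "unknown"  # lines seen before the first "Errors for" header
    for line in text.splitlines():
        header = DEVICE_RE.search(line)
        if header:
            guid = header.group(1)
            continue
        match = COUNTER_RE.search(line)
        if match and match.group(2) == counter:
            ports = result.setdefault(guid, {})
            ports[int(match.group(1))] = int(match.group(3))
    return result


def outlier_ports(ports, factor=10):
    """Return the ports whose value exceeds factor times the median (assumed threshold)."""
    if not ports:
        return []
    limit = factor * median(ports.values())
    return [port for port, value in sorted(ports.items()) if value > limit]
```

Fed the listing quoted below, this singles out port 16 on
0x000b8cffff002b33 (34711, against roughly 190 to 210 on the other ports).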

Thanks,
Bernd


>
> -- Hal
>
> > Thanks in advance for any help,
> > Bernd
> >
> >
> > [...]
> >    11: [RcvSwRelayErrors == 189]
> >    12: [RcvSwRelayErrors == 196]
> >    16: [RcvSwRelayErrors == 34655]
> > Errors for 0x000b8cffff002b33 "MT47396 Infiniscale-III Mellanox
> > Technologies ()"
> >    1: [RcvSwRelayErrors == 190]
> >    2: [RcvSwRelayErrors == 188]
> >    3: [RcvSwRelayErrors == 195]
> >    4: [RcvSwRelayErrors == 207]
> >    5: [RcvSwRelayErrors == 194]
> >    6: [RcvSwRelayErrors == 189]
> >    8: [RcvSwRelayErrors == 198]
> >    9: [RcvSwRelayErrors == 197]
> >    10: [RcvSwRelayErrors == 190]
> >    11: [RcvSwRelayErrors == 198]
> >    12: [RcvSwRelayErrors == 190]
> >    16: [RcvSwRelayErrors == 34711]
> > Errors for 0x000b8cffff002b43 "MT47396 Infiniscale-III Mellanox
> > Technologies ()"
> >    1: [RcvSwRelayErrors == 196]
> >    3: [RcvSwRelayErrors == 242]
> > [...]



-- 
Bernd Schubert
Q-Leap Networks GmbH


