[ofa-general] XmtDiscards

Hal Rosenstock hrosenstock at xsigo.com
Mon Apr 7 06:35:10 PDT 2008


Hi Bernd,

On Sun, 2008-04-06 at 18:05 +0200, Bernd Schubert wrote:
> Hello Hal,
> 
> On Sat, Apr 05, 2008 at 06:19:43AM -0700, Hal Rosenstock wrote:
> > Hi Bernd,
> > 
> > On Sat, 2008-04-05 at 00:12 +0200, Bernd Schubert wrote:
> > > Hello,
> > > 
> > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten 
> > > much better: no further RcvSwRelayErrors, even when the cluster is idle, and 
> > > so far also no SymbolErrors, which we had also seen before.
> > > 
> > > However, shortly after I started a Lustre stress test on 50 clients (against 
> > > a Lustre storage system with 20 OSS servers and 60 OSTs), ibcheckerrors 
> > > reported about 9000 XmtDiscards within 30 minutes.
> > > 
> > > Searching for this error I found "This is a symptom of congestion and may 
> > > require tweaking either HOQ or switch lifetime values". 
> > > Well, I have to admit I neither know what HOQ is nor how to tweak it, and I 
> > > have no idea how to set switch lifetime values either. I guess this isn't 
> > > related to the opensm timeout option, is it?
> > > 
> > > Hmm, I just found a Cisco PDF describing how to set the lifetime on their 
> > > switches, but is this also possible on Flextronics switches?
> > 
> > What routing algorithm are you using? Rather than playing with those
> > switch values, could you try up/down (if you are not already using
> > it) to see whether it helps with the congestion you are seeing?
> 
> I have now configured up/down but still get XmtDiscards, though only on one port.
> 
> Error check on lid 205 (SW_pfs1_leaf2) port all:  FAILED
> #warn: counter XmtDiscards = 6213       (threshold 100) lid 205 port 1
> Error check on lid 205 (SW_pfs1_leaf2) port 1:  FAILED
> #warn: counter RcvSwRelayErrors = 1431  (threshold 100) lid 205 port 13
> Error check on lid 205 (SW_pfs1_leaf2) port 13:  FAILED

Are you running IPoIB? If so, RcvSwRelayErrors are not necessarily
indicative of a "real" issue: multicast packets reflected back out the
same port they arrived on are mistakenly counted there.
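
For reference on the HOQ question: HOQ is the head-of-queue lifetime,
i.e. how long a packet may sit at the head of a switch port's transmit
queue before the switch drops it (each such drop is counted as an
XmtDiscard). With OpenSM you would normally raise it fabric-wide through
the options file rather than per switch. A minimal sketch, assuming the
stock option names and the default cache location of a 3.2.x build:

    # /var/cache/opensm/opensm.opts (newer builds: /etc/opensm/opensm.conf)
    # Values are IB lifetime exponents: actual time ~ 4.096 usec * 2^n,
    # and sufficiently large values disable the timeout entirely.
    head_of_queue_lifetime 0x12
    # separate (usually tighter) limit for switch ports facing CAs/routers
    leaf_head_of_queue_lifetime 0x10
    # overall packet lifetime inside a switch, same encoding
    packet_life_time 0x12

Restart opensm after editing the file so the new lifetimes get pushed
out on the next sweep.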

> I'm also not sure if up/down is the optimal algorithm for a fabric with only 
> two switches.
> 
> Since describing the connections in words is a bit difficult, I have uploaded
> a drawing here:
> 
> http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/ib/Interswitch-cabling.pdf
> 
> The root GUID for the up/down algorithm is leaf-5 of the small switch. But
> I'm still not sure about up/down at all. Doesn't up/down need at least
> 3 switches, something like the ASCII graphic below?
> 
> 
>        root-switch
>      /            \
>     /              \
>  Sw-1 ------------ Sw-2

Doesn't your chassis switch have many switches in it? You said it was
144 ports, so it is made up of a number of internal switch elements.

You may need to choose a "better" root than the one up/down picks
automatically.
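
For what it's worth, a sketch of pinning the up/down roots by hand,
assuming opensm 3.2.x's -a/--root_guid_file option and made-up GUIDs
(the chassis spine switches are the usual candidates):

    # list the switch GUIDs in the fabric
    ibswitches

    # put the chosen root GUIDs in a file, one per line (example values,
    # not yours)
    cat > /etc/opensm/root-guids <<EOF
    0x0008f10400411a5e
    0x0008f10400411a62
    EOF

    # run up/down with the explicit roots
    opensm -R updn -a /etc/opensm/root-guids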

-- Hal

> Thanks for your help,
> Bernd
> 
> 
> PS: These RcvSwRelayErrors are also back again. I think they occur during some 
> Lustre operations. Even if these RcvSwRelayErrors are not critical, they are 
> still a bit annoying, since they make it hard to spot other errors in the 
> output of ibcheckerrors. 
> If we can really ignore these errors, I will write a patch to not display them 
> by default.
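
As a possible stopgap before patching: if your infiniband-diags build
supports a threshold file for ibcheckerrs (-T), you could raise the
RcvSwRelayErrors threshold so high that it never trips. A sketch with a
hypothetical file path; check the exact counter names and syntax against
"ibcheckerrs -s", which prints the thresholds in effect:

    # effectively mute RcvSwRelayErrors (hypothetical path; one
    # counter=threshold entry per line)
    cat > /etc/infiniband-diags/error_thresholds <<EOF
    RcvSwRelayErrors=0xffffffff
    EOF

    # re-check the noisy switch port against the custom thresholds
    ibcheckerrs -T /etc/infiniband-diags/error_thresholds 205 13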