[ofa-general] XmtDiscards
Bernd Schubert
bs at q-leap.de
Fri Apr 4 15:12:39 PDT 2008
Hello,
after I upgraded one of our clusters to opensm-3.2.1, things seem to have
gotten much better there: at least there are no further RcvSwRelayErrors, even
when the cluster is idle, and so far also no SymbolErrors, which we had also
seen before.
However, after I started a Lustre stress test on 50 clients (against a Lustre
storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about
9000 XmtDiscards within 30 minutes.
Searching for this error I find "This is a symptom of congestion and may
require tweaking either HOQ or switch lifetime values".
Well, I have to admit I neither know what HOQ is, nor how to tweak it. I also
have no idea how to set switch lifetime values. I guess this isn't related to
the opensm timeout option, is it?
Hmm, I just found a Cisco PDF describing how to set the lifetime on their
switches, but is this also possible on Flextronics switches?
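
For background on the values the quoted hint refers to: in the InfiniBand
specification, both the head-of-queue lifetime (HOQLife, per port) and the
switch lifetime (LifeTimeValue, per switch) are 5-bit exponents, and the
resulting limit is 4.096 us * 2^value, with values above 19 meaning "no
limit". The following is only a minimal sketch of that conversion; the
function and constant names are made up for illustration and do not come from
any InfiniBand tool.

```python
# Hypothetical helper: convert an InfiniBand lifetime exponent
# (HOQLife or SwitchInfo LifeTimeValue) into seconds.

BASE_US = 4.096  # base time unit in microseconds, per the IB specification


def lifetime_seconds(value):
    """Return the lifetime in seconds for a 5-bit exponent.

    Values above 19 are treated as 'infinite' (packets are never
    discarded because of this timer).
    """
    if not 0 <= value <= 31:
        raise ValueError("lifetime value must fit in 5 bits (0..31)")
    if value > 19:
        return float("inf")
    return BASE_US * (2 ** value) * 1e-6


if __name__ == "__main__":
    # Print the limit for a few example exponents.
    for v in (16, 18, 19, 20):
        print(v, lifetime_seconds(v))
```

A lower value makes a congested port drop packets sooner (which is counted as
XmtDiscards); a higher value lets packets wait longer before being dropped.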
Thanks for any help,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH