[ofa-general] Running OpenSM on large clusters
Edward Mascarenhas
eddiem at sgi.com
Tue Oct 16 16:35:38 PDT 2007
Has anyone seen issues running OpenSM on large clusters (1500+
nodes)?
We are seeing thousands of the following message in the system log:
__osm_sa_mad_ctrl_process: Dropping MAD since the dispatcher is
already overloaded with 6736 messages and queue time of:10006[msec]
It seems that a huge number of datagrams are being generated, which
increases the time needed to bring up the fabric.
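To put numbers on the backlog, the dropped-MAD messages can be tallied from the log. The following is a minimal sketch in Python, assuming the log text matches the sample message above; the function name summarize_drops and the regular expression are illustrative and are not part of OpenSM.

```python
import re

# Pattern derived from the sample message quoted above (an assumption:
# other OpenSM versions may word it differently). Whitespace is matched
# loosely because the message may be wrapped across lines.
DROP_RE = re.compile(
    r"__osm_sa_mad_ctrl_process:\s+Dropping\s+MAD\s+since\s+the\s+"
    r"dispatcher\s+is\s+already\s+overloaded\s+with\s+(\d+)\s+messages\s+"
    r"and\s+queue\s+time\s+of:\s*(\d+)\[msec\]"
)

def summarize_drops(log_text):
    """Return (count, max_queue_depth, max_queue_time_msec) for dropped-MAD messages."""
    matches = DROP_RE.findall(log_text)
    if not matches:
        return (0, 0, 0)
    depths = [int(depth) for depth, _ in matches]
    times = [int(msec) for _, msec in matches]
    return (len(matches), max(depths), max(times))
```

Running it over the system log text gives the number of drops along with the worst queue depth and queue time seen, which makes it easier to compare bring-up runs.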
Is there a threshold of cluster size beyond which we are likely to see
these messages?
How many MADs are generated during bring up?
What is the largest cluster size for which OpenSM has been tried by
others?
Thanks,
Edward