[ofa-general] Running OpenSM on large clusters
    Edward Mascarenhas 
    eddiem at sgi.com
       
    Tue Oct 16 16:35:38 PDT 2007
    
    
  
Has anyone seen issues running OpenSM on large clusters (1500+ nodes)?
We are seeing thousands of the following message in the system log:
__osm_sa_mad_ctrl_process: Dropping MAD since the dispatcher is 
already overloaded with 6736 messages and queue time of:10006[msec]
It seems that a huge number of datagrams are being generated, which 
increases the time needed to bring up the fabric. 
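
For anyone who wants to compare numbers from their own logs, here is a 
minimal Python sketch that counts these drop messages and pulls out the 
reported queue depth and queue time. The pattern is derived only from the 
message quoted above, so other OpenSM versions may word it differently. 
The function name summarize_drops is hypothetical and is not part of OpenSM.

```python
import re

# Pattern built only from the message quoted in this post (assumption:
# the wording is the same in your OpenSM version). \s+ tolerates the
# line wrap introduced by the mail client.
DROP_RE = re.compile(
    r"__osm_sa_mad_ctrl_process: Dropping MAD since the dispatcher is\s+"
    r"already overloaded with (\d+) messages and queue time of:(\d+)\[msec\]"
)


def summarize_drops(log_text):
    """Return (drop count, max queue depth, max queue time in msec)."""
    matches = DROP_RE.findall(log_text)
    if not matches:
        return 0, 0, 0
    depths = [int(depth) for depth, _ in matches]
    times = [int(msec) for _, msec in matches]
    return len(matches), max(depths), max(times)


# The sample below is the single message quoted above, nothing more.
sample = (
    "__osm_sa_mad_ctrl_process: Dropping MAD since the dispatcher is \n"
    "already overloaded with 6736 messages and queue time of:10006[msec]\n"
)
print(summarize_drops(sample))  # (1, 6736, 10006)
```

To run it on a real log, read the file contents into a string and pass that 
to summarize_drops; the log location depends on the local syslog setup.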
Is there a cluster size threshold beyond which we are likely to see 
these messages?
How many MADs are generated during bring up?
What is the largest cluster size for which OpenSM has been tried by 
others?
Thanks,
Edward
    
    