[ofa-general] Running OpenSM on large clusters

Edward Mascarenhas eddiem at sgi.com
Wed Oct 17 16:24:40 PDT 2007


On Wednesday 17 October 2007 04:30:49 am Sasha Khapyorsky wrote:
> On 16:35 Tue 16 Oct     , Edward Mascarenhas wrote:
> > Has anyone seen issues with running OpenSM on large (1500+ nodes)
> > clusters?
> >
> > We are seeing 1000s of the following message in the system log
> >
> > __osm_sa_mad_ctrl_process: Dropping MAD since the dispatcher is
> > already overloaded with 6736 messages and queue time
> > of:10006[msec]
>
> I guess you see this during fabric bringup, when the SA processor is
> not yet available. Which version of OpenSM are you using? We made
> some improvements in this area in recent versions (partially in
> OFED-1.2).
>
Yes, we see this during fabric bringup.

We are using OFED 1.2.
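
To put numbers on the drops, something like the following rough sketch could be run over the system log. It only assumes that each message appears on a single log line in the form quoted above; the function name `summarize_drops` and the loose whitespace matching are illustrative choices, not anything provided by OpenSM.

```python
import re

# Pattern modelled on the message quoted above. The exact spacing in a
# real log line is an assumption, so whitespace is matched loosely.
DROP_RE = re.compile(
    r"Dropping MAD since the dispatcher is\s+already overloaded with\s+"
    r"(\d+)\s+messages and queue time\s+of:\s*(\d+)\[msec\]"
)


def summarize_drops(lines):
    """Return (count, max_queue_depth, max_queue_time_msec) over drop messages."""
    count = 0
    max_depth = 0
    max_time = 0
    for line in lines:
        m = DROP_RE.search(line)
        if m is None:
            continue  # not a dispatcher-overload message
        count += 1
        max_depth = max(max_depth, int(m.group(1)))
        max_time = max(max_time, int(m.group(2)))
    return count, max_depth, max_time
```

Passing an open log file (or `sys.stdin`) to `summarize_drops` gives the number of dropped-MAD messages plus the largest queue depth and queue time reported, which makes it easier to compare bringups before and after an OpenSM upgrade.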

Edward



