[ofw] OpenSM with HPC

Hal Rosenstock hal.rosenstock at gmail.com
Thu Oct 2 11:37:44 PDT 2008


Hi Anatoly,

On Thu, Oct 2, 2008 at 7:52 AM, Anatoly Greenblatt
<anatolyg at voltaire.com> wrote:
> Hi,
>
>
>
> Our client reported problems running over 192 concurrent jobs with OpenSM.
> The jobs are executed several times. After a while the memory usage of
> OpenSM goes to ~30MB, cpu usage to 100% and eventually the node freezes and
> needs to be reset.
>
>
>
> Configuration:
>
> Winof rev 1596 (~rc1)
> ConnectX HCA
> Windows 2008 x64 with HPC pack rc2
> NetworkDirect is installed
> OpenSM is running as a service on the head node.
> About a hundred nodes are used (maybe more, I don't have exact number yet)
>
>
>
> Has anyone any thoughts about this?

Voltaire has the Linux (and core) OpenSM maintainer. What does he say
about this?

Problems like this were seen with older OpenSM versions, and the
Windows version is one of them. IMHO OpenSM in Windows needs to be
updated to the latest and greatest. Until that happens (and it seems
no one has the cycles for it), I think we have been and will continue
to be chasing our tails on many such issues. Is there any chance
Voltaire has an interest in doing this, since you have customers who
appear to be relying on it? It would do the community a large service.

-- Hal

> Thanks,
>
> Anatoly.