[ofw] OpenSM with HPC
Yevgeny Kliteynik
kliteyn at dev.mellanox.co.il
Thu Oct 2 12:54:23 PDT 2008
Hi Anatoly,
I need more details:
Anatoly Greenblatt wrote:
> Hi,
>
> Our client reported problems running over 192 concurrent jobs with
> OpenSM.
What kind of cluster does your client have?
How many hosts? How many switches?
What do these jobs do?
Are these MPI jobs? Do they use/create multicast groups? Something else?
How many processes does each job have?
> The jobs are executed several times. After a while the memory
> usage of OpenSM goes to ~30MB, cpu usage to 100% and eventually the node
> freezes and needs to be reset.
Is the problem reproducible?
Can you send me the SM log?
-- Yevgeny
>
> Configuration:
> Winof rev 1596 (~rc1)
> ConnectX HCA
> Windows 2008 x64 with HPC pack rc2
> NetworkDirect is installed
> OpenSM is running as a service on the head node.
> About a hundred nodes are used (maybe more, I don’t have exact number yet)
>
> Has anyone any thoughts about this?
>
> Thanks,
> Anatoly.