[ofw] OpenSM with HPC

Anatoly Greenblatt anatolyg at voltaire.com
Sun Oct 5 00:26:18 PDT 2008


Hi,

Information I have so far:

32 Computing nodes, Quadcore Dualsocket
1 Head node, Quadcore Dualsocket 64GB RAM
1 Control node, Dualcore Dualsocket

3 edge switches
2 core switches

OpenSM runs on either the head node or the control node; in both cases the
system freezes when running over 192 concurrent jobs.

They need to run 256 concurrent MPI jobs continuously.

Regards,
Anatoly.



-----Original Message-----
From: Yevgeny Kliteynik [mailto:kliteyn at dev.mellanox.co.il] 
Sent: Thursday, October 02, 2008 22:54
To: Anatoly Greenblatt
Cc: ofw at lists.openfabrics.org
Subject: Re: [ofw] OpenSM with HPC

Hi Anatoly,

I need more details:

Anatoly Greenblatt wrote:
> Hi,
> 
> Our client reported problems running over 192 concurrent jobs with 
> OpenSM.

What kind of cluster does your client have?
How many hosts? How many switches?

What do these jobs do?
Are these MPI jobs? Do they use/create multicast groups? Something else?
How many processes does each job have?

> The jobs are executed several times. After a while the memory 
> usage of OpenSM goes to ~30MB, cpu usage to 100% and eventually the
> node freezes and needs to be reset.

Is the problem reproducible?
Can you send me the SM log?

-- Yevgeny

>  
> 
> Configuration:
> 
> Winof rev 1596 (~rc1)
> 
> ConnectX HCA
> 
> Windows 2008 x64 with HPC pack rc2
> 
> NetworkDirect is installed
> 
> OpenSM is running as a service on the head node.
> 
> About a hundred nodes are used (maybe more, I don't have exact number
> yet)
> 
>  
> 
> Has anyone any thoughts about this?
> 
>  
> 
> Thanks,
> 
> Anatoly.
> 
>  
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenSM-WinIB1.4-log.zip
Type: application/x-zip-compressed
Size: 5500453 bytes
Desc: OpenSM-WinIB1.4-log.zip
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20081005/7af7df44/attachment.bin>

