[openib-general] Scalable Monitoring - RFC

Bernard King-Smith wombat2 at us.ibm.com
Mon Nov 20 13:14:53 PST 2006


> ----- Message from "Eitan Zahavi" <eitan at mellanox.co.il> on Mon, 20 
> Nov 2006 14:24:36 +0200 -----
> 
> To:
> 
> openib-general at openib.org
> 
> Subject:
> 
> [openib-general] Scalable Monitoring - RFC
> 
> Hi All,
> Following the path forward requirements review by Matt L. in the 
> last OFA Dev Summit I have 
> started thinking about what would make a monitoring system scale to tens 
> of thousands of nodes.
> This RFC provides both what I propose as the requirements list as 
> well as a draft implementation proposal - just to start the discussion.
> I apologize for the long mail but I think this issue deserves a 
> careful design (been there done that?)
> Scalable fabric monitoring requirements:
> * scale up to 48k nodes * 16 ports, which gets to about 1,000,000 ports
>   (16 ports per device is the average of 32 for a switch and 1 for an HCA)

What is the problem you are trying to address: 48K nodes, or a single 
fabric of 1,000,000 endpoints? With the number of cores per node going up, 
you are looking at a multi-petaflop machine with this many nodes. When do 
you expect this to happen, 5-10 years from now? Most very large systems 
generally limit the fabric port count to the number of nodes, not the 
number of endpoints. A single fabric runs into the problems of too many 
stages in the fat tree and a single point of failure. Even if each node 
has multiple cores, it can have multiple IB ports, each connecting to a 
different plane of the IB fabric. This means you only need to address a 
configuration of 48K ports, and for greater bandwidth you use multiple 
parallel IB fabrics.
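
To make the arithmetic concrete, here is a quick back-of-the-envelope 
sketch in C. It just replays the RFC's own numbers (48K devices at an 
average of 16 ports each) against the per-plane view. The per-plane line 
counts only end-node ports and ignores that plane's switch ports, and the 
one-HCA-port-per-plane assumption is mine, not Eitan's.

#include <stdio.h>

int main(void)
{
	const long nodes         = 48 * 1024; /* 48K nodes/devices           */
	const long avg_ports_dev = 16;        /* avg of 32 (switch), 1 (HCA) */

	/* One flat fabric: every port in the subnet is tracked by one
	 * fabric manager.  48K * 16 is ~786K, the RFC's "about 1M". */
	long flat_fabric_ports = nodes * avg_ports_dev;

	/* One plane: only the end-node ports attached to that plane
	 * (one HCA port per node per plane assumed here). */
	long per_plane_ports = nodes;

	printf("single flat fabric: ~%ld ports to monitor\n", flat_fabric_ports);
	printf("per fabric plane  : ~%ld end-node ports (plus switch ports)\n",
	       per_plane_ports);
	return 0;
}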

You get higher reliability with multiple planes of IB fabric because a 
failure at one point in the fabric doesn't take the entire network down. 
Handling each plane of the fabric as a separate network cuts down on the 
number of elements each fabric manager has to track. You can always 
aggregate summary information across the planes after collecting it from 
the individual fabric managers.
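
Something like the following minimal sketch is what I have in mind for the 
aggregation step, assuming each per-plane fabric manager already exports a 
small summary record. The struct layout, field names and the numbers in 
main() are made up for illustration; this is not an existing OFED 
interface.

#include <stdio.h>
#include <stdint.h>

struct plane_summary {
	int      plane_id;
	uint32_t ports_active;
	uint64_t symbol_errors;    /* summed over the plane       */
	uint64_t xmit_wait_ticks;  /* rough congestion indicator  */
};

static void aggregate(const struct plane_summary *p, int nplanes)
{
	uint32_t ports = 0;
	uint64_t errs = 0, waits = 0;
	int i;

	/* Sum the per-plane summaries into one fabric-wide view. */
	for (i = 0; i < nplanes; i++) {
		ports += p[i].ports_active;
		errs  += p[i].symbol_errors;
		waits += p[i].xmit_wait_ticks;
	}
	printf("fabric-wide: %u active ports, %llu symbol errors, "
	       "%llu xmit-wait ticks\n",
	       ports, (unsigned long long)errs, (unsigned long long)waits);
}

int main(void)
{
	/* Four independent planes, each reported by its own SM/PM. */
	struct plane_summary planes[] = {
		{ 0, 49152, 12, 100000 },
		{ 1, 49152,  3,  80000 },
		{ 2, 49150,  0,  95000 },
		{ 3, 49151,  7, 110000 },
	};

	aggregate(planes, 4);
	return 0;
}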

> * provide alerts for ports crossing some rate of change
> * support profiling of data flow through the fabric
> * be able to handle changes in topology due to MTBF.
> Basic design considerations:

 [SNIP]

> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general


Bernie King-Smith 
IBM Corporation
Server Group
Cluster System Performance 
wombat2 at us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES 

"We are not responsible for the world we are born into, only for the world 
we leave when we die.
So we have to accept what has gone before us and work to change the only 
thing we can,
-- The Future." William Shatner

