<br><tt><font size=2>> ----- Message from "Eitan Zahavi" <eitan@mellanox.co.il>

on Mon, 20 <br>

> Nov 2006 14:24:36 +0200 -----</font></tt>

<br><tt><font size=2>> <br>

> To:</font></tt>

<br><tt><font size=2>> <br>

> openib-general@openib.org</font></tt>

<br><tt><font size=2>> <br>

> Subject:</font></tt>

<br><tt><font size=2>> <br>

> [openib-general] Scalable Monitoring - RFC</font></tt>

<br><tt><font size=2>> <br>

> Hi All,</font></tt>

<br><tt><font size=2>> Following the path forward requirements review

by Matt L. in the <br>

> last OFA Dev Summit I have </font></tt>

<br><tt><font size=2>> started thinking what would make a monitoring

system scale to tens <br>

> of thousands of nodes.</font></tt>

<br><tt><font size=2>> This RFC provides both what I propose as the

requirements list as <br>

> well as a draft implementation proposal - just to start the discussion.</font></tt>

<br><tt><font size=2>> I apologize for the long mail but I think this

issue deserves a <br>

> careful design (been there done that…)</font></tt>

<br><tt><font size=2>> Scalable fabric monitoring requirements:</font></tt>

<br><tt><font size=2>> * scale up to 48k nodes * 16 ports, which gets to about 1,000,000

ports.</font></tt>

<br><tt><font size=2>>     (16 ports per device is average

for 32 ports for switch and 1 for HCA)</font></tt>

<br>

<br><font size=2 face="sans-serif">What is the problem you are trying to

address? 48K nodes or a single fabric of 1,000,000 endpoints? With the

number of cores per node going up, you are looking at a multiple petaflop

machine with this many nodes. When do you expect this to happen, 5-10 years

from now? Most very large systems size the fabric port count by

the number of nodes, not the number of endpoints. A single fabric runs into

the problems of too many stages in the fat tree and a single point of failure.

Even if each node has multiple cores, each node can have multiple IB ports,

each connecting to a different plane of IB fabric. This means that you only

need to address a configuration of 48K ports, and for greater bandwidth

you use multiple parallel IB fabrics.</font>
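<br>

<br><font size=2 face="sans-serif">To put rough numbers on the stage count, here
is a minimal sketch. It assumes a full-bisection fat tree built from 32-port
switches (the switch size quoted above), where n switch levels can attach at most
2*(32/2)^n end ports. The function name and the capacity model are my assumptions
for illustration, not anything taken from the RFC.</font>

```python
def fat_tree_levels(end_ports: int, radix: int = 32) -> int:
    """Smallest number of switch levels that can attach `end_ports`.

    Assumes a full-bisection fat tree of `radix`-port switches, whose
    capacity with n levels is 2 * (radix / 2) ** n end ports.
    """
    half = radix // 2
    levels = 1
    while 2 * half ** levels < end_ports:
        levels += 1
    return levels


if __name__ == "__main__":
    # 48K node ports per plane versus a single fabric of 1,000,000 endpoints.
    for end_ports in (48_000, 1_000_000):
        print(end_ports, "end ports ->", fat_tree_levels(end_ports), "levels")
```

<br><font size=2 face="sans-serif">Under these assumptions a 48K-port plane fits
in 4 levels, while 1,000,000 endpoints in one fabric needs 5.</font>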

<br>

<br><font size=2 face="sans-serif">You get higher reliability with multiple

planes of IB fabric because a failure at one point in the fabric doesn't

take the entire network down. Handling each plane of the fabric as a separate

network cuts down on the number of elements that each fabric manager has

to track. You can always aggregate summary information across multiple

planes of fabric after collecting from the individual fabric managers.</font>
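<br>

<br><font size=2 face="sans-serif">As a rough sketch of that aggregation step,
assuming each fabric manager can export a small dictionary of per-plane counters:
the plane names, the counter names, and the sample values below are all hypothetical
and only illustrate the shape of the roll-up.</font>

```python
from collections import Counter


def aggregate_planes(plane_summaries: dict) -> dict:
    """Sum per-plane counter summaries into one fabric-wide summary.

    `plane_summaries` maps a plane name to a dict of numeric counters,
    as reported by that plane's own fabric manager (hypothetical format).
    """
    total = Counter()
    for summary in plane_summaries.values():
        total.update(summary)  # Counter.update adds values key by key
    return dict(total)


if __name__ == "__main__":
    # Made-up sample data for two planes of 48K ports each.
    planes = {
        "plane0": {"ports": 48_000, "ports_down": 3, "symbol_errors": 120},
        "plane1": {"ports": 48_000, "ports_down": 0, "symbol_errors": 45},
    }
    print(aggregate_planes(planes))
```

<br><font size=2 face="sans-serif">Each fabric manager still tracks only its own
48K ports; only the small summaries cross plane boundaries.</font>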

<br>

<br><tt><font size=2>> * provide alerts for ports crossing some rate

of change</font></tt>

<br><tt><font size=2>> * support profiling of data flow through the

fabric</font></tt>

<br><tt><font size=2>> * be able to handle changes in topology due to

MTBF.</font></tt>

<br><tt><font size=2>> Basic design considerations:</font></tt>

<br>

<br><tt><font size=2> [SNIP]</font></tt>

<br>

<br><tt><font size=2>> Eitan Zahavi</font></tt>

<br><tt><font size=2>> Senior Engineering Director, Software Architect</font></tt>

<br><tt><font size=2>> Mellanox Technologies LTD</font></tt>

<br><tt><font size=2>> Tel:+972-4-9097208<br>

> Fax:+972-4-9593245</font></tt>

<br><tt><font size=2>> P.O. Box 586 Yokneam 20692 ISRAEL<br>


</font></tt>

<br><font size=2 face="sans-serif"><br>

Bernie King-Smith  <br>

IBM Corporation<br>

Server Group<br>

Cluster System Performance  <br>

wombat2@us.ibm.com    (845)433-8483<br>

Tie. 293-8483 or wombat2 on NOTES <br>

<br>

"We are not responsible for the world we are born into, only for the

world we leave when we die.<br>

So we have to accept what has gone before us and work to change the only

thing we can,<br>

-- The Future." William Shatner</font>

<br>