[openib-general] got ipoib up once but not twice :-)

Hal Rosenstock halr at voltaire.com
Mon Jan 17 07:27:11 PST 2005


On Sat, 2005-01-15 at 18:18, Matt Leininger wrote:
> On Sat, 2005-01-15 at 07:27 -0500, Hal Rosenstock wrote:
> > On Fri, 2005-01-14 at 23:10, Ronald G. Minnich wrote:
> > > Hmm, it's back. I guess I was not patient enough. Not sure when it all got
> > > back. I will have to time it next time, I assume it won't take 6 hours 
> > > each time :-)
> > > 
> > > I'm working on making this 256-node cluster work over infiniband only, 
> > > same as our myrinet clusters which are myrinet-only.
> > 
> > How many 96 port switches ? I'd be curious how long it does take to
> > initialize this (as I do not have access to a large cluster). Also,
> > right now I'm pretty sure things are being done without pipelining on so
> > it is likely slower. More on this later.
> > 
>  
>   Ron has 9 96 port switches, 3 in the spine and 6 leaf switches all
> based on the InfiniScale II switch ASIC, to make a 288 port fabric.
> It's not quite a true fat-tree network since there are no spine bypass
> cards for the older 96 port switch (needed on the leaf switches).  This
> is an interesting test case because Ron's network has a total of 612
> switch chips.  A 1152 port fat-tree fabric based on InfiniScale III
> would have 240 switch chips.  

That's a big cluster :-) There are several things that can be done which
should improve this, but I would still be curious how long the
initialization really takes.

1. Currently OpenSM is compiled with debug enabled and no optimization.
This should be changed to at least -O2 (and perhaps -O4), but I would
start with -O2. This results in a 2x speedup for some code paths.

2. OpenSM supports a pipelining mode for SMPs. The default is 1
outstanding SMP. The -maxsmps <#> option sets the number of outstanding
SMPs allowed; raising it should speed up the initialization. Useful
values are 16 and 32.
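
To illustrate why allowing more outstanding SMPs should help, here is a
rough back-of-the-envelope model, not a description of how OpenSM
actually schedules its traffic. It assumes every SMP costs one fixed
round trip, that up to the allowed number can be in flight together, and
that nothing times out. The function name, the 10,000 SMP count and the
1000 microsecond round trip are all made up for the example.

```python
def sweep_time_us(num_smps: int, rtt_us: int, max_outstanding: int = 1) -> int:
    """Idealized time (microseconds) to complete num_smps request/response
    pairs when at most max_outstanding may be in flight at once.

    Assumptions (hypothetical model, not OpenSM's real behaviour):
    fixed round-trip time, no SM processing cost, no timeouts or retries.
    """
    if num_smps < 0 or rtt_us < 0 or max_outstanding < 1:
        raise ValueError("counts must be non-negative and the window >= 1")
    # Each "round" sends a full window and waits one round trip for it.
    rounds = -(-num_smps // max_outstanding)  # integer ceiling division
    return rounds * rtt_us


if __name__ == "__main__":
    NUM_SMPS = 10_000   # made-up number of SMPs for one fabric sweep
    RTT_US = 1_000      # made-up round-trip time per SMP
    for window in (1, 16, 32):
        total = sweep_time_us(NUM_SMPS, RTT_US, window)
        print(f"window={window:2d}: {total} us")
```

In this model the total time falls roughly in proportion to the window
size. A real fabric will not scale that cleanly, and any timeouts or
retries would dominate the result.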

Beyond this, there may be an issue with a link which is causing
timeouts and retries to kick in. The OpenSM log should contain messages
indicating this.

I will add this info to the management README.

-- Hal



