[Users] MXM/FCA results

Susan Coulter markus at lanl.gov
Tue May 7 08:52:19 PDT 2013


On May 6, 2013, at 8:57 PM, Russell Dekema wrote:

> Susan,
> 
> Did you have to do anything 'special' (beyond what is in the FCA documentation) to get FCA to work on your cluster? Are you running FCA with Mellanox UFM, or OFA OpenSM? 

OpenSM

> 
> I ask because we are having trouble getting FCA to work on our (UFM) cluster. We are working with Mellanox on it, but I'd be curious to hear more about your environment.

There were a couple of problems. Early on (8/2012), the FCA verbose MCA options did not appear to be functional.
I was running OpenMPI with those options enabled in an attempt to get more data on the process.
They seemed to introduce problems, so I stopped using them.
That may be fixed now.

The larger problem was that FCA could not handle MTU mismatches.
We had changed the base MTU on the compute nodes from 2k to 4k via "set_4k_mtu=1" in modprobe, but not on the master, where the SM was running.
FCA could not handle that mismatch, so I turned on 4k MTU on the master as well, and that fixed it.
This may also be fixed now.
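
For anyone wanting to check for the same kind of mismatch, here is a minimal sketch in Python. It assumes you have already collected the text output of ibv_devinfo from each host (for example over ssh; that step is not shown) and compares the "active_mtu" values against the host running the SM. The function names and host names are hypothetical, and this is not how FCA itself checks the MTU; it is only a quick sanity test.

```python
import re

# Matches lines such as "active_mtu:   4096 (5)" in ibv_devinfo output.
ACTIVE_MTU_RE = re.compile(r"^\s*active_mtu:\s*(\d+)", re.MULTILINE)


def active_mtus(devinfo_text):
    """Return every active_mtu value (in bytes) found in ibv_devinfo output."""
    return [int(value) for value in ACTIVE_MTU_RE.findall(devinfo_text)]


def find_mtu_mismatches(outputs_by_host, reference_host):
    """Return {host: mtus} for hosts whose active MTUs differ from the reference host.

    outputs_by_host maps a host name (hypothetical, e.g. "master") to the
    captured ibv_devinfo text for that host.
    """
    expected = set(active_mtus(outputs_by_host[reference_host]))
    mismatched = {}
    for host, text in outputs_by_host.items():
        mtus = set(active_mtus(text))
        if mtus != expected:
            mismatched[host] = sorted(mtus)
    return mismatched
```

Calling find_mtu_mismatches(outputs, "master") returns an empty dict when every host agrees with the master, and otherwise lists the hosts that differ along with the MTU values they report.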
 
What is the problem you are experiencing?  Is fca_managerd running?
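
If it helps, a quick way to check for the daemon on a Linux node is to scan /proc for a process with that name. This is only a sketch: it assumes the process name is exactly "fca_managerd" as written above, and the helper name is_process_running is made up for illustration. Note that the kernel truncates the name in /proc/<pid>/comm to 15 characters, which is long enough for this one.

```python
import os


def is_process_running(name, proc_root="/proc"):
    """Return True if any process under proc_root reports `name` in its comm file."""
    for entry in os.listdir(proc_root):
        # Only numeric directory names correspond to process IDs.
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            # The process exited or is not readable; skip it.
            continue
    return False


if __name__ == "__main__":
    print("fca_managerd running:", is_process_running("fca_managerd"))
```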

> 
> Cheers,
> Rusty Dekema
> CAEN High Performance Computing
> 
> 
> On Mon, May 6, 2013 at 7:09 PM, Susan Coulter <markus at lanl.gov> wrote:
> 
> Many moons ago there was a brief discussion on this list about MXM/FCA testing.
> I promised to send the results of my testing once I got things working.
> It's been working for quite a while on a ~600 node QDR cluster and the results are pretty remarkable.
> Attached are several graphs of 2 different MPI synthetics I use to test IB performance - ring and scatter.
> 
> Scatter shows pretty good performance gains with smaller message sizes - larger message sizes pretty much stink.
> Ring shows significant performance increases across the board, with a couple odd results that may need more analysis/testing.
> 
> <scatter_inc.png><scatter512.png><scatter1024.png><scatter2048.png><scatter4096.png><ring_inc.png><ring512.png><ring1024.png><ring2048.png><ring4096.png>
> 
> ====================================
> 
> Susan Coulter
> HPC-3 Network/Infrastructure
> 505-667-8425
> Increase the Peace...
> An eye for an eye leaves the whole world blind
> ====================================
> 

====================================

Susan Coulter
HPC-3 Network/Infrastructure
505-667-8425
Increase the Peace...
An eye for an eye leaves the whole world blind
====================================
