[Users] FCA Testing

Florent Parent florent.parent at calculquebec.ca
Tue Aug 21 08:10:08 PDT 2012


Not tested here. I actually learned that this existed while at the OFED user day.

I'm glad you are trying this out. This would be interesting to us too. :)

We have a user code here whose communication performs about 10-fold
better on a QLogic-based cluster than on our Mellanox IB system. We
have more tests to do, but we have boiled this down to the connection
rate (the user job does MPI send/recv across multiple ranks) being
better on the QLogic system.

The Mellanox FCA or MXM would be interesting to test to see if things
improve. However, we have the (old) ConnectX generation (not ConnectX-2
or ConnectX-3), so I would need to validate whether it is compatible.

Florent

On Mon, Aug 20, 2012 at 6:49 PM, Susan Coulter <markus at lanl.gov> wrote:
> Has anyone tested the collectives offload provided by MLNX OFED?
>
> I have everything set up correctly according to the FCA documentation and
> fca_managerd is running.
> My attempts to run with "coll_fca_enable 1" are failing,
> first with an inability to talk to the umad port:
>
> ibwarn: [2337] mad_rpc_open_port: can't open UMAD port (mlx4_0:1)
> 1345251776.473792 [FCA_DEV cja001 2337] dev.c:560 error Failed to initialize SA: Cannot assign requested address
>
> Opening up the permissions on /dev/infiniband/umad0 got rid of the
> error messages.
> Now the job dies and leaves a bunch of empty core files, but there is
> nothing in the job run log.
> It looks like it is going to start, then poof!
> ====================================
>
> Susan Coulter
> HPC-3 Network/Infrastructure
> 505-667-8425
> Increase the Peace...
> An eye for an eye leaves the whole world blind
> ====================================
>
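
For the "can't open UMAD port" error quoted above, a quick way to see
whether the user running the job can open the device node at all is to
check its mode and access bits before launching. The following is only
a minimal sketch: the function name describe_device_access is made up
for illustration, and it assumes the failure is a plain file permission
problem on a node that is opened read/write. It does not talk to FCA or
to the umad library.

```python
import os
import stat


def describe_device_access(path):
    """Report whether the current user can read and write `path`.

    Hypothetical helper for illustration; it only inspects file
    permissions and does not open the device.
    """
    try:
        st = os.stat(path)
    except OSError as exc:
        # Missing device node, or no permission to stat it.
        return {"path": path, "exists": False, "error": exc.strerror}
    return {
        "path": path,
        "exists": True,
        "mode": stat.filemode(st.st_mode),  # symbolic mode, like `ls -l`
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Device path taken from the report above; adjust for other ports.
    print(describe_device_access("/dev/infiniband/umad0"))
```

Running this as the job user on each compute node shows whether the
node is readable and writable for that user, which may be less
intrusive than opening the permissions for everyone.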
