[Users] FCA testing

Susan Coulter markus at lanl.gov
Mon Aug 20 15:44:00 PDT 2012


Has anyone tested the collectives offload provided by MLNX OFED?

I have everything set up according to the FCA documentation, and fca_managerd is running.
My attempts to run with "coll_fca_enable 1" are failing.
The first failure was an inability to open the UMAD port:

ibwarn: [2337] mad_rpc_open_port: can't open UMAD port (mlx4_0:1)
1345251776.473792 [FCA_DEV cja001 2337] dev.c:560 error Failed to initialize SA: Cannot assign requested address
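
For anyone hitting the same "can't open UMAD port" warning, here is a minimal sketch of how one might check whether the user running the job can actually open the device node. The helper name describe_device_access is hypothetical and not part of FCA or OFED; the path is the one from this report and may differ on other nodes.

```python
import os
import stat


def describe_device_access(path):
    """Report whether the current user can read and write `path`.

    Hypothetical helper for illustration; it only inspects file
    permissions and does not talk to the InfiniBand stack.
    """
    info = {"path": path, "exists": os.path.exists(path)}
    if not info["exists"]:
        info.update(mode=None, readable=False, writable=False)
        return info
    st = os.stat(path)
    # Human-readable mode string, e.g. "crw-------" for a character device.
    info["mode"] = stat.filemode(st.st_mode)
    info["readable"] = os.access(path, os.R_OK)
    info["writable"] = os.access(path, os.W_OK)
    return info


if __name__ == "__main__":
    # Path taken from this report; adjust for other HCAs or ports.
    print(describe_device_access("/dev/infiniband/umad0"))
```

Running this as the same user that launches the job, on a compute node, should show whether the node is missing or simply not readable and writable by that user.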

Opening up the permissions on /dev/infiniband/umad0 got rid of those error messages.
Now the job dies and leaves behind a bunch of empty core files, with nothing in the job run log.
It looks like it is going to start, then poof!
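
One thing that may be worth ruling out for the empty core files is a restrictive core size limit in the job environment. This is only a guess at a possible cause, not something confirmed here. A minimal sketch that prints the limits the process actually sees, using Python's standard resource module (the helper name core_limit_summary is hypothetical):

```python
import resource


def core_limit_summary():
    """Return the soft and hard core-file size limits for this process.

    Hypothetical helper for illustration; it only reads the limits and
    does not change them.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    def fmt(value):
        # RLIM_INFINITY means no limit is imposed.
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return {
        "soft": fmt(soft),
        "hard": fmt(hard),
        # A finite soft limit means core files could be cut short.
        "limited": soft != resource.RLIM_INFINITY,
    }


if __name__ == "__main__":
    print(core_limit_summary())
```

Launching this through the same job launcher as the failing run would show the limits inherited on the compute nodes, which can differ from those of an interactive shell.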

====================================

Susan Coulter
HPC-3 Network/Infrastructure
505-667-8425
Increase the Peace...
An eye for an eye leaves the whole world blind
====================================
