[ewg] RDMACM Differences

Jagga Soorma jagga13 at gmail.com
Sat Feb 26 13:05:27 PST 2011


Hello,

I am running into the following issue while trying to run osu_latency:

--
-bash-3.2$ mpiexec --mca btl openib,self -mca btl_openib_warn_default_gid_prefix 0 \
    -np 2 --hostfile mpihosts \
    /home/jagga/osu-micro-benchmarks-3.3/openmpi/ofed-1.5.2/bin/osu_latency
# OSU MPI Latency Test v3.3
# Size            Latency (us)
[amber04][[10252,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all] error modifing QP to RTR errno says Invalid argument
[amber04][[10252,1],1][connect/btl_openib_connect_oob.c:815:rml_recv_cb] error in endpoint reply start connect
--------------------------------------------------------------------------
mpiexec has exited due to process rank 1 with PID 6781 on
node amber04 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
--

I can get around this by adding the "--mca btl_openib_cpc_include rdmacm"
option.  However, I have another host with a different HCA, running all the
same drivers and software versions, on which this same command runs
successfully without the rdmacm option.  What could be causing one of my
environments to fail while the other works fine without the rdmacm option?

--
[root@amber03 ~]# ofed_info | grep OFED
MLNX_OFED_LINUX-1.5.2-1.0.0 (OFED-1.5.2-20101020-1520):
MLNX_OFED_LINUX-1.5.2-1.0.0
(/mswg/release/ofed-1.5.2-rpms/rnfs-utils/rnfs-utils-1.1.5-10.OFED.src.rpm):

[root@amber03 ~]# ibv_devinfo
hca_id:    mlx4_0
    transport:            InfiniBand (0)
    fw_ver:                2.7.9294
    node_guid:            78e7:d103:0021:8884
    sys_image_guid:            78e7:d103:0021:8887
    vendor_id:            0x02c9
    vendor_part_id:            26438
    hw_ver:                0xB0
    board_id:            HP_0200000003
    phys_port_cnt:            2
        port:    1
            state:            PORT_ACTIVE (4)
            max_mtu:        2048 (4)
            active_mtu:        2048 (4)
            sm_lid:            1
            port_lid:        20
            port_lmc:        0x00
            link_layer:        IB

        port:    2
            state:            PORT_ACTIVE (4)
            max_mtu:        2048 (4)
            active_mtu:        1024 (3)
            sm_lid:            0
            port_lid:        0
            port_lmc:        0x00
            link_layer:        Ethernet
--
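To compare the two hosts port by port, the ibv_devinfo output can be condensed into one line per port, which makes differences such as the Ethernet link layer reported on port 2 above easy to spot side by side. This is a minimal sketch, assuming ibv_devinfo prints the layout shown above; summarize_ports and SAMPLE (a trimmed copy of the amber03 output) are hypothetical helpers, not part of OFED. On a real host you would pass it the text captured from running ibv_devinfo on each machine.

```python
import re
from typing import Dict, List

# Hypothetical helper: condenses ibv_devinfo text into one dict per HCA port.
# Assumes the "hca_id:" / "port: N" / "key: value" layout shown in the post.
def summarize_ports(devinfo_text: str) -> List[Dict[str, str]]:
    ports: List[Dict[str, str]] = []
    hca = ""
    current = None  # dict for the port currently being parsed
    for raw in devinfo_text.splitlines():
        line = raw.strip()
        if line.startswith("hca_id:"):
            hca = line.split(":", 1)[1].strip()
            current = None
        elif re.match(r"port:\s*\d+$", line):
            # "port:    1" starts a new port section ("port_lid:" does not match)
            current = {"hca_id": hca, "port": line.split(":", 1)[1].strip()}
            ports.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            if key.strip() in ("state", "active_mtu", "link_layer"):
                current[key.strip()] = value.strip()
    return ports

# Trimmed copy of the amber03 ibv_devinfo output from the post.
SAMPLE = """hca_id:    mlx4_0
    transport:            InfiniBand (0)
        port:    1
            state:            PORT_ACTIVE (4)
            active_mtu:        2048 (4)
            link_layer:        IB

        port:    2
            state:            PORT_ACTIVE (4)
            active_mtu:        1024 (3)
            link_layer:        Ethernet
"""

if __name__ == "__main__":
    for p in summarize_ports(SAMPLE):
        print(f"{p['hca_id']} port {p['port']}: {p.get('link_layer', '?')}, "
              f"MTU {p.get('active_mtu', '?')}, {p.get('state', '?')}")
```

Run against SAMPLE, this prints "mlx4_0 port 1: IB, MTU 2048 (4), PORT_ACTIVE (4)" followed by "mlx4_0 port 2: Ethernet, MTU 1024 (3), PORT_ACTIVE (4)".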

Any help would be greatly appreciated.

Thanks,
-J