[ofa-general] How to establish IB communication more effectively?

Davis, Arlin R arlin.r.davis at intel.com
Tue May 12 12:21:23 PDT 2009


>Hi all,
>    I'm using libibverbs to build a cluster memory pool, and using a
>TCP/IP handshake to exchange memory information and establish the
>connection before the IB communication. I found that this process
>takes a lot of time, 100ms on a 1 Gb Ethernet LAN, so I want to use
>rdma_cm or ib_ucm to handle connection establishment. But I can't
>find sample code or API documentation; is there anything I missed?
>    BTW, how is communication established in current OFED? Any
>comparison or suggestion is appreciated; that will help me a lot.
>

What scale are you targeting?

Your time for a single connection seems high. For one connection
(socket connect, exchanging QP info, private data, qp modify)
using the uDAPL socket cm versus rdma_cm, I get:

socket_cm on 1Ge == ~900us
socket_cm on IPoIB (mlx4 ddr) == ~400us
rdma_cm on IB (mlx4 ddr) == ~2200us

As you can see, the path record queries via rdma_cm add 
a substantial penalty. With larger scale clusters this
really starts to hurt.
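
For reference, the "exchanging QP info" step of a socket-based setup can be sketched roughly as below. This is a minimal illustration only, not the uDAPL code: the `QPInfo` fields, the 22-byte wire layout, and the helper names (`recv_exact`, `exchange_qp_info`, `demo`) are all assumptions made for the example, and a local `socket.socketpair()` stands in for the real TCP connection between two nodes.

```python
import socket
import struct
from dataclasses import dataclass

# Hypothetical wire format (an assumption, not a standard layout):
# LID (u16), QP number (u32), starting PSN (u32), rkey (u32),
# remote buffer address (u64), packed in network byte order.
QP_INFO_FMT = "!HIIIQ"
QP_INFO_SIZE = struct.calcsize(QP_INFO_FMT)


@dataclass(frozen=True)
class QPInfo:
    lid: int
    qpn: int
    psn: int
    rkey: int
    addr: int

    def pack(self) -> bytes:
        return struct.pack(QP_INFO_FMT, self.lid, self.qpn, self.psn,
                           self.rkey, self.addr)

    @classmethod
    def unpack(cls, data: bytes) -> "QPInfo":
        return cls(*struct.unpack(QP_INFO_FMT, data))


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may return short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before sending full QP info")
        buf.extend(chunk)
    return bytes(buf)


def exchange_qp_info(sock: socket.socket, local: QPInfo) -> QPInfo:
    """Send our QP info, then block until the peer's QP info arrives."""
    sock.sendall(local.pack())
    return QPInfo.unpack(recv_exact(sock, QP_INFO_SIZE))


def demo():
    # socketpair() stands in for an already-connected TCP socket.
    a, b = socket.socketpair()
    with a, b:
        info_a = QPInfo(lid=1, qpn=0x11, psn=100, rkey=0xABCD, addr=0x1000)
        info_b = QPInfo(lid=2, qpn=0x22, psn=200, rkey=0xDCBA, addr=0x2000)
        # Queue B's message first so A's blocking receive can complete
        # in this single-threaded demo.
        b.sendall(info_b.pack())
        seen_by_a = exchange_qp_info(a, info_a)
        seen_by_b = QPInfo.unpack(recv_exact(b, QP_INFO_SIZE))
    return seen_by_a, seen_by_b
```

In a real verbs application each side would then feed the peer's values into ibv_modify_qp to bring its QP up; that part is not shown here.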

You can look at uDAPL (dapl/openib_cma and dapl/openib_scm) 
source for examples of a socket cm implementation vs rdma_cm. 
With the socket cm version we ran up to 14,400 cores with 
no problems using Intel MPI. However, with rdma_cm we 
had problems reaching 1000 cores due to IPoIB ARP storms and
SA path record query issues. If someone would step up and 
provide a scalable SA caching solution in OFED then rdma_cm 
could possibly work for us again. Any takers? :^)

-arlin



