[ofa-general] How to establish IB communication more effectively?
Davis, Arlin R
arlin.r.davis at intel.com
Tue May 12 12:21:23 PDT 2009
>Hi all,
> I'm using libibverbs to build a cluster memory pool, and using a
>TCP/IP handshake to exchange memory information and establish the
>connection before the IB communication. I found that this process
>costs a lot of time, 100ms on a 1GbE LAN, so I want to use rdma_cm or
>ib_ucm to handle the establishment. But I can't find sample code or
>API documentation; is there anything I missed?
> BTW, how is communication established in current OFED? Any
>comparison or suggestion is appreciated; that will help me a lot.
>
What scale are you targeting?
Your time for a single connection seems high. For one connection
(socket connect, exchanging QP info, private data, QP modify)
using the uDAPL socket cm versus rdma_cm I get:
socket_cm on 1GbE             == ~900us
socket_cm on IPoIB (mlx4 DDR) == ~400us
rdma_cm on IB (mlx4 DDR)      == ~2200us
As you can see, the path record queries via rdma_cm add
a substantial penalty. With larger scale clusters this
really starts to hurt.
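The "exchanging QP info" step of a socket-based handshake could look
roughly like the sketch below. This is not taken from uDAPL: struct
qp_info, the 24-byte wire layout and all function names are
hypothetical, chosen only for illustration. In real code the values
would come from ibv_query_port (LID), the QP's qp_num and the memory
region's rkey, and the received record would feed ibv_modify_qp. A
socketpair stands in for the TCP connection so the sketch runs without
any IB hardware.

```c
#include <arpa/inet.h>   /* htonl, htons, ntohl, ntohs */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>  /* socketpair */
#include <unistd.h>      /* read, write, close */

/* Hypothetical record: what each peer needs from the other before it
 * can move its QP to RTR/RTS and issue RDMA operations. */
struct qp_info {
    uint16_t lid;   /* port LID */
    uint32_t qpn;   /* queue pair number */
    uint32_t psn;   /* initial packet sequence number */
    uint32_t rkey;  /* key of the registered memory region */
    uint64_t addr;  /* address of the registered buffer */
};

/* Assumed wire layout, big-endian:
 * lid(2) pad(2) qpn(4) psn(4) rkey(4) addr(8). */
#define QP_INFO_WIRE_SIZE 24

static void put32(unsigned char *p, uint32_t v) { v = htonl(v); memcpy(p, &v, 4); }
static uint32_t get32(const unsigned char *p) { uint32_t v; memcpy(&v, p, 4); return ntohl(v); }

void qp_info_pack(const struct qp_info *in, unsigned char *out)
{
    uint16_t lid;
    assert(in != NULL && out != NULL);
    lid = htons(in->lid);
    memset(out, 0, QP_INFO_WIRE_SIZE);
    memcpy(out, &lid, 2);
    put32(out + 4, in->qpn);
    put32(out + 8, in->psn);
    put32(out + 12, in->rkey);
    put32(out + 16, (uint32_t)(in->addr >> 32));
    put32(out + 20, (uint32_t)(in->addr & 0xffffffffu));
}

void qp_info_unpack(const unsigned char *in, struct qp_info *out)
{
    uint16_t lid;
    memcpy(&lid, in, 2);
    out->lid = ntohs(lid);
    out->qpn = get32(in + 4);
    out->psn = get32(in + 8);
    out->rkey = get32(in + 12);
    out->addr = ((uint64_t)get32(in + 16) << 32) | get32(in + 20);
}

/* Send one record on a connected stream socket; 0 on success.
 * Sketch only: EINTR is not retried. */
int qp_info_send(int fd, const struct qp_info *local)
{
    unsigned char buf[QP_INFO_WIRE_SIZE];
    size_t off = 0;
    qp_info_pack(local, buf);
    while (off < sizeof buf) {
        ssize_t n = write(fd, buf + off, sizeof buf - off);
        if (n <= 0)
            return -1;
        off += (size_t)n;
    }
    return 0;
}

/* Receive one record from a connected stream socket; 0 on success. */
int qp_info_recv(int fd, struct qp_info *remote)
{
    unsigned char buf[QP_INFO_WIRE_SIZE];
    size_t off = 0;
    while (off < sizeof buf) {
        ssize_t n = read(fd, buf + off, sizeof buf - off);
        if (n <= 0)
            return -1;
        off += (size_t)n;
    }
    qp_info_unpack(buf, remote);
    return 0;
}

/* Loopback check over a socketpair. Both sides send before either
 * receives, so a single thread cannot deadlock. */
int qp_info_loopback(const struct qp_info *a, const struct qp_info *b,
                     struct qp_info *a_sees, struct qp_info *b_sees)
{
    int sv[2], rc = -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (qp_info_send(sv[0], a) == 0 && qp_info_send(sv[1], b) == 0 &&
        qp_info_recv(sv[0], a_sees) == 0 && qp_info_recv(sv[1], b_sees) == 0)
        rc = 0;
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Between real hosts, each side would call qp_info_send and then
qp_info_recv on its connected TCP socket, so the whole exchange costs
one round trip on top of the connect itself.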
You can look at the uDAPL source (dapl/openib_scm for the socket cm,
dapl/openib_cma for rdma_cm) for examples of both implementations.
With the socket cm version we ran up to 14,400 cores with
no problems using Intel MPI. However, with rdma_cm we
had problems reaching 1000 cores due to IPoIB ARP storms and
SA path record query issues. If someone would step up and
provide a scalable SA caching solution in OFED then rdma_cm
could possibly work for us again. Any takers? :^)
-arlin