[ewg] rdma/cm: revert associating an RDMA device when binding to loopback

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Tue Feb 9 14:17:04 PST 2010


On Tue, Feb 09, 2010 at 05:01:21PM -0500, Jeff Squyres wrote:

> 1. Is this now the recommended way to find all the IP interfaces that support RDMA:
> 
> - loop over all local IP addresses
> - if 127.0.0.1/8, skip
> - try to rdma_bind_addr()
> - if it succeeds and verbs ptr is != NULL, it's an RDMA device

RDMA is not special; it is just like any other IP service. RDMA is
supported on loopback.

To find the list of RDMA-capable IPs, you run the rdma_bind_addr() test
on each local address.

You then have to transform that list exactly as you would for TCP to
get a list of candidate addresses that could be used for remote
connections. This means removing loopback, doing something about
link-local addresses, and matching IPs as necessary to interfaces, to
networks, and to source IPs on the connecting side. It isn't trivial,
but it is exactly the same as for TCP. I suppose ideally OMPI would
use the same code for both TCP and RDMACM - and it should have user
configurables!
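
As a rough illustration of the first filtering step only, here is a
minimal sketch in C. The names usable_for_remote() and parse_ip() are
hypothetical helpers, not part of librdmacm or OMPI. Dropping link-local
addresses outright is just one possible policy, assumed here for
simplicity. The sketch does not attempt the interface, network, or
source-IP matching.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if the address could plausibly be
 * offered to a remote peer, 0 if it is unspecified, loopback,
 * link-local, or not an IP address at all. */
int usable_for_remote(const struct sockaddr *sa)
{
    if (sa == NULL)
        return 0;

    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *in = (const struct sockaddr_in *)sa;
        uint32_t a = ntohl(in->sin_addr.s_addr);

        if (a == INADDR_ANY)
            return 0;
        if ((a >> 24) == 127)        /* 127.0.0.0/8 loopback */
            return 0;
        if ((a >> 16) == 0xA9FE)     /* 169.254.0.0/16 link-local (assumed policy: drop) */
            return 0;
        return 1;
    }

    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;

        if (IN6_IS_ADDR_UNSPECIFIED(&in6->sin6_addr))
            return 0;
        if (IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr))    /* ::1 */
            return 0;
        if (IN6_IS_ADDR_LINKLOCAL(&in6->sin6_addr))   /* fe80::/10 (assumed policy: drop) */
            return 0;
        return 1;
    }

    return 0;
}

/* Hypothetical helper: parse a textual IPv4 or IPv6 address into a
 * sockaddr_storage. Returns 1 on success, 0 on failure. */
int parse_ip(const char *text, struct sockaddr_storage *out)
{
    struct sockaddr_in *in = (struct sockaddr_in *)out;
    struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)out;

    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, text, &in->sin_addr) == 1) {
        in->sin_family = AF_INET;
        return 1;
    }
    if (inet_pton(AF_INET6, text, &in6->sin6_addr) == 1) {
        in6->sin6_family = AF_INET6;
        return 1;
    }
    return 0;
}
```

You would apply usable_for_remote() to each address that passed the
rdma_bind_addr() test. Ideally the same function would be applied to
the TCP candidate list too, so both transports agree on what gets
advertised.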

It is worth reviewing what the OMPI TCP code does and at least checking
that the RDMACM code hits all the same points.

> 2. Before Sean backed out the localhost behavior, when you
> rdma_bind_addr(127.0.0.1), what did the id->verbs pointer
> correspond to?

One of the RDMA verbs devices in the system. The API does not define
which one the kernel will select.

I think the current patches simply picked the first one.

Jason


