[ofa-general] Re: new API for the rdma-cma

Or Gerlitz ogerlitz at voltaire.com
Wed Jan 16 06:11:43 PST 2008


Steve Wise wrote:
> Sean Hefty wrote:

>>> What do you think about adding an API to the rdma-cma allowing
>>> applications to get a list of ip addresses associated with a particular
>>> rdma device/port?

>> It seems kind of backwards from the design of the rdma_cm, but if it's 
>> useful to end users, I don't have any objections to having it.

> Consider OMPI, which looks at each device on the system that can be used 
> to connect to the other nodes.  Each device is analyzed to see what 
> method of communication should be used (tcp, ib, iwarp, whatever).  Then 
> these interfaces and their attributes are conveyed to all the nodes and 
> the desired communication mesh is determined.

Steve, Sean, this approach assumes that the MPI job scheduler associates
a rank with hardware using a <node, device, port, pkey, sl, etc> scheme,
whereas I would like a <node, ip address/es> scheme to be supported as
well, since I believe such a design also covers what you suggest.

How about a different approach that fits the nature of the rdma-cm
better and seems to meet the requirement: have an API that lets apps
get a list of {interface name, ip addresses, device-attributes} entries
covering all the "RDMA" interfaces, that is, those whose link (ARP
hardware) type is ARPHRD_INFINIBAND, plus whatever key identifies iwarp
interfaces.

This can easily be implemented in user space using netlink calls, as
done by the ip(8) command. For example, for the following device

> $ ip addr show ib0
> 25: ib0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 128
>     link/[32] 80:00:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:97:08:dd
>     brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>     inet 192.168.3.61/24 brd 192.168.3.255 scope global ib0
>     inet 192.168.3.71/32 scope global ib0
>     inet6 fe80::208:f104:397:8dd/64 scope link valid_lft forever preferred_lft forever

the user space script/code would note that it's an IPoIB device (link
type 32 is ARPHRD_INFINIBAND), with the IPv4 addresses being
192.168.3.61/24 and 192.168.3.71/32.
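
To make the ip(8) example above concrete, here is a minimal sketch of
such a user space script. It does not speak netlink itself; as a
simplification it parses the text printed by "ip addr show", so it is
only an illustration of the filtering step. The function names
(parse_ip_addr, is_ipoib, ipoib_addresses) are made up for this sketch,
and the assumption that newer iproute2 prints "link/infiniband" where
the output above shows "link/[32]" should be checked against your
version.

```python
import re

ARPHRD_INFINIBAND = 32  # link type value from <linux/if_arp.h>

# The output quoted in this mail, without the "> " quoting prefix.
SAMPLE = """\
25: ib0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 128
    link/[32] 80:00:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:97:08:dd
    brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.3.61/24 brd 192.168.3.255 scope global ib0
    inet 192.168.3.71/32 scope global ib0
    inet6 fe80::208:f104:397:8dd/64 scope link valid_lft forever preferred_lft forever
"""


def parse_ip_addr(text):
    """Map interface name -> {"link": link type string, "inet": [cidr, ...]}."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        # Header lines look like "25: ib0: <FLAGS> ..."
        head = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if head:
            current = interfaces.setdefault(
                head.group(1), {"link": None, "inet": []})
            continue
        fields = line.split()
        if current is None or not fields:
            continue
        if fields[0].startswith("link/"):
            current["link"] = fields[0][len("link/"):]
        elif fields[0] == "inet" and len(fields) > 1:
            current["inet"].append(fields[1])  # IPv4 only; inet6 is skipped
    return interfaces


def is_ipoib(link):
    """True for the numeric form "[32]" or the symbolic form "infiniband"."""
    return link in ("infiniband", "[%d]" % ARPHRD_INFINIBAND)


def ipoib_addresses(text):
    """Return {ifname: [IPv4 cidr, ...]} for IPoIB interfaces only."""
    return {name: info["inet"]
            for name, info in parse_ip_addr(text).items()
            if is_ipoib(info["link"])}


if __name__ == "__main__":
    print(ipoib_addresses(SAMPLE))
```

For the ib0 output above this yields the two IPv4 addresses
192.168.3.61/24 and 192.168.3.71/32 under the key "ib0"; interfaces of
any other link type are dropped.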

Now, if you want to go deeper and expose the <device, port, pkey, sl,
etc> to the job scheduler or to the rank, you can implement this
netlink-based code in librdmacm (no need for kernel changes), and it
would reuse the code present now in rdma_resolve_addr for the local
resolution, that is, resolving the <device,port,pkey> from a local ip
address.
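
As a rough illustration of what the local <device,port,pkey> lookup
returns, here is a sketch that reads the same information from sysfs
instead of going through rdma_resolve_addr. This is a swapped-in
technique, not the librdmacm code path. It assumes a Linux kernel where
an IPoIB interface exposes "type", "dev_id" (port number minus one) and
"pkey" under /sys/class/net/<ifname>/, and where
device/infiniband/ lists the RDMA device name; treat those paths, and
the helper name rdma_identity, as assumptions to verify on your system.

```python
import os

ARPHRD_INFINIBAND = 32  # link type value from <linux/if_arp.h>


def _read(path):
    """Return the stripped contents of a small sysfs-style file."""
    with open(path) as f:
        return f.read().strip()


def rdma_identity(ifname, sysfs_root="/sys/class/net"):
    """Return {"device", "port", "pkey"} for an IPoIB interface, else None.

    sysfs_root is a parameter so the lookup can be tried on a copy of the
    tree; the attribute names below are assumptions about the IPoIB driver.
    """
    base = os.path.join(sysfs_root, ifname)
    if int(_read(os.path.join(base, "type"))) != ARPHRD_INFINIBAND:
        return None  # not an IPoIB interface

    ib_dir = os.path.join(base, "device", "infiniband")
    devices = sorted(os.listdir(ib_dir)) if os.path.isdir(ib_dir) else []

    return {
        "device": devices[0] if devices else None,
        # dev_id is assumed to be zero-based, RDMA port numbers start at 1
        "port": int(_read(os.path.join(base, "dev_id")), 0) + 1,
        "pkey": int(_read(os.path.join(base, "pkey")), 0),
    }
```

A caller would combine this with the address list per interface, for
example rdma_identity("ib0"), to hand a <device, port, pkey> tuple to
the job scheduler next to the ip addresses.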

Or.



