[ofa-general] Re: new API for the rdma-cma

Steve Wise swise at opengridcomputing.com
Tue Jan 15 15:38:05 PST 2008


Sean Hefty wrote:
>> What do you think about adding an API to the rdma-cma allowing
>> applications to get a list of ip addresses associated with a particular
>> rdma device/port?
> 
> It seems kind of backwards from the design of the rdma_cm, but if it's useful to
> end users, I don't have any objections to having it.
> 

Consider OMPI, which looks at each device on the system that can be used 
to connect to the other nodes.  Each device is analyzed to see what 
method of communication should be used (TCP, IB, iWARP, whatever).  Then 
these interfaces and their attributes are conveyed to all the nodes and 
the desired communication mesh is determined.

Well, to take this approach and support the rdma-cm, one really 
needs to know which IP addresses are bound to which rdma devices so 
that those IP addresses (and the chosen IP port number) can be advertised.

BTW: mvapich2 punted on this issue by putting the local rdma IP address in a 
file called /etc/mv2.conf.  As a result, they cannot easily handle multiple rdma 
devices, since the file is global to each host.

OMPI is trying to be more dynamic and discover these addresses at run 
time on a per-device/port basis.


> Do you know if there would be a way to obtain this data entirely from userspace,
> or would kernel support be needed?
> 

It can be done in userspace, but it will need to issue an ioctl() to get 
the IP address.  The ioctl would be on an AF_INET/SOCK_DGRAM socket file 
descriptor allocated just for this query.  (This is how ifconfig works, by 
the way.)

Here's a simple IPv4-only way to do this:

Prototype:

u32 rdma_get_ipv4addr(struct ibv_context *verbs, int dev_port_num)

Description:

Returns the main IPv4 address associated with the rdma device/port in 
Network Byte Order.  This is suitable for stuffing into a sockaddr_in and 
passing to rdma_resolve_addr() ahead of an rdma_connect(), for instance.

Implementation:

This function can use the /sys/class/infiniband_verbs/uverbs*/device/net:* 
entries to derive the netdev interface name associated with this 
device/port.  Like this:

[root@vic20 ~]# ls -d /sys/class/infiniband_verbs/uverbs*/device/net:*
/sys/class/infiniband_verbs/uverbs0/device/net:ib0
/sys/class/infiniband_verbs/uverbs0/device/net:ib1
/sys/class/infiniband_verbs/uverbs1/device/net:eth1

Then the code will do SIOCGIFADDR on the appropriate netdev name, like 
"ib0", "ib1", or "eth1".

SIOCGIFADDR returns a sockaddr with the IPv4 address bound to that 
interface.

This is simple.  It doesn't cover all the possibilities (IPv6, or an 
interface with more than one IPv4 address, for example).  But it might 
be enough for what folks need...

I'm prototyping this now (as part of our OMPI/rdma-cm/iwarp work).

Steve.


