[ofa-general] Re: new API for the rdma-cma
Steve Wise
swise at opengridcomputing.com
Tue Jan 15 15:38:05 PST 2008
Sean Hefty wrote:
>> What do you think about adding an API to the rdma-cma allowing
>> applications to get a list of ip addresses associated with a particular
>> rdma device/port?
>
> It seems kind of backwards from the design of the rdma_cm, but if it's useful to
> end users, I don't have any objections to having it.
>
Consider OMPI, which looks at each device on the system that can be used
to connect to the other nodes. Each device is analyzed to see what
method of communication should be used (tcp, ib, iwarp, whatever). Then
these interfaces and their attributes are conveyed to all the nodes and
the desired communication mesh is determined.
To take this approach and still support the rdma-cm, one really needs to
know which IP addresses are bound to which rdma devices, so that those
IP addresses (and the chosen IP port number) can be advertised.
BTW: mvapich2 punted on this issue by putting the local rdma ipaddr in a
file called /etc/mv2.conf. Thus they cannot easily handle multiple rdma
devices since the file is global to each host.
OMPI is trying to be more dynamic and discover these addresses at run
time on a per-device/port basis.
> Do you know if there would be a way to obtain this data entirely from userspace,
> or would kernel support be needed?
>
It can be done in userspace, but it will need to issue an ioctl() to get
the IP address. The ioctl would be on an AF_INET/SOCK_DGRAM socket file
descriptor allocated just for this query (this is how ifconfig works, by
the way).
Here's a simple ipv4-only way to do this:
Prototype:
u32 rdma_get_ipv4addr(struct ibv_context *verbs, int dev_port_num)
Description:
Returns the main IPv4 address associated with the rdma device/port in
Network Byte Order. This is suitable for stuffing in a sockaddr_in and
issuing an rdma_connect(), for instance.
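To illustrate the byte-order point, here is a minimal sketch of putting such a value into a sockaddr_in. The helper name make_sockaddr_in() is made up for this example and is not part of the proposal. The address is assumed to be in network byte order already (as the description says), and the port is assumed to be in host byte order:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: build a sockaddr_in from an IPv4 address that is
 * already in network byte order and a port number in host byte order.
 */
struct sockaddr_in make_sockaddr_in(uint32_t addr_nbo, uint16_t port_host)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = addr_nbo;   /* already network order, no htonl() */
    sin.sin_port = htons(port_host);  /* port still needs converting */
    return sin;
}
```

The resulting structure can be cast to a struct sockaddr * and handed to the librdmacm calls that take an address, such as rdma_bind_addr() or rdma_resolve_addr().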
Implementation:
This function can use the /sys/class/infiniband_verbs/uverbs*/device/net:*
directories to derive the netdev interface name associated with this
device/port. Like this:
[root@vic20 ~]# ls -d /sys/class/infiniband_verbs/uverbs*/device/net:*
/sys/class/infiniband_verbs/uverbs0/device/net:ib0
/sys/class/infiniband_verbs/uverbs0/device/net:ib1
/sys/class/infiniband_verbs/uverbs1/device/net:eth1
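The sysfs lookup in the listing above could be sketched roughly like this. The function name find_netdev_names() is hypothetical, and the code assumes the "net:<ifname>" entry naming shown in the listing; other kernels may expose this differently. It also does not try to work out which netdev belongs to which port of a multi-port device:

```c
#include <dirent.h>
#include <net/if.h>   /* IF_NAMESIZE */
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan dev_dir (for example
 * "/sys/class/infiniband_verbs/uverbs0/device") for entries named
 * "net:<ifname>" and copy up to max interface names into names[].
 * Returns the number of names stored, or -1 if dev_dir cannot be opened.
 */
int find_netdev_names(const char *dev_dir, char names[][IF_NAMESIZE], int max)
{
    DIR *dir = opendir(dev_dir);
    struct dirent *ent;
    int count = 0;

    if (!dir)
        return -1;

    while (count < max && (ent = readdir(dir)) != NULL) {
        /* Only entries of the form "net:<ifname>" are of interest. */
        if (strncmp(ent->d_name, "net:", 4) != 0 || ent->d_name[4] == '\0')
            continue;
        /* Skip the "net:" prefix; snprintf always NUL-terminates. */
        snprintf(names[count], IF_NAMESIZE, "%s", ent->d_name + 4);
        count++;
    }
    closedir(dir);
    return count;
}
```

With the listing above, scanning the uverbs0 device directory would be expected to yield "ib0" and "ib1" (in whatever order readdir() returns them).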
Then the code will do SIOCGIFADDR on the appropriate netdev name, like
"ib0", "ib1", or "eth1".
SIOCGIFADDR returns a sockaddr with the IPv4 address bound to that
interface.
This is simple. It doesn't cover all the possibilities. But it might
be enough for what folks need...
I'm prototyping this now (as part of our OMPI/rdma-cm/iwarp work).
Steve.