[OMPI devel] OMPI over OFA udapl (was Re: [ofa-general] OpenMPI and RDMA-CM)
Donald Kerr
Don.Kerr at Sun.COM
Tue May 8 13:21:18 PDT 2007
Steve Wise wrote:
>On Tue, 2007-05-08 at 13:57 -0400, Andrew Friedley wrote:
>
>
>>Steve Wise wrote:
>>
>>
>>>>Well I've tried OMPI on ofed-1.2 udapl today and it doesn't work. I'm
>>>>debugging now.
>>>>
>>>>
>>>>
>>>Here's part of the problem (from ompi/btl/udapl/btl_udapl.c):
>>>
>>> /* TODO - big bad evil hack! */
>>> /* uDAPL doesn't ever seem to keep track of ports with addresses. This
>>> becomes a problem when we use dat_ep_query() to obtain a remote address
>>> on an endpoint. In this case, both the DAT_PORT_QUAL and the sin_port
>>> field in the DAT_SOCK_ADDR are 0, regardless of the actual port. This is
>>> a problem when we have more than one uDAPL process per IA - these
>>> processes will have exactly the same address, as the port is all
>>> we have to differentiate who is who. Thus, our uDAPL EP -> BTL EP
>>> matching algorithm will break down.
>>>
>>> So, we insert the port we used for our PSP into the DAT_SOCK_ADDR for
>>> this IA. uDAPL then conveniently propagates this to where we need it.
>>> */
>>> ((struct sockaddr_in*)attr.ia_address_ptr)->sin_port = htons(port);
>>> ((struct sockaddr_in*)&btl->udapl_addr.addr)->sin_port = htons(port);
>>>
>>>The OMPI code stuffs the port chosen by udapl for a listening endpoint
>>>into the ia address memory (which is owned by the udapl layer btw).
>>>There's a slight problem with that: The OFA udapl openib_cma code binds
>>>cm_id's to this ia_address regularly. When an hca is opened, a cm_id is
>>>bound to this address to obtain the local hca port number and gid that
>>>is being used. In addition, a cm_id is bound to this address each time
>>>an endpoint is created (either at ep_create time or ep_connect time).
>>>So that ia_address field is used by the dapl cm to create local
>>>cm_ids... Since the port was always zero, the rdma-cma would choose a
>>>unique port for each cm_id bound to that address.
>>>
>>>But once OMPI sets the port field to non-zero, the rdma_cma fails all
>>>the subsequent rdma_bind_addr() calls since the port is already in use.
>>>
>>>Perhaps this hack really is a workaround for a DAPL bug where somebody's
>>>dapl wasn't tracking port numbers correctly?
>>>
>>>
>>Yep. My memory is dim, but I think that was OFED's DAPL, or it was in
>>the generic part of DAPL that all implementations seem to share.
>>
>>As hinted by the comment (I wrote it by the way), I think the best
>>solution would be if dat_ep_query() returned the port number correctly.
>> Most of uDAPL seems to just pass around pointers to internal data
>>structures (which I'm not sure is the best idea in the world), so it
>>didn't seem like a trivial fix to me at the time. I remember
>>considering reporting this as a bug, but I didn't because the uDAPL
>>standard didn't seem to enforce any requirements on passing the port
>>number around with the address, so it technically wasn't wrong.
>>
>>Was the OFED uDAPL code switched from something else to RDMA CM at some
>>point? I'm almost certain I was running fine on OFED's uDAPL at one
>>point (in fact, a lot of the uDAPL BTL development I did was using the
>>OFED stack).
>>
>>
>
>Yes, the OFA uDAPL was changed from using the ib-cm to the rdma-cm a
>while back. Perhaps you ran on the ib-cm version? And, the rdma-cma
>started using port numbers and enforcing uniqueness even more recently I
>think.
>
>Perhaps Don Kerr has some insight on how the Sun uDAPL behaves? Should
>OMPI still need this hack?
>
>
From what I recall (and Andrew can set me straight if I get this
wrong), this hack was included because we were not able to pull the
remote port from dat_ep_query(). If dat_ep_query() supplies that data,
then we could probably do away with the hack.
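
To illustrate why the missing port matters, here is a minimal sketch of
address-based endpoint matching. It is not the actual BTL code: struct
peer, make_addr() and count_matches() are hypothetical names, and the
addresses and ports are made up. It only shows that two processes on
the same IA become indistinguishable when every address carries port 0.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a BTL endpoint: only the peer address. */
struct peer {
    struct sockaddr_in addr;
};

/* Build an IPv4 address; port is given in host byte order. */
static struct sockaddr_in make_addr(const char *ip, unsigned short port)
{
    struct sockaddr_in a;
    int rc;

    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_port = htons(port);
    rc = inet_pton(AF_INET, ip, &a.sin_addr);
    assert(rc == 1); /* the sketch only uses valid dotted-quad strings */
    return a;
}

/* Count the peers whose IP address and port both equal the remote
 * address reported for a connection. A usable match is exactly 1. */
static size_t count_matches(const struct peer *peers, size_t n,
                            const struct sockaddr_in *remote)
{
    size_t i, hits = 0;

    for (i = 0; i < n; i++) {
        if (peers[i].addr.sin_addr.s_addr == remote->sin_addr.s_addr &&
            peers[i].addr.sin_port == remote->sin_port)
            hits++;
    }
    return hits;
}
```

With two peers on 10.0.0.1 that both report port 0, count_matches()
returns 2 for a remote address of 10.0.0.1:0, so the lookup is
ambiguous. If each peer carries its own PSP port, the same lookup
returns 1.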
I have not heard back from the developer at Sun who implemented uDAPL
for Solaris. My guess is that it was also based on the older ib-cm, but
I will confirm. A while back I submitted a bug against Solaris uDAPL
asking it to provide the port via dat_ep_query(), and it looks like it
has been fixed; I just have not tested the fix because we weren't using
it.
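
For reference, the rdma_bind_addr() failure Steve describes has a
familiar analogue with ordinary TCP sockets. The sketch below does not
use the rdma-cm at all; it assumes plain POSIX sockets on the loopback
interface, and bind_loopback() is a hypothetical helper. It shows the
same pattern: binding to port 0 lets the stack pick a free port each
time, while binding again to a port that is already taken fails.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a new TCP socket to 127.0.0.1:port (port in host byte order;
 * 0 asks the stack to choose). On success, return the socket and store
 * the port actually bound in *bound_port. On failure, return -1 with
 * errno left as set by socket(), bind() or getsockname(). */
static int bind_loopback(unsigned short port, unsigned short *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd;

    assert(bound_port != NULL);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        int saved = errno; /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }

    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

Calling bind_loopback(0, &p) twice should yield two different non-zero
ports. Passing the first of those ports to a third call, while the
first socket is still open, should fail with EADDRINUSE, which is the
situation the hack creates once the IA address carries a fixed port.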
-DON
>
>Steve.
>
>
>