[OMPI devel] OMPI over OFA udapl (was Re: [ofa-general] OpenMPI and RDMA-CM)
Andrew Friedley
afriedle at open-mpi.org
Tue May 8 10:57:44 PDT 2007
Steve Wise wrote:
>> Well I've tried OMPI on ofed-1.2 udapl today and it doesn't work. I'm
>> debugging now.
>>
>
> Here's part of the problem (from ompi/btl/udapl/btl_udapl.c):
>
> /* TODO - big bad evil hack! */
> /* uDAPL doesn't ever seem to keep track of ports with addresses. This
> becomes a problem when we use dat_ep_query() to obtain a remote address
> on an endpoint. In this case, both the DAT_PORT_QUAL and the sin_port
> field in the DAT_SOCK_ADDR are 0, regardless of the actual port. This is
> a problem when we have more than one uDAPL process per IA - these
> processes will have exactly the same address, as the port is all
> we have to differentiate who is who. Thus, our uDAPL EP -> BTL EP
> matching algorithm will break down.
>
> So, we insert the port we used for our PSP into the DAT_SOCK_ADDR for
> this IA. uDAPL then conveniently propagates this to where we need it.
> */
> ((struct sockaddr_in*)attr.ia_address_ptr)->sin_port = htons(port);
> ((struct sockaddr_in*)&btl->udapl_addr.addr)->sin_port = htons(port);
>
> The OMPI code stuffs the port chosen by udapl for a listening endpoint
> into the ia address memory (which is owned by the udapl layer btw).
> There's a slight problem with that: The OFA udapl openib_cma code binds
> cm_id's to this ia_address regularly. When an hca is opened, a cm_id is
> bound to this address to obtain the local hca port number and gid that
> is being used. In addition, a cm_id is bound to this address each time
> an endpoint is created (either at ep_create time or ep_connect time).
> So that ia_address field is used by the dapl cm to create local
> cm_ids... Since the port was always zero, the rdma_cma would choose a
> unique port for each cm_id bound to that address.
>
> But once OMPI sets the port field to non-zero, the rdma_cma fails all the
> subsequent rdma_bind_addr() calls since the port is already in use.
>
> Perhaps this hack really is a workaround for a DAPL bug where somebody's
> dapl wasn't tracking port numbers correctly?
Yep. My memory is dim, but I think that was OFED's DAPL, or it was in
the generic part of DAPL that all implementations seem to share.
As hinted by the comment (I wrote it by the way), I think the best
solution would be if dat_ep_query() returned the port number correctly.
Most of uDAPL seems to just pass around pointers to internal data
structures (which I'm not sure is the best idea in the world), so it
didn't seem like a trivial fix to me at the time. I remember
considering reporting this as a bug, but I didn't because the uDAPL
standard didn't seem to enforce any requirements on passing the port
number around with the address, so it technically wasn't wrong.
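To make the matching problem from the comment concrete, here is a minimal sketch using plain sockaddr_in values. It is not the BTL's actual matching code: make_addr() and same_peer() are hypothetical helpers, and the IP address and port numbers are made up for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: build an IPv4 address with the given port
 * (host byte order). */
static struct sockaddr_in make_addr(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    /* inet_pton() returns 1 when the string is a valid IPv4 address. */
    assert(inet_pton(AF_INET, ip, &sa.sin_addr) == 1);
    return sa;
}

/* Hypothetical matcher: two peers are treated as the same process when
 * family, IP address and port all agree. */
static int same_peer(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return a->sin_family == b->sin_family &&
           a->sin_addr.s_addr == b->sin_addr.s_addr &&
           a->sin_port == b->sin_port;
}
```

With the port left at 0, two processes on the same IA (say, both at 10.0.0.1) compare equal under same_peer(), which is the ambiguity the hack works around. Once each address carries its own PSP port, they compare as different peers.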
Was the OFED uDAPL code switched from something else to RDMA CM at some
point? I'm almost certain I was running fine on OFED's uDAPL at one
point (in fact, a lot of the uDAPL BTL development I did was using the
OFED stack).
> I'm going to run a few experiments:
>
> 1) remove the OMPI hack and see if things work fine for OFA udapl.
> Perhaps OFA udapl correctly tracks ports on endpoints?
Doubt it, but worth trying.
> 2) leave OMPI as-is and change OFA udapl to not assume the ia_addr
> sockaddr has a 0 port in it.
Pretty sure this will work, don't know if it's the correct solution though.
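The bind failure Steve describes can be reproduced by analogy with ordinary TCP sockets. This is only a sketch of the general "port 0 versus fixed port" behaviour: it uses bind() on the loopback interface rather than rdma_bind_addr(), and bind_loopback() is a hypothetical helper, not anything from OMPI or uDAPL.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a new TCP socket to 127.0.0.1:port, where port
 * is in host byte order and 0 means "let the stack choose".
 * On success, store the fd in *fd_out and return the port actually bound.
 * On failure, return -1 with errno set by the failing call. */
static int bind_loopback(unsigned short port, int *fd_out)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int fd;

    assert(fd_out != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) < 0) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }

    *fd_out = fd;
    return ntohs(sa.sin_port);
}
```

Calling bind_loopback(0, ...) twice should give two different ports. Calling it again with a port that an open socket already holds should fail with EADDRINUSE, which mirrors what happens to the later binds once the shared ia_address carries a non-zero port.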
Andrew