[OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work
Boris Bierbaum
boris at lfbs.RWTH-Aachen.DE
Wed May 9 06:50:30 PDT 2007
I've run the whole IMB Benchmark Suite on 2, 3, and 4 nodes with 2
processes per node and --mca btl udapl,self. I didn't encounter any problems.
The comment above line 197 says that dat_ep_query() returns the wrong port
numbers (which it indeed does), but I can't find any call to
dat_ep_query() in the uDAPL BTL code. Maybe the comment is out of date?
Boris
Andrew Friedley wrote:
> You say that fixes the problem; does it work even when running more than
> one MPI process per node? (That is the case the hack fixes.) Simply
> running mpirun with a -np parameter higher than the number of nodes you
> have set up should trigger this case; make sure to use '-mca btl
> udapl,self' (i.e., not SM or anything else).
>
> Andrew
>
> Boris Bierbaum wrote:
>> It has been explained in a different thread on [ofa-general] that the
>> problem lies in a combination of the OpenIB-cma provider not setting the
>> local and remote port numbers on endpoints correctly and Open MPI
>> stepping over the IA to save the port number to circumvent this problem,
>> thereby confusing the provider.
>>
>> I commented out line 197 in ompi/mca/btl/udapl/btl_udapl.c (Open MPI
>> 1.2.1 release), and this fixes the problem. Since the problem in the
>> provider is currently being fixed, saving the port number in the
>> uDAPL BTL code will become unnecessary in the future.
>>
>> Steve Wise wrote:
>>>>> Can the uDAPL OFED wizards shed any light on the error messages
>>>>> listed below? These in particular seem worrisome:
>>>>>
>>>>>> setup_listener Permission denied
>>>>> setup_listener Address already in use
>>>> These failures are from rdma_cm_bind indicating the port is already
>>>> bound to this IA address. How are you creating the service point?
>>>> dat_psp_create or dat_psp_create_any? If it is psp_create_any then you
>>>> will see some failures until it gets to a free port. That is normal.
>>>> Just make sure your create call returns DAT_SUCCESS.
>>>>
>>> Arlin, why doesn't dapl_psp_create_any() just pass a port of zero down
>>> and let the rdma-cma pick an available port number?
>>>
>>>
>>>
>>> _______________________________________________
>>> general mailing list
>>> general at lists.openfabrics.org
>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>>
>>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>>>
>>
> _______________________________________________
> users mailing list
> users at open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
--
| _ RWTH | Boris Bierbaum
|_|_`_ | Lehrstuhl fuer Betriebssysteme
| |_) _ | RWTH Aachen D-52056 Aachen
|_)(_` | Tel: +49-241-80-27805
._) | Fax: +49-241-80-22339