[OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

Boris Bierbaum boris at lfbs.RWTH-Aachen.DE
Wed May 9 06:50:30 PDT 2007


I've run the whole IMB Benchmark Suite on 2, 3, and 4 nodes with 2
processes per node and --mca btl udapl,self. I didn't encounter any problems.

The comment above line 197 says that dat_ep_query() returns wrong port
numbers (which it does indeed), but I can't find any call to
dat_ep_query() in the uDAPL BTL code. Maybe the comment is out of date?

Boris


Andrew Friedley wrote:
> You say that fixes the problem. Does it work even when running more than 
> one MPI process per node? (That is the case the hack fixes.)  Simply 
> doing an mpirun with a -np parameter higher than the number of nodes you 
> have set up should trigger this case, as long as you make sure to use 
> '-mca btl udapl,self' (i.e. not SM or anything else).
> 
> Andrew
> 
> Boris Bierbaum wrote:
>> It has been explained in a different thread on [ofa-general] that the
>> problem lies in a combination of the OpenIB-cma provider not setting the
>> local and remote port numbers on endpoints correctly and Open MPI
>> stepping over the IA to save the port number to circumvent this problem,
>> thereby confusing the provider.
>>
>> I commented out line 197 in ompi/mca/btl/udapl/btl_udapl.c (Open MPI
>> 1.2.1 release) and this fixes the problem. As the problem in the
>> provider is currently being fixed, the whole saving of the port number
>> in the uDAPL BTL code will be unnecessary in the future.
>>
>> Steve Wise wrote:
>>>>> Can the UDAPL OFED wizards shed any light on the error messages that  
>>>>> are listed below?  In particular, these seem to be worrisome:
>>>>>
>>>>>>  setup_listener Permission denied
>>>>>  setup_listener Address already in use
>>>> These failures are from rdma_cm_bind indicating the port is already 
>>>> bound to this IA address. How are you creating the service point?
>>>> dat_psp_create or dat_psp_create_any? If it is psp_create_any then you 
>>>> will see some failures until it gets to a free port. That is normal. 
>>>> Just make sure your create call returns DAT_SUCCESS.
>>>>
>>> Arlin, why doesn't dapl_psp_create_any() just pass a port of zero down
>>> and let the rdma-cma pick an available port number?
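
As an aside on the port-zero idea above: the sketch below shows the same
pattern with plain TCP sockets, which stand in for the rdma-cma here. It is
only an analogy under that assumption, not the provider's code, and
bind_any_port() is a hypothetical helper name. Binding to port 0 lets the
kernel choose a free port, which is then read back with getsockname(), so no
retry loop over "Address already in use" failures is needed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to port 0 on the loopback
 * address so that the kernel picks a free port.
 * Returns the socket descriptor (the caller must close() it) or -1 on
 * error. On success the chosen port is stored in *port_out in host
 * byte order. */
int bind_any_port(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);   /* 0 means "let the kernel choose" */

    /* Bind, then ask the kernel which port it actually assigned. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *port_out = ntohs(addr.sin_port);
    return fd;
}
```

A caller would keep the returned descriptor open for as long as the port is
needed and publish *port_out to its peers, much as a service point's
connection qualifier has to be made known to the remote side.
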
>>>
>>
> 


-- 
|  _  RWTH | Boris Bierbaum
|_|_`_     | Lehrstuhl fuer Betriebssysteme
   | |_) _  | RWTH Aachen D-52056 Aachen
     |_)(_` | Tel: +49-241-80-27805
        ._) | Fax: +49-241-80-22339



