[ofa-general] uDAPL question

Boris Shpolyansky boris at mellanox.com
Sat Mar 10 15:02:26 PST 2007


3.0


Boris Shpolyansky
Application Engineer
Mellanox Technologies Inc.
2900 Stender Way
Santa Clara, CA 95054
Tel.: (408) 916 0014
Fax: (408) 970 3403
Cell: (408) 834 9365
www.mellanox.com 

----- Original Message -----
From: Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>
To: Boris Shpolyansky; general at lists.openfabrics.org <general at lists.openfabrics.org>
Sent: Sat Mar 10 14:45:25 2007
Subject: RE: [ofa-general] uDAPL question

What version of Intel MPI are you using?


________________________________

	From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Boris Shpolyansky
	Sent: Friday, March 09, 2007 8:40 PM
	To: general at lists.openfabrics.org
	Subject: [ofa-general] uDAPL question
	
	
	Hi, 
	 
	I'm trying to get a simple Intel MPI benchmark running over IB (uDAPL) using the OFED-1.1 stack.
	I'm consistently getting the following error:
	 
	[root@ibd005 ~]# ./runjob_I_MPI.boris 2
	Task 0 of 2 tasks started on host ibd005.ibd.mti.com
	clock_resolution = 1.00e-06 s
	Task 1 of 2 tasks started on host ibd006.ibd.mti.com
	[0:ibd005] unexpected DAPL event 4006 from 1:ibd006
	[1:ibd006] unexpected DAPL event 4006 from 0:ibd005
	rank 0 in job 14  ibd005_36193   caused collective abort of all ranks
	  exit status of rank 0: return code 254 
	
	I did some digging and found out that event 4006 (actually 0x4006) means DAT_CONNECTION_EVENT_BROKEN
	and that it is returned by the function dat_rmr_bind.
	So my question is: why does this function consistently fail?
	I'm using the standard dat.conf file:
	 
	OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" ""
	
	Appreciate your help,
	 
		Boris Shpolyansky 
