<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=215521723-11032007><FONT face=Arial size=2>On the other hand,
after reviewing the source code, it seems that
DAT_CONNECTION_EVENT_BROKEN</FONT></SPAN></DIV>
<DIV><SPAN class=215521723-11032007><FONT face=Arial size=2>is returned when a
connection fails to be established, so this looks more like a CM (connection
manager) issue.</FONT></SPAN></DIV>
<DIV><SPAN class=215521723-11032007><FONT face=Arial size=2>Any suggestions on
how to debug this one?</FONT></SPAN></DIV>
<DIV><SPAN class=215521723-11032007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=215521723-11032007><FONT face=Arial
size=2>Thanks,</FONT></SPAN></DIV>
<DIV><FONT face=Arial><FONT size=2>Boris<SPAN
class=215521723-11032007>.</SPAN></FONT></FONT></DIV>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Boris Shpolyansky <BR><B>Sent:</B> Friday,
March 09, 2007 8:40 PM<BR><B>To:</B>
'general@lists.openfabrics.org'<BR><B>Subject:</B> uDAPL
question<BR></FONT><BR></DIV>
<DIV><SPAN class=965051803-10032007><FONT face=Arial><FONT size=2>Hi<SPAN
class=546583204-10032007><FONT
color=#0000ff>, </FONT></SPAN></FONT></FONT></SPAN></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=965051803-10032007></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=965051803-10032007>I'm trying to get
a simple Intel MPI benchmark running over IB (uDAPL) using the OFED-1.1
stack.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=965051803-10032007>I'm consistently
getting the following error:</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=965051803-10032007></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=965051803-10032007>[root@ibd005 ~]#
./runjob_I_MPI.boris 2<BR>Task 0 of 2 tasks started on host
ibd005.ibd.mti.com<BR>clock_resolution = 1.00e-06 s<BR>Task 1 of 2 tasks started
on host ibd006.ibd.mti.com<BR>[0:ibd005] unexpected DAPL event 4006 from
1:ibd006<BR>[1:ibd006] unexpected DAPL event 4006 from 0:ibd005<BR>rank 0 in job
14 ibd005_36193 caused collective abort of all ranks<BR>
exit status of rank 0: return code 254 <BR></SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=965051803-10032007>I did some digging
and found that event 4006 (actually 0x4006) is
DAT_CONNECTION_EVENT_BROKEN,</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=965051803-10032007>and that it is returned
by the function dat_rmr_bind. </SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=965051803-10032007>So my question is:
why does this function consistently fail?</SPAN></FONT></DIV>
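<DIV><FONT face=Arial size=2>A minimal sketch of the lookup described above: the logged "4006" is a string of hex digits without a 0x prefix, so it has to be parsed in base 16 before it is compared against the values in dat.h. The helper names parse_logged_event and event_name are hypothetical, and only the one value established in this message is mapped; the full enum should be taken from dat.h.</FONT></DIV>

```c
#include <assert.h>
#include <stdlib.h>

/* The only value this thread establishes; see dat.h for the full enum. */
#define EVENT_CONNECTION_BROKEN 0x4006UL

/* The log shows the event number as hex digits with no 0x prefix,
 * so "4006" must be read in base 16, not base 10. */
static unsigned long parse_logged_event(const char *text)
{
    assert(text != NULL);
    return strtoul(text, NULL, 16);
}

/* Hypothetical helper: names the single known value and reports
 * everything else as unknown rather than guessing. */
static const char *event_name(unsigned long ev)
{
    return ev == EVENT_CONNECTION_BROKEN ? "DAT_CONNECTION_EVENT_BROKEN"
                                         : "unknown (look up in dat.h)";
}
```

<DIV><FONT face=Arial size=2>Calling event_name(parse_logged_event("4006")) gives the connection-broken name, while passing the decimal value 4006 does not match, which is the mix-up the "(actually 0x4006)" remark points at.</FONT></DIV>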
<DIV><FONT face=Arial size=2><SPAN class=965051803-10032007>I'm using the standard
dat.conf file:</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=965051803-10032007></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=965051803-10032007>OpenIB-cma u1.2
nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0"
""<BR></SPAN></FONT></DIV>
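<DIV><FONT face=Arial size=2>To double-check that the entry above is read the way it is intended, here is a rough sketch that splits one dat.conf line into its fields, treating a quoted string as a single field. It assumes the usual eight-field layout as I understand it (IA name, API version, thread safety, default flag, library path, provider version, IA parameters, platform parameters); split_dat_conf_line and DAT_CONF_FIELDS are hypothetical names, not part of the DAT library.</FONT></DIV>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed number of fields in one dat.conf entry. */
#define DAT_CONF_FIELDS 8

/* Splits a dat.conf entry in place. Fields are separated by whitespace;
 * a "..." run counts as one field and its quotes are dropped.
 * Returns the number of fields stored in fields[]. */
static int split_dat_conf_line(char *line, char *fields[], int max_fields)
{
    int n = 0;
    char *p = line;

    assert(line != NULL && fields != NULL);
    while (*p != '\0' && n < max_fields) {
        while (isspace((unsigned char)*p))
            p++;                          /* skip leading whitespace */
        if (*p == '\0')
            break;
        if (*p == '"') {
            fields[n++] = ++p;            /* field starts after the quote */
            while (*p != '\0' && *p != '"')
                p++;
        } else {
            fields[n++] = p;
            while (*p != '\0' && !isspace((unsigned char)*p))
                p++;
        }
        if (*p != '\0')
            *p++ = '\0';                  /* terminate the field */
    }
    return n;
}
```

<DIV><FONT face=Arial size=2>For the entry above this yields eight fields, with "ib0 0" as the seventh (device name and port, as far as I can tell) and an empty string as the eighth. A count other than eight would suggest a quoting or line-wrapping problem in the file.</FONT></DIV>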
<DIV><FONT face=Arial size=2><SPAN class=965051803-10032007>Appreciate your
help,</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=965051803-10032007><FONT
color=#0000ff></FONT> </SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial><FONT size=2>Boris
Shpolyansky</FONT></FONT><FONT face=Arial><FONT size=2><SPAN
class=546583204-10032007> </SPAN></FONT></FONT></DIV></BODY></HTML>