[ofa-general] uDAPL question
Woodruff, Robert J
robert.j.woodruff at intel.com
Tue Apr 3 14:59:19 PDT 2007
This should now be fixed in OFED 1.2.
woody
-----Original Message-----
From: Yong Qin [mailto:yong.qin at qlogic.com]
Sent: Tuesday, April 03, 2007 12:43 PM
To: Boris Shpolyansky; Woodruff, Robert J; Hefty, Sean
Cc: general at lists.openfabrics.org
Subject: RE: [ofa-general] uDAPL question
Is there any progress on this issue? We are seeing exactly the same
error on OFED 1.1 + Intel MPI 3.0 -- "unexpected DAPL event 4006" and
wondering if there is a fix.
Thanks,
Yong
-----Original Message-----
From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of Boris
Shpolyansky
Sent: Monday, March 12, 2007 11:28 AM
To: Woodruff, Robert J; general at lists.openfabrics.org; Hefty, Sean
Subject: RE: [ofa-general] uDAPL question
Hi Woody,
Thanks for your help.
I guess the problem is in the CM, is that right?
Can you point me to relevant communication/bug reports that explain the
fix for this issue?
Would Sean be the right person to ask about exactly which patch should
be added/removed?
I would prefer to stick to the OFED-1.1 code with minimal changes, if
possible, to avoid compatibility issues.
Thanks,
Boris
-----Original Message-----
From: Woodruff, Robert J [mailto:robert.j.woodruff at intel.com]
Sent: Monday, March 12, 2007 8:24 AM
To: Boris Shpolyansky; general at lists.openfabrics.org; Hefty, Sean
Subject: RE: [ofa-general] uDAPL question
This is a known problem and should be fixed by now. A bad patch that was
not in Sean's main tree somehow got into OFED.
Assuming this bad patch has been removed, the problem should be fixed.
woody
________________________________
From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of Boris
Shpolyansky
Sent: Friday, March 09, 2007 8:40 PM
To: general at lists.openfabrics.org
Subject: [ofa-general] uDAPL question
Hi,
I'm trying to get a simple Intel MPI benchmark running over IB (uDAPL)
using the OFED-1.1 stack.
I'm consistently getting the following error:
[root at ibd005 ~]# ./runjob_I_MPI.boris 2
Task 0 of 2 tasks started on host ibd005.ibd.mti.com
clock_resolution = 1.00e-06 s
Task 1 of 2 tasks started on host ibd006.ibd.mti.com
[0:ibd005] unexpected DAPL event 4006 from 1:ibd006
[1:ibd006] unexpected DAPL event 4006 from 0:ibd005
rank 0 in job 14 ibd005_36193 caused collective abort of all ranks
exit status of rank 0: return code 254
I did some digging and found out that event 4006 (actually 0x4006) means
DAT_CONNECTION_EVENT_BROKEN and it is returned by function dat_rmr_bind.
So my question is why this function consistently fails.
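The lookup described above (reading the printed 4006 as hexadecimal and mapping it to a DAT event name) can be sketched as follows. This is only an illustration: the helper name decode_dapl_event is hypothetical, and the table is assumed to match the connection-event values in the DAT 1.2 dat.h header, which should be checked against the header actually installed.

```python
# Connection-event numbers, assumed to match the DAT 1.2 header (dat.h).
# Verify these against the dat.h shipped with your uDAPL installation.
DAT_CONNECTION_EVENTS = {
    0x4001: "DAT_CONNECTION_EVENT_ESTABLISHED",
    0x4002: "DAT_CONNECTION_EVENT_PEER_REJECTED",
    0x4003: "DAT_CONNECTION_EVENT_NON_PEER_REJECTED",
    0x4004: "DAT_CONNECTION_EVENT_ACCEPT_COMPLETION_ERROR",
    0x4005: "DAT_CONNECTION_EVENT_DISCONNECTED",
    0x4006: "DAT_CONNECTION_EVENT_BROKEN",
    0x4007: "DAT_CONNECTION_EVENT_TIMED_OUT",
    0x4008: "DAT_CONNECTION_EVENT_UNREACHABLE",
}


def decode_dapl_event(text):
    """Hypothetical helper: treat the number from the MPI error as hex."""
    code = int(text, 16)
    return DAT_CONNECTION_EVENTS.get(code, "unknown event 0x%x" % code)


# The error message prints "4006" without a 0x prefix.
print(decode_dapl_event("4006"))
```

Read this way, the message reports a connection that broke after it had been set up, which points at the connection manager rather than at the benchmark itself.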
I'm using the standard dat.conf file:
OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" ""
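To make the dat.conf entry above easier to read, here is a rough sketch that splits it into named fields. The field names are an assumption based on the usual dat.conf layout (interface adapter name, API version, thread safety, default flag, library path, provider version, adapter parameters, platform parameters), and parse_dat_conf_line is a hypothetical helper, not part of uDAPL.

```python
import shlex

# Assumed field order for a dat.conf entry; check the dat.conf
# documentation for your uDAPL version before relying on it.
FIELDS = (
    "ia_name",
    "api_version",
    "threadsafety",
    "default",
    "lib_path",
    "provider_version",
    "ia_params",
    "platform_params",
)


def parse_dat_conf_line(line):
    """Hypothetical helper: split one dat.conf entry into named fields."""
    # shlex keeps quoted values such as "ib0 0" and "" as single tokens.
    tokens = shlex.split(line, comments=True)
    if len(tokens) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(tokens)))
    return dict(zip(FIELDS, tokens))


entry = parse_dat_conf_line(
    'OpenIB-cma u1.2 nonthreadsafe default '
    '/usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" ""'
)
for name in FIELDS:
    print("%-17s %r" % (name, entry[name]))
```

Under that assumed layout, "ib0 0" would be the adapter parameters (the network interface and a port number) and the trailing "" an empty platform-parameters field.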
Appreciate your help,
Boris Shpolyansky
_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general