[openib-general] HPCC benchmark aborts at MPIRandomAccess test

Boris Shpolyansky boris at mellanox.com
Fri Dec 1 14:29:42 PST 2006


Hi David,
 
If you are using the OFED-1.1 stack, with the OSU MVAPICH shipped in the
OFED-1.1 package as your MPI layer, the attached patch should solve your
problem.
 
Please let me know whether it helps.
 
Regards,
 
Boris Shpolyansky
Application Engineer
Mellanox Technologies Inc.
2900 Stender Way
Santa Clara, CA 95054
Tel.: (408) 916 0014
Fax: (408) 970 3403
Cell: (408) 834 9365
www.mellanox.com

________________________________

From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of David Costa
Sent: Friday, December 01, 2006 2:21 PM
To: openib-general at openib.org; David.Costa at Sun.COM; Robert Houk; Anthony
Vinciguerra; Thomas Babbit
Subject: [openib-general] HPCC benchmark aborts at MPIRandomAccess test


Hello all,

I am running the HPCC benchmark on a Sun Blade 8000 blade server. I have
two blades, one running RHEL4U3 and the other SLESSP3, each with
32 GBytes of memory. The benchmark runs over a Sun-developed IB module
that uses Mellanox 25204 chips. When it reaches the MPIRandomAccess
test, it fails immediately with the messages listed below.

Does anyone know what the messages mean and what the underlying cause
might be? Please reply to me directly, as I am not subscribed to this list.

Thank you,

Dave Costa
david.costa at sun.com


[root at an1-bl0 ~]# mpirun_rsh -rsh -np 32 -hostfile /root/hostfile /usr/local/bin/hpcc
24 - MPI_CANCEL : Internal MPI error!
[24] [] Aborting Program!
mpirun_rsh: Abort signaled from [24]
26 - MPI_CANCEL : Internal MPI error!
[26] [] Aborting Program!
15 - MPI_CANCEL : Internal MPI error!
[15] [] Aborting Program!
18 - MPI_CANCEL : Internal MPI error!
[18] [] Aborting Program!
22 - MPI_CANCEL : Internal MPI error!
[22] [] Aborting Program!
4 - MPI_CANCEL : Internal MPI error!
[4] [] Aborting Program!
13 - MPI_CANCEL : Internal MPI error!
[13] [] Aborting Program!
11 - MPI_CANCEL : Internal MPI error!
16 - MPI_CANCEL : Internal MPI error!
[16] [] Aborting Program!
[11] [] Aborting Program!
28 - MPI_CANCEL : Internal MPI error!
[28] [] Aborting Program!
[19] Abort: [an1-bl1:19] Got completion with error, code=12
 at line 2365 in file viacheck.c
[23] Abort: [an1-bl1:23] Got completion with error, code=12
 at line 2365 in file viacheck.c
[17] Abort: [an1-bl1:17] Got completion with error, code=12
 at line 2365 in file viacheck.c
done. 
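
For anyone trying to read the "Got completion with error, code=12" lines above, here is a minimal sketch of how they might be decoded. It assumes, and the thread does not confirm this, that the code printed by viacheck.c is the raw libibverbs work-completion status (enum ibv_wc_status), in which 12 is IBV_WC_RETRY_EXC_ERR (transport retry counter exceeded). The helper name decode_aborts and the regular expression are hypothetical and exist only for this illustration.

```python
import re

# Subset of enum ibv_wc_status from libibverbs <infiniband/verbs.h>.
# Assumption: the "code=" value in the log is this status value.
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    5: "IBV_WC_WR_FLUSH_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# Matches lines such as:
#   [19] Abort: [an1-bl1:19] Got completion with error, code=12
ABORT_RE = re.compile(
    r"\[(\d+)\] Abort: \[([^:\]]+):\d+\] Got completion with error, code=(\d+)"
)

def decode_aborts(log_text):
    """Return (rank, host, status_name) for each completion-error line."""
    results = []
    for match in ABORT_RE.finditer(log_text):
        rank = int(match.group(1))
        host = match.group(2)
        code = int(match.group(3))
        results.append((rank, host, IBV_WC_STATUS.get(code, "unknown")))
    return results

# Lines copied from the log in this message.
LOG = """\
[19] Abort: [an1-bl1:19] Got completion with error, code=12
 at line 2365 in file viacheck.c
[23] Abort: [an1-bl1:23] Got completion with error, code=12
 at line 2365 in file viacheck.c
[17] Abort: [an1-bl1:17] Got completion with error, code=12
 at line 2365 in file viacheck.c
"""

for rank, host, status in decode_aborts(LOG):
    print(f"rank {rank} on {host}: {status}")
```

If that assumption holds, a retry-exceeded status on ranks 17, 19 and 23 would be consistent with their peers having already aborted on the MPI_CANCEL error, so these lines may be a symptom and not the root cause.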
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smpi_cancel.patch
Type: application/octet-stream
Size: 1116 bytes
Desc: smpi_cancel.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20061201/28eca796/attachment.obj>