[openib-general] HPCC benchmark aborts at MPIRandomAccess test

David Costa David.Costa at Sun.COM
Fri Dec 1 15:22:51 PST 2006


My apologies to everyone who replied. I am indeed using OFED 1.1 and the
included OSU MVAPICH. Boris, I will try your patch on Monday and report
back to the list with the results.

Best Regards,

Dave Costa

Boris Shpolyansky wrote:
> Hi David,
>  
> If you are using OFED-1.1 stack and OSU MVAPICH provided with the 
> OFED-1.1 package as your MPI layer,
> the attached patch should solve your problem.
>  
> Please, let me know if that helped.
>  
> Regards,
>  
> Boris Shpolyansky
> Application Engineer
> Mellanox Technologies Inc.
> 2900 Stender Way
> Santa Clara, CA 95054
> Tel.: (408) 916 0014
> Fax: (408) 970 3403
> Cell: (408) 834 9365
> www.mellanox.com
>
> ------------------------------------------------------------------------
> *From:* openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] *On Behalf Of *David Costa
> *Sent:* Friday, December 01, 2006 2:21 PM
> *To:* openib-general at openib.org; David.Costa at Sun.COM; Robert Houk; 
> Anthony Vinciguerra; Thomas Babbit
> *Subject:* [openib-general] HPCC benchmark aborts at MPIRandomAccess test
>
> Hello all,
>
> I am running the HPCC benchmark on a Sun Blade 8000 blade server. I
> have two blades running RHEL4U3 and SLESSP3 respectively, with 32
> GBytes of memory each. The HPCC benchmark is running on a
> Sun-developed IB module that uses the Mellanox 25204 chips. When it
> gets to the MPIRandomAccess test, it immediately fails and I see the
> messages listed below.
>
> Does anyone know what the messages mean and what the underlying
> cause might be? Please reply to me directly as I am not subscribed to this list.
>
> Thank you,
>
> Dave Costa
> david.costa at sun.com
>
>
> [root at an1-bl0 ~]# mpirun_rsh -rsh -np 32 -hostfile /root/hostfile 
> /usr/local/bin/hpcc
> 24 - MPI_CANCEL : Internal MPI error!
> [24] [] Aborting Program!
> mpirun_rsh: Abort signaled from [24]
> 26 - MPI_CANCEL : Internal MPI error!
> [26] [] Aborting Program!
> 15 - MPI_CANCEL : Internal MPI error!
> [15] [] Aborting Program!
> 18 - MPI_CANCEL : Internal MPI error!
> [18] [] Aborting Program!
> 22 - MPI_CANCEL : Internal MPI error!
> [22] [] Aborting Program!
> 4 - MPI_CANCEL : Internal MPI error!
> [4] [] Aborting Program!
> 13 - MPI_CANCEL : Internal MPI error!
> [13] [] Aborting Program!
> 11 - MPI_CANCEL : Internal MPI error!
> 16 - MPI_CANCEL : Internal MPI error!
> [16] [] Aborting Program!
> [11] [] Aborting Program!
> 28 - MPI_CANCEL : Internal MPI error!
> [28] [] Aborting Program!
> [19] Abort: [an1-bl1:19] Got completion with error, code=12
>  at line 2365 in file viacheck.c
> [23] Abort: [an1-bl1:23] Got completion with error, code=12
>  at line 2365 in file viacheck.c
> [17] Abort: [an1-bl1:17] Got completion with error, code=12
>  at line 2365 in file viacheck.c
> done. 
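
With 32 ranks interleaving their output, it can help to group the failures by type before digging further. The following is a minimal sketch in Python, not part of the original thread: the function name `summarize` and the two regular expressions are hypothetical and are modeled only on the log lines quoted above.

```python
import re
from collections import defaultdict

# Patterns modeled on the log lines quoted in this thread (assumed format).
CANCEL_RE = re.compile(r"^(\d+) - MPI_CANCEL : Internal MPI error!")
COMPLETION_RE = re.compile(
    r"^\[(\d+)\] Abort: \[([^:\]]+):\d+\] Got completion with error, code=(\d+)"
)


def summarize(log_text):
    """Group failing MPI ranks by the kind of error they reported."""
    summary = defaultdict(set)
    for raw in log_text.splitlines():
        # Drop any email quote prefix ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        m = CANCEL_RE.match(line)
        if m:
            summary["MPI_CANCEL"].add(int(m.group(1)))
            continue
        m = COMPLETION_RE.match(line)
        if m:
            rank, host, code = m.groups()
            summary[f"completion code={code} on {host}"].add(int(rank))
    return {kind: sorted(ranks) for kind, ranks in summary.items()}


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
> 24 - MPI_CANCEL : Internal MPI error!
> [24] [] Aborting Program!
> 26 - MPI_CANCEL : Internal MPI error!
> [19] Abort: [an1-bl1:19] Got completion with error, code=12
>  at line 2365 in file viacheck.c
"""

if __name__ == "__main__":
    for kind, ranks in summarize(SAMPLE).items():
        print(kind, ranks)
```

If the `code` value is the verbs work-completion status (an assumption about how MVAPICH reports it), then 12 corresponds to `IBV_WC_RETRY_EXC_ERR` in libibverbs' `enum ibv_wc_status`.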
