[openib-general] openMPI for 2.6.17.10 kernel

david elsen elsen_david at yahoo.com
Wed Dec 6 11:30:52 PST 2006


Steve,
I get the following error messages when I run IMB-MPI1:

[0] Abort: [] Got completion with error 5, vendor code=a, dest rank=1
  at line 479 in file ibv_channel_manager.c
 [1] Abort: ibv_post_recv err with 22 at line 1420 in file rdma_iba_priv.c
 rank 1 in job 1  ammasso1_50414   caused collective abort of all ranks
   exit status of rank 1: killed by signal 9 


For details, please see the full session below:
[root at ammasso1 0.9.8-RELEASE]# vi /etc/hosts
[root at ammasso1 0.9.8-RELEASE]# cd bin
[root at ammasso1 bin]# ./mpdboot -n 2
debug: starting
mpdroot: perror msg: Connection refused
running mpdallexit on ammasso1
LAUNCHED mpd on ammasso1  via  
debug: launch cmd= /root/0.9.8-RELEASE/bin/mpd.py   --ncpus=1 -e -d
debug: mpd on ammasso1  on port 50414
RUNNING: mpd on ammasso1
debug: info for running mpd: {'ncpus': 1, 'list_port': 50414, 'entry_port': '', 'host': 'ammasso1', 'entry_host': '', 'ifhn': ''}
LAUNCHED mpd on ammasso2  via  ammasso1
debug: launch cmd= ssh -x -n ammasso2.
 '/root/0.9.8-RELEASE/bin/mpd.py  -h ammasso1 -p 50414  --ncpus=1 -e -d' 
root at ammasso2.'s password: 
debug: mpd on ammasso2  on port 59327
RUNNING: mpd on ammasso2
debug: info for running mpd: {'entry_port': 50414, 'ncpus': 1, 'list_port': 59327, 'pid': 2997, 'host': 'ammasso2., 'entry_host': 'ammasso1', 'ifhn': ''}


[root at ammasso1 bin]# ./mpiexec -n 2 /root/IMB_2.3/src/IMB-MPI1
secretword=
#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V2.3, MPI-1 part    
#---------------------------------------------------
# Date       : Wed Dec  6 13:25:59 2006
# Machine    : i686
# System     : Linux
# Release    : 2.6.17.13
# Version    : #1 SMP Wed Nov 8 17:34:14 PST 2006

#
# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_scatter
# Allgather
# Allgatherv
# Alltoall
# Bcast
# Barrier
recv desc error, 128
[0] Abort: [] Got completion with error 5, vendor code=a, dest rank=1
 at line 479 in file ibv_channel_manager.c
[1] Abort: ibv_post_recv err with 22 at line 1420 in file rdma_iba_priv.c
rank 1 in job 1  ammasso1_50414   caused collective abort of all ranks
  exit status of rank 1: killed by signal 9 
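
For reference, the numeric codes in the abort messages above can be decoded roughly as follows. This is only a minimal sketch under some assumptions: that "error 5" is a libibverbs work-completion status (enum ibv_wc_status), that "err with 22" is a Linux errno value, and that the helper names and the hand-copied status table are purely illustrative. The hex "vendor code=a" is adapter-specific and is not decoded here.

```python
import errno
import os
import signal

# Hand-copied subset of libibverbs' enum ibv_wc_status.
# Assumption: the "error 5" in the abort message is a wc.status value.
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
}


def decode_wc_status(code):
    """Map a work-completion status number to its enum name (hypothetical helper)."""
    return IBV_WC_STATUS.get(code, "unknown(%d)" % code)


def decode_errno(code):
    """Map an errno value to 'NAME: description' using the local C library."""
    name = errno.errorcode.get(code, "unknown")
    return "%s: %s" % (name, os.strerror(code))


def decode_signal(num):
    """Map a signal number to its symbolic name."""
    return signal.Signals(num).name


if __name__ == "__main__":
    print("completion error 5 ->", decode_wc_status(5))
    print("ibv_post_recv err 22 ->", decode_errno(22))
    print("signal 9 ->", decode_signal(9))
```

If those assumptions hold, rank 0 saw a flushed work request (typically a side effect of the queue pair already being in the error state), rank 1's ibv_post_recv was rejected with EINVAL, and the process manager then killed rank 1 with SIGKILL. That would point at the failing ibv_post_recv on rank 1 as the first thing to look at.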

David




Steve Wise <swise at opengridcomputing.com> wrote:
On Wed, 2006-12-06 at 11:17 -0800, david elsen wrote:
> Steve,
>  
> Thanks a lot for the reply. 
>  
> I could run the cpi from the example directory. 
>  
> But I see some error message when trying to run the IMB-MPI1. I am
> using 219297_IMB_2.3. Which version are you using?

I'm running the same release.

Steve.



 	