Hey, all. I'm not sure if this is a known bug or some sort of limitation I'm unaware of, but I've been building and testing with the OFED 1.3 GA release on a small fabric that has a mix of Arbel-based and newer Connect-X HCAs.
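
(In case it helps anyone reproduce this: a quick way to check which family a given node's HCA belongs to is ibv_devinfo; Arbel-era cards show up as mthca devices and Connect-X cards as mlx4 devices.)

    # Run on each node; mthca* = Arbel-family, mlx4_* = Connect-X
    $ ibv_devinfo | grep hca_id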
What I've discovered is that MVAPICH and Open MPI work fine across the entire fabric, but MVAPICH2 crashes when I use a mix of Arbels and Connect-X HCAs. The errors vary depending on the test program, but here's an example:
[mheinz@compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
.
.
.
(output snipped)
.
.
.
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 3 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         3.51         3.51         3.51         0.00
            1         1000         3.63         3.63         3.63         0.52
            2         1000         3.67         3.67         3.67         1.04
            4         1000         3.64         3.64         3.64         2.09
            8         1000         3.67         3.67         3.67         4.16
           16         1000         3.67         3.67         3.67         8.31
           32         1000         3.74         3.74         3.74        16.32
           64         1000         3.90         3.90         3.90        31.28
          128         1000         4.75         4.75         4.75        51.39
          256         1000         5.21         5.21         5.21        93.79
          512         1000         5.96         5.96         5.96       163.77
         1024         1000         7.88         7.89         7.89       247.54
         2048         1000        11.42        11.42        11.42       342.00
         4096         1000        15.33        15.33        15.33       509.49
         8192         1000        22.19        22.20        22.20       703.83
        16384         1000        34.57        34.57        34.57       903.88
        32768         1000        51.32        51.32        51.32      1217.94
        65536          640        85.80        85.81        85.80      1456.74
       131072          320       155.23       155.24       155.24      1610.40
       262144          160       301.84       301.86       301.85      1656.39
       524288           80       598.62       598.69       598.66      1670.31
      1048576           40      1175.22      1175.30      1175.26      1701.69
      2097152           20      2309.05      2309.05      2309.05      1732.32
      4194304           10      4548.72      4548.98      4548.85      1758.64
[0] Abort: Got FATAL event 3
 at line 796 in file ibv_channel_manager.c
rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
 exit status of rank 0: killed by signal 9
If, however, I define my mpd ring to contain only Connect-X systems OR only Arbel systems, IMB-MPI1 runs to completion.
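
(For anyone trying to reproduce: restricting the ring is just a matter of handing mpdboot a hosts file that lists only one kind of node. The host names below are placeholders, not my actual machines.)

    # mpd.hosts.connectx -- Connect-X nodes only
    compute-0-1
    compute-0-2
    compute-0-3
    compute-0-4

    # Start mpds on the local host plus the four hosts above, then rerun:
    $ mpdboot -n 5 -f mpd.hosts.connectx
    $ mpirun -n 5 ./IMB-MPI1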
Can anyone suggest a workaround, or is this a real bug in MVAPICH2?
--
Michael Heinz
Principal Engineer, QLogic Corporation
King of Prussia, Pennsylvania