[openib-general] [Bug 181] New: HPL test always failed

bugzilla-daemon at openib.org bugzilla-daemon at openib.org
Wed Jul 26 07:42:23 PDT 2006


http://openib.org/bugzilla/show_bug.cgi?id=181

           Summary: HPL test always failed
           Product: OpenFabrics Windows
           Version: unspecified
          Platform: X86-64
        OS/Version: Other
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: WSD
        AssignedTo: bugzilla at openib.org
        ReportedBy: evgeniyge at voltaire.com
                CC: evgeniyge at voltaire.com


We tried running without RDMA read, using the low-level driver for the MT25208 HCA.
Command line: mpiexec -hosts 4 hostname1 2 hostname2 2..... hpl.exe
Example of the error message:
job aborted:
rank: node: exit code: message
0: parker6: terminated
1: parker6: terminated
2: parker7: terminated
3: parker7: terminated
4: parker8: fatal error: Fatal error in MPI_Send: Internal MPI error!, error stack:
MPI_Send(172)...................: MPI_Send(buf=0x0000000002B1A9B8, count=17820, MPI_DOUBLE, dest=3, tag=1001, comm=0x84000002) failed
MPIDI_CH3I_Progress(165)........: handle_sock_op failed
handle_new_message_read(422)....:
MPIDI_CH3U_Handle_recv_pkt(1359): received unknown packet type (type=1071575908)
5: parker8: terminated
6: parker9: terminated
7: parker9: terminated
---- error analysis -----
4: mpi has detected a fatal error and aborted hpl.exe run on parker8

---- error analysis -----
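The "unknown packet type" value can be decoded to see whether it looks like a real header field. The sketch below is illustrative only; the helper names are hypothetical. It prints 1071575908 as hex (0x3FDEF364) and, as an assumption, reads it as the high 32 bits of an IEEE-754 double. One possible reading, not confirmed by this log, is that the receiver parsed MPI_DOUBLE payload bytes where it expected a packet header, which would point to a desynchronized byte stream over WSD.

```python
import struct

# Packet type reported by MPIDI_CH3U_Handle_recv_pkt in the error stack above.
PKT_TYPE = 1071575908


def as_hex(value: int) -> str:
    """Return the value as a zero-padded 32-bit hex string."""
    return f"0x{value:08X}"


def as_double_high_word(value: int) -> float:
    """Assumption: treat value as the high 32 bits of a double (low 32 bits zero)."""
    bits = value << 32
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


if __name__ == "__main__":
    print(as_hex(PKT_TYPE))               # hex form of the reported type
    print(as_double_high_word(PKT_TYPE))  # roughly 0.4836 if read as a double
```

A genuine CH3 packet type would be a small enum value, so a number of this size suggests that something other than a header was read. The double interpretation is only a hypothesis.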
Example HPL.dat:
--------------------------------------HPL.dat-------------------------------
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
4 # of problems sizes (N)
5100 3000 3400 3500 Ns
4 # of NBs
100 97 95 90 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
3 # of process grids (P x Q)
2 4 4 Ps
4 2 2 Qs
16.0 threshold
3 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
2 # of recursive stopping criterium
2 4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
3 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
0 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
-------------------------------------------------------------------------------
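As a sanity check on the input file, the sketch below multiplies the "# of ..." counts from the HPL.dat above to get the number of test combinations HPL will run. It also checks that each P x Q grid fits in the launched processes. The process count of 8 is an assumption (4 hosts with 2 processes each, matching ranks 0-7 in the log), and the function names are hypothetical.

```python
from math import prod

# Value counts copied from the "# of ..." lines of the HPL.dat above.
COUNTS = {
    "Ns": 4,
    "NBs": 4,
    "grids": 3,
    "PFACTs": 3,
    "NBMINs": 2,
    "NDIVs": 1,
    "RFACTs": 3,
    "BCASTs": 1,
    "DEPTHs": 1,
}

# Process grids from the Ps / Qs lines.
PS = [2, 4, 4]
QS = [4, 2, 2]

# Assumption: 4 hosts x 2 processes each, as the rank list in the log suggests.
NPROCS = 8


def total_runs(counts: dict) -> int:
    """HPL runs one test for every combination of the listed parameter values."""
    return prod(counts.values())


def grids_fit(ps: list, qs: list, nprocs: int) -> bool:
    """Each P x Q grid must not need more processes than were launched."""
    return all(p * q <= nprocs for p, q in zip(ps, qs))


if __name__ == "__main__":
    print(total_runs(COUNTS))
    print(grids_fit(PS, QS, NPROCS))
```

Under these assumptions, the file asks for 864 runs, and every grid uses exactly 8 processes. The third grid (4 x 2) repeats the second, so that part of the sweep runs twice.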



