[libfabric-users] Test results on some machines

Biddiscombe, John A. biddisco at cscs.ch
Mon Feb 13 07:23:31 PST 2017


> In order to launch the fabtests with the gni provider, you either need
> to do it by hand or via CCM mode.  Please see our wiki for directions:

Sorry, when I collected those outputs, I forgot about the fabtests instructions.
I have already tried the manual method outlined on the page and it does not give any better results. I’m using the same script for each of the fabtests examples as I use to run fi_pingpong (which works), and none of them appear to run properly.

Any other ideas?

JB

For example:

./frun.sh ~/apps/fabtests/bin/fi_msg_pingpong
running /users/biddisco/apps/fabtests/bin/fi_msg_pingpong   on nid000[91-92]
nid00091 is 148.187.32.92
Generated command is  srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf
0 /users/biddisco/apps/fabtests/bin/fi_msg_pingpong -p gni
1 /users/biddisco/apps/fabtests/bin/fi_msg_pingpong -p gni   nid00091

1: fi_connect(): common/shared.c:587, ret=-5 (Input/output error)
0: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)
srun: error: nid00091: task 0: Exited with exit code 61
srun: Terminating job step 786161.55
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: nid00092: task 1: Exited with exit code 5


daint102:/scratch/snx3000/biddisco/build$ ./frun.sh ~/apps/fabtests/bin/fi_rdm_rma_simple
running /users/biddisco/apps/fabtests/bin/fi_rdm_rma_simple   on nid000[91-92]
nid00091 is 148.187.32.92
Generated command is  srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf
0 /users/biddisco/apps/fabtests/bin/fi_rdm_rma_simple -p gni
1 /users/biddisco/apps/fabtests/bin/fi_rdm_rma_simple -p gni   nid00091

1: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)
srun: error: nid00092: task 1: Exited with exit code 61
srun: Terminating job step 786161.57
0: slurmstepd: error: *** STEP 786161.57 ON nid00091 CANCELLED AT 2017-02-13T16:20:18 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: nid00091: task 0: Killed

daint102:/scratch/snx3000/biddisco/build$ ./frun.sh ~/apps/fabtests/bin/fi_msg_bw
running /users/biddisco/apps/fabtests/bin/fi_msg_bw   on nid000[91-92]
nid00091 is 148.187.32.92
Generated command is  srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf
0 /users/biddisco/apps/fabtests/bin/fi_msg_bw -p gni
1 /users/biddisco/apps/fabtests/bin/fi_msg_bw -p gni   nid00091

0: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)
1: fi_connect(): common/shared.c:587, ret=-5 (Input/output error)
srun: error: nid00091: task 0: Exited with exit code 61
srun: Terminating job step 786161.59
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: nid00092: task 1: Exited with exit code 5
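
In case it helps when reading the failures above: the ret values look like negated errno codes, and srun reports the same number as the task exit code (ret=-61 with exit code 61, ret=-5 with exit code 5). Below is a minimal sketch of a helper that pulls the rank, the failing call, the source location and the errno name out of each failure line. The function name parse_failure and the regular expression are my own illustration and not part of fabtests. The mapping from number to name comes from Python's errno module, so it assumes the script runs on Linux like the compute nodes; other platforms number errno values differently.

```python
import errno
import re

# Matches lines such as:
#   0: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)
# The leading "N:" is the task label added by "srun -l".
FAILURE = re.compile(r"^(\d+): (\w+)\(\): ([^:,]+):(\d+), ret=(-?\d+)")


def parse_failure(line):
    """Return a dict describing one failure line, or None if it is not one."""
    m = FAILURE.match(line)
    if m is None:
        return None
    ret = int(m.group(5))
    return {
        "rank": int(m.group(1)),
        "call": m.group(2),
        "location": m.group(3) + ":" + m.group(4),
        "ret": ret,
        # Assumption: ret is a negated errno value on the local platform.
        "errno_name": errno.errorcode.get(-ret, "UNKNOWN"),
    }


if __name__ == "__main__":
    log = [
        "0: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)",
        "1: fi_connect(): common/shared.c:587, ret=-5 (Input/output error)",
        "srun: error: nid00091: task 0: Exited with exit code 61",
    ]
    for entry in log:
        info = parse_failure(entry)
        if info is not None:
            print(info["rank"], info["call"], info["location"], info["errno_name"])
```

Fed the three runs above, this makes the pattern easier to see: fi_getinfo() fails at common/shared.c:454 in every run, while fi_connect() fails at common/shared.c:587 in the two fi_msg_* runs.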



