[libfabric-users] Test results on some machines
Biddiscombe, John A.
biddisco at cscs.ch
Mon Feb 13 07:23:31 PST 2017
> In order to launch the fabtests with the gni provider, you either need
> to do it by hand or via CCM mode. Please see our wiki for directions:
Sorry, when I collected those outputs, I forgot about the fabtests instructions.
I have already tried the manual method outlined on the page, and it does not give any better results. I’m using the same script that I use to run fi_pingpong (which works) for each of the fabtests examples, and none of them appears to run properly.
Any other ideas?
JB
For example:
./frun.sh ~/apps/fabtests/bin/fi_msg_pingpong
running /users/biddisco/apps/fabtests/bin/fi_msg_pingpong on nid000[91-92]
nid00091 is 148.187.32.92
Generated command is srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf
0 /users/biddisco/apps/fabtests/bin/fi_msg_pingpong -p gni
1 /users/biddisco/apps/fabtests/bin/fi_msg_pingpong -p gni nid00091
1: fi_connect(): common/shared.c:587, ret=-5 (Input/output error)
0: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)
srun: error: nid00091: task 0: Exited with exit code 61
srun: Terminating job step 786161.55
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: nid00092: task 1: Exited with exit code 5
daint102:/scratch/snx3000/biddisco/build$ ./frun.sh ~/apps/fabtests/bin/fi_rdm_rma_simple
running /users/biddisco/apps/fabtests/bin/fi_rdm_rma_simple on nid000[91-92]
nid00091 is 148.187.32.92
Generated command is srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf
0 /users/biddisco/apps/fabtests/bin/fi_rdm_rma_simple -p gni
1 /users/biddisco/apps/fabtests/bin/fi_rdm_rma_simple -p gni nid00091
1: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)
srun: error: nid00092: task 1: Exited with exit code 61
srun: Terminating job step 786161.57
0: slurmstepd: error: *** STEP 786161.57 ON nid00091 CANCELLED AT 2017-02-13T16:20:18 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: nid00091: task 0: Killed
daint102:/scratch/snx3000/biddisco/build$ ./frun.sh ~/apps/fabtests/bin/fi_msg_bw
running /users/biddisco/apps/fabtests/bin/fi_msg_bw on nid000[91-92]
nid00091 is 148.187.32.92
Generated command is srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf
0 /users/biddisco/apps/fabtests/bin/fi_msg_bw -p gni
1 /users/biddisco/apps/fabtests/bin/fi_msg_bw -p gni nid00091
0: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)
1: fi_connect(): common/shared.c:587, ret=-5 (Input/output error)
srun: error: nid00091: task 0: Exited with exit code 61
srun: Terminating job step 786161.59
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: nid00092: task 1: Exited with exit code 5
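
To compare the failures across the three runs above, the per-rank error lines can be pulled out of the srun output and decoded. The following is a minimal Python sketch, not part of fabtests or libfabric: the regular expression is inferred only from the lines shown in these logs, and the names ERROR_LINE and parse_error_line are hypothetical. It also assumes that the negated ret value is an ordinary errno number, which is consistent with the logs, where the srun exit codes (61 and 5) match the reported ret values (-61 and -5).

```python
import errno
import re

# Matches per-rank error lines as they appear in the logs above, e.g.
#   1: fi_connect(): common/shared.c:587, ret=-5 (Input/output error)
# The pattern is an assumption based only on those lines.
ERROR_LINE = re.compile(
    r"^(?P<rank>\d+): (?P<call>\w+)\(\): (?P<file>[^:]+):(?P<line>\d+), "
    r"ret=(?P<ret>-?\d+) \((?P<message>.*)\)$"
)


def parse_error_line(text):
    """Return a dict describing one error line, or None if it does not match."""
    match = ERROR_LINE.match(text.strip())
    if match is None:
        return None
    ret = int(match.group("ret"))
    return {
        "rank": int(match.group("rank")),
        "call": match.group("call"),
        "file": match.group("file"),
        "line": int(match.group("line")),
        "ret": ret,
        # Assumes -ret is a plain errno value on the local platform.
        "errno_name": errno.errorcode.get(-ret, "unknown"),
        "message": match.group("message"),
    }


if __name__ == "__main__":
    sample = [
        "1: fi_connect(): common/shared.c:587, ret=-5 (Input/output error)",
        "0: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)",
        "srun: error: nid00091: task 0: Exited with exit code 61",
    ]
    for entry in sample:
        print(parse_error_line(entry))
```

Lines that come from srun or slurmstepd do not match the pattern and yield None, so only the errors reported by the tests themselves are kept. In all three runs one rank fails in fi_getinfo() with ret=-61, while the other rank either fails in fi_connect() with ret=-5 or is killed when the job step is cancelled.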