[libfabric-users] Test results on some machines

Biddiscombe, John A. biddisco at cscs.ch
Mon Feb 13 02:42:58 PST 2017


I’m slightly troubled by the results I’ve got on 3 machines. The verbs provider seems to work, though the number of not-run tests is disturbing. The gni provider fails most of the tests, but when I run fi_pingpong from the libfabric build (not the one from fabtests), I can get it working on gni, but not with verbs.



Oddly, the memory registration test on gni fails outright when run by hand on a compute node:



/users/biddisco/apps/fabtests/bin/fi_mr_test

Testing MR on fabric gni

Running mr_reg [Test fi_mr_reg for various buffer sizes]...FAIL: fi_mr_reg failed: ret=12 (Cannot allocate memory)

Running mr_regv [Test fi_mr_regv]...FAIL: fi_mr_regv failed: ret=12 (Cannot allocate memory)

Running mr_regattr [Test fi_mr_regattr]...FAIL: fi_mr_regattr failed: ret=22 (Invalid argument)

Summary: 3 tests failed



I’ve no idea how the fi_pingpong test manages to work (see end of email for output) in light of that failure (I presume it also registers memory).



I was hoping that I’d find at least one test that worked on all 3 machines so that I’d have confidence that I could use it as a template to work from. It seems not to be so easy.



Can anyone shed light on these results and possibly give advice on how to improve the gni behaviour?

NB. If I enable extra output using -vvv, the basic problem with gni seems to be that every failure is caused by either

    fi_getinfo(): common/shared.c:454, ret=-61 (No data available)

or

    fi_mr_reg(): util/pingpong.c:1317, ret=-12 (Cannot allocate memory)
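
The return codes quoted above look like ordinary errno values (negated in the -vvv output, positive in the fi_mr_test output). A minimal Python sketch for decoding them follows. It assumes a Linux host, where 61 is ENODATA; the helper name `decode_ret` is purely illustrative and is not part of libfabric or fabtests.

```python
import errno
import os


def decode_ret(ret):
    """Map an errno-style return code (either sign) to (symbolic name, message)."""
    code = abs(ret)
    # errno.errorcode maps numeric codes to names such as "ENOMEM";
    # the numbering is platform specific, so unknown codes fall back to "UNKNOWN".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The three codes that appear in the output quoted in this email.
for ret in (-61, -12, 22):
    name, message = decode_ret(ret)
    print(f"ret={ret}: {name} ({message})")
```

This only translates the numbers; it says nothing about why the gni provider returns them.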



NB2. I tried the gni tests with nid00411-style names instead of IP addresses just in case, but it did not help.



Thanks



JB



In each case I’ve allocated a couple of nodes and found the correct IP addresses for the fabric.



# --------------------------------------------------------------

Generic cluster with InfiniBand support

# --------------------------------------------------------------



$HOME/apps/fabtests/bin/runfabtests.sh -p $HOME/apps/fabtests/bin verbs 192.168.3.36 192.168.3.38



# Test                                                  Result

# --------------------------------------------------------------

fi_getinfo_test -p verbs:                               Pass

fi_av_test -g 192.168.10.1 -n 1 -s 192.168.3.36 -p verbs:      Pass

fi_dom_test -n 2 -p verbs:                              Pass

fi_eq_test -p verbs:                                    Pass

fi_cq_test -p verbs:                                    Pass

fi_mr_test -p verbs:                                    Pass

fi_size_left_test -p verbs:                             Pass

fi_dgram g00n13s -p verbs:                              Pass

fi_rdm g00n13s -p verbs:                                Pass

fi_msg g00n13s -p verbs:                                Pass

fi_cm_data -p verbs:                                    Pass

fi_cq_data -p verbs:                                    Pass

fi_dgram -p verbs:                                    Notrun

fi_dgram_waitset -p verbs:                            Notrun

fi_msg -p verbs:                                        Pass

fi_msg_epoll -p verbs:                                  Pass

fi_msg_sockets -p verbs:                                Pass

fi_poll -t queue -p verbs:                            Notrun

fi_poll -t counter -p verbs:                          Notrun

fi_rdm -p verbs:                                        Pass

fi_rdm_rma_simple -p verbs:                           Notrun

fi_rdm_rma_trigger -p verbs:                          Notrun

fi_shared_ctx -p verbs:                               Notrun

fi_shared_ctx --no-tx-shared-ctx -p verbs:            Notrun

fi_shared_ctx --no-rx-shared-ctx -p verbs:            Notrun

fi_shared_ctx -e msg -p verbs:                        Notrun

fi_shared_ctx -e msg --no-tx-shared-ctx -p verbs:       Pass

fi_shared_ctx -e msg --no-rx-shared-ctx -p verbs:     Notrun

fi_shared_ctx -e dgram -p verbs:                      Notrun

fi_shared_ctx -e dgram --no-tx-shared-ctx -p verbs:    Notrun

fi_shared_ctx -e dgram --no-rx-shared-ctx -p verbs:    Notrun

fi_rdm_tagged_peek -p verbs:                            Pass

fi_scalable_ep -p verbs:                              Notrun

fi_cmatose -p verbs:                                    Pass

fi_rdm_shared_av -p verbs:                            Notrun

fi_msg_pingpong -I 5 -p verbs:                          Pass

fi_msg_bw -I 5 -p verbs:                              Notrun

fi_rma_bw -e msg -o write -I 5 -p verbs:              Notrun

fi_rma_bw -e msg -o read -I 5 -p verbs:               Notrun

fi_rma_bw -e msg -o writedata -I 5 -p verbs:          Notrun

fi_rma_bw -e rdm -o write -I 5 -p verbs:              Notrun

fi_rma_bw -e rdm -o read -I 5 -p verbs:               Notrun

fi_rma_bw -e rdm -o writedata -I 5 -p verbs:          Notrun

fi_msg_rma -o write -I 5 -p verbs:                      Pass

fi_msg_rma -o read -I 5 -p verbs:                       Pass

fi_msg_rma -o writedata -I 5 -p verbs:                  Pass

fi_msg_stream -I 5 -p verbs:                            Pass

fi_rdm_atomic -I 5 -o all -p verbs:                   Notrun

fi_rdm_cntr_pingpong -I 5 -p verbs:                   Notrun

fi_rdm_multi_recv -I 5 -p verbs:                        Pass

fi_rdm_pingpong -I 5 -p verbs:                          Pass

fi_rdm_rma -o write -I 5 -p verbs:                    Notrun

fi_rdm_rma -o read -I 5 -p verbs:                     Notrun

fi_rdm_rma -o writedata -I 5 -p verbs:                Notrun

fi_rdm_tagged_pingpong -I 5 -p verbs:                   Pass

fi_rdm_tagged_bw -I 5 -p verbs:                         Pass

fi_dgram_pingpong -I 5 -p verbs:                      Notrun

fi_rc_pingpong -n 5 -p verbs:                           Pass

fi_rc_pingpong -n 5 -e -p verbs:                        Pass

# --------------------------------------------------------------

# Total Pass                                                30

# Total Notrun                                              29

# Total Fail                                                 0

# Percentage of Pass                                       100

# --------------------------------------------------------------



# --------------------------------------------------------------

Cray XC40 with gni

# --------------------------------------------------------------

$HOME/apps/fabtests/bin/runfabtests.sh -p $HOME/apps/fabtests/bin gni 148.187.33.168 148.187.33.172



# Test                                                  Result

# --------------------------------------------------------------

fi_getinfo_test -p gni:                                 Pass

fi_av_test -g 192.168.10.1 -n 1 -s 148.187.33.168 -p gni:      Pass

fi_dom_test -n 2 -p gni:                                Pass

fi_eq_test -p gni:                                      Pass

fi_cq_test -p gni:                                      Pass

fi_mr_test -p gni:                                      Fail

fi_size_left_test -p gni:                               Fail

fi_dgram g00n13s -p gni:                                Pass

fi_rdm g00n13s -p gni:                                  Pass

fi_msg g00n13s -p gni:                                  Pass

fi_cm_data -p gni:                                      Fail

fi_cq_data -p gni:                                      Fail

fi_dgram -p gni:                                        Fail

fi_dgram_waitset -p gni:                                Fail

fi_msg -p gni:                                          Fail

fi_msg_epoll -p gni:                                    Fail

fi_msg_sockets -p gni:                                  Fail

fi_poll -t queue -p gni:                                Fail

fi_poll -t counter -p gni:                              Fail

fi_rdm -p gni:                                          Fail

fi_rdm_rma_simple -p gni:                             Notrun

fi_rdm_rma_trigger -p gni:                            Notrun

fi_shared_ctx -p gni:                                 Notrun

fi_shared_ctx --no-tx-shared-ctx -p gni:              Notrun

fi_shared_ctx --no-rx-shared-ctx -p gni:                Fail

fi_shared_ctx -e msg -p gni:                          Notrun

fi_shared_ctx -e msg --no-tx-shared-ctx -p gni:       Notrun

fi_shared_ctx -e msg --no-rx-shared-ctx -p gni:         Fail

fi_shared_ctx -e dgram -p gni:                        Notrun

fi_shared_ctx -e dgram --no-tx-shared-ctx -p gni:     Notrun

fi_shared_ctx -e dgram --no-rx-shared-ctx -p gni:       Fail

fi_rdm_tagged_peek -p gni:                              Fail

fi_scalable_ep -p gni:                                  Fail

fi_cmatose -p gni:                                      Fail

fi_rdm_shared_av -p gni:                                Fail

fi_msg_pingpong -I 5 -p gni:                            Fail

fi_msg_bw -I 5 -p gni:                                  Fail

fi_rma_bw -e msg -o write -I 5 -p gni:                  Fail

fi_rma_bw -e msg -o read -I 5 -p gni:                   Fail

fi_rma_bw -e msg -o writedata -I 5 -p gni:              Fail

fi_rma_bw -e rdm -o write -I 5 -p gni:                  Fail

fi_rma_bw -e rdm -o read -I 5 -p gni:                   Fail

fi_rma_bw -e rdm -o writedata -I 5 -p gni:              Fail

fi_msg_rma -o write -I 5 -p gni:                        Fail

fi_msg_rma -o read -I 5 -p gni:                         Fail

fi_msg_rma -o writedata -I 5 -p gni:                    Fail

fi_msg_stream -I 5 -p gni:                              Fail

fi_rdm_atomic -I 5 -o all -p gni:                       Fail

fi_rdm_cntr_pingpong -I 5 -p gni:                       Fail

fi_rdm_multi_recv -I 5 -p gni:                          Fail

fi_rdm_pingpong -I 5 -p gni:                            Fail

fi_rdm_rma -o write -I 5 -p gni:                        Fail

fi_rdm_rma -o read -I 5 -p gni:                         Fail

fi_rdm_rma -o writedata -I 5 -p gni:                    Fail

fi_rdm_tagged_pingpong -I 5 -p gni:                     Fail

fi_rdm_tagged_bw -I 5 -p gni:                           Fail

fi_dgram_pingpong -I 5 -p gni:                          Fail

fi_rc_pingpong -n 5 -p gni:                             Fail

fi_rc_pingpong -n 5 -e -p gni:                          Fail

# --------------------------------------------------------------

# Total Pass                                                 8

# Total Notrun                                               8

# Total Fail                                                43

# Percentage of Pass                                        15

# --------------------------------------------------------------
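
The summary figures above suggest that Notrun tests are left out of the pass percentage: 8 / (8 + 43) truncates to 15 here, and 30 / (30 + 0) gives 100 for the verbs runs. A small Python sketch that tallies a report in this format and recomputes that figure follows. The formula is inferred from the numbers in this email, not taken from the runfabtests.sh source, and the names `tally` and `pass_percentage` are hypothetical.

```python
import re
from collections import Counter

# A result line looks like "fi_mr_test -p gni:      Fail".
RESULT_LINE = re.compile(r"^(?P<test>fi_.*):\s+(?P<result>Pass|Fail|Notrun)\s*$")


def tally(report):
    """Count Pass / Fail / Notrun lines in a runfabtests-style report."""
    counts = Counter()
    for line in report.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            counts[match.group("result")] += 1
    return counts


def pass_percentage(counts):
    """Integer pass rate over tests that actually ran (assumed formula)."""
    executed = counts["Pass"] + counts["Fail"]
    return counts["Pass"] * 100 // executed if executed else 0


sample = """\
fi_getinfo_test -p gni:                                 Pass
fi_mr_test -p gni:                                      Fail
fi_rdm_rma_simple -p gni:                             Notrun
"""
counts = tally(sample)
print(dict(counts), pass_percentage(counts))

# Totals reported in this email for gni and for the verbs runs.
print(pass_percentage(Counter(Pass=8, Fail=43, Notrun=8)))
print(pass_percentage(Counter(Pass=30, Fail=0, Notrun=29)))
```

Read this way, the 100% on verbs covers only the 30 tests that ran, which is why the 29 Notrun entries still matter.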



# --------------------------------------------------------------

Cray with Omni-Path and verbs

# --------------------------------------------------------------



$HOME/apps/fabtests/bin/runfabtests.sh -p $HOME/apps/fabtests/bin verbs 192.168.18.65 192.168.18.66



# Test                                                  Result

# --------------------------------------------------------------

fi_getinfo_test -p verbs:                               Pass

fi_av_test -g 192.168.10.1 -n 1 -s 192.168.18.65 -p verbs:      Pass

fi_dom_test -n 2 -p verbs:                              Pass

fi_eq_test -p verbs:                                    Pass

fi_cq_test -p verbs:                                    Pass

fi_mr_test -p verbs:                                    Pass

fi_size_left_test -p verbs:                             Pass

fi_dgram g00n13s -p verbs:                              Pass

fi_rdm g00n13s -p verbs:                                Pass

fi_msg g00n13s -p verbs:                                Pass

fi_cm_data -p verbs:                                    Pass

fi_cq_data -p verbs:                                    Pass

fi_dgram -p verbs:                                    Notrun

fi_dgram_waitset -p verbs:                            Notrun

fi_msg -p verbs:                                        Pass

fi_msg_epoll -p verbs:                                  Pass

fi_msg_sockets -p verbs:                                Pass

fi_poll -t queue -p verbs:                            Notrun

fi_poll -t counter -p verbs:                          Notrun

fi_rdm -p verbs:                                        Pass

fi_rdm_rma_simple -p verbs:                           Notrun

fi_rdm_rma_trigger -p verbs:                          Notrun

fi_shared_ctx -p verbs:                               Notrun

fi_shared_ctx --no-tx-shared-ctx -p verbs:            Notrun

fi_shared_ctx --no-rx-shared-ctx -p verbs:            Notrun

fi_shared_ctx -e msg -p verbs:                        Notrun

fi_shared_ctx -e msg --no-tx-shared-ctx -p verbs:       Pass

fi_shared_ctx -e msg --no-rx-shared-ctx -p verbs:     Notrun

fi_shared_ctx -e dgram -p verbs:                      Notrun

fi_shared_ctx -e dgram --no-tx-shared-ctx -p verbs:    Notrun

fi_shared_ctx -e dgram --no-rx-shared-ctx -p verbs:    Notrun

fi_rdm_tagged_peek -p verbs:                            Pass

fi_scalable_ep -p verbs:                              Notrun

fi_cmatose -p verbs:                                    Pass

fi_rdm_shared_av -p verbs:                            Notrun

fi_msg_pingpong -I 5 -p verbs:                          Pass

fi_msg_bw -I 5 -p verbs:                              Notrun

fi_rma_bw -e msg -o write -I 5 -p verbs:              Notrun

fi_rma_bw -e msg -o read -I 5 -p verbs:               Notrun

fi_rma_bw -e msg -o writedata -I 5 -p verbs:          Notrun

fi_rma_bw -e rdm -o write -I 5 -p verbs:              Notrun

fi_rma_bw -e rdm -o read -I 5 -p verbs:               Notrun

fi_rma_bw -e rdm -o writedata -I 5 -p verbs:          Notrun

fi_msg_rma -o write -I 5 -p verbs:                      Pass

fi_msg_rma -o read -I 5 -p verbs:                       Pass

fi_msg_rma -o writedata -I 5 -p verbs:                  Pass

fi_msg_stream -I 5 -p verbs:                            Pass

fi_rdm_atomic -I 5 -o all -p verbs:                   Notrun

fi_rdm_cntr_pingpong -I 5 -p verbs:                   Notrun

fi_rdm_multi_recv -I 5 -p verbs:                        Pass

fi_rdm_pingpong -I 5 -p verbs:                          Pass

fi_rdm_rma -o write -I 5 -p verbs:                    Notrun

fi_rdm_rma -o read -I 5 -p verbs:                     Notrun

fi_rdm_rma -o writedata -I 5 -p verbs:                Notrun

fi_rdm_tagged_pingpong -I 5 -p verbs:                   Pass

fi_rdm_tagged_bw -I 5 -p verbs:                         Pass

fi_dgram_pingpong -I 5 -p verbs:                      Notrun

fi_rc_pingpong -n 5 -p verbs:                           Pass

fi_rc_pingpong -n 5 -e -p verbs:                        Pass

# --------------------------------------------------------------

# Total Pass                                                30

# Total Notrun                                              29

# Total Fail                                                 0

# Percentage of Pass                                       100

# --------------------------------------------------------------



# --------------------------------------------------------------

# ping pong test from libfabric on gni

# --------------------------------------------------------------



./frun.sh /users/biddisco/apps/libfabric/bin/fi_pingpong

running /users/biddisco/apps/libfabric/bin/fi_pingpong on nid00[421,425]

nid00421 is 148.187.33.168

Generated command is  srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf

0 /users/biddisco/apps/libfabric/bin/fi_pingpong -p gni

1 /users/biddisco/apps/libfabric/bin/fi_pingpong -p gni nid00421



1: bytes   #sent   #ack     total       time     MB/sec    usec/xfer   Mxfers/sec

1: 64      10k     =10k     1.2m        0.05s     25.52       2.51       0.40

1: 256     10k     =10k     4.8m        0.06s     85.54       2.99       0.33

1: 1k      10k     =10k     19m         0.05s    454.41       2.25       0.44

1: 4k      10k     =10k     78m         0.07s   1254.25       3.27       0.31

1: 64k     1k      =1k      125m        0.04s   3304.97      19.83       0.05

1: 1m      100     =100     200m        0.05s   4422.23     237.12       0.00

0: bytes   #sent   #ack     total       time     MB/sec    usec/xfer   Mxfers/sec

0: 64      10k     =10k     1.2m        0.05s     25.52       2.51       0.40

0: 256     10k     =10k     4.8m        0.06s     85.53       2.99       0.33

0: 1k      10k     =10k     19m         0.05s    454.35       2.25       0.44

0: 4k      10k     =10k     78m         0.07s   1254.17       3.27       0.31

0: 64k     1k      =1k      125m        0.04s   3304.56      19.83       0.05

0: 1m      100     =100     200m        0.05s   4421.01     237.18       0.00
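
As a sanity check on the table above, the MB/sec and Mxfers/sec columns appear to follow directly from the message size and the usec/xfer column. The Python sketch below recomputes them for the rank 1 rows. It assumes that 1k means 1024 bytes, 1m means 1048576 bytes, and MB means 10^6 bytes; these assumptions are inferred from the printed numbers rather than from the fi_pingpong source, and `derived_columns` is a hypothetical helper.

```python
def derived_columns(size_bytes, usec_per_xfer):
    """Recompute MB/sec and Mxfers/sec from message size and usec/xfer."""
    # Bytes per microsecond equals 10^6 bytes per second.
    mb_per_sec = size_bytes / usec_per_xfer
    # One transfer per microsecond is one million transfers per second.
    mxfers_per_sec = 1.0 / usec_per_xfer
    return mb_per_sec, mxfers_per_sec


# (size in bytes, usec/xfer, reported MB/sec, reported Mxfers/sec) for rank 1.
rows = [
    (64, 2.51, 25.52, 0.40),
    (256, 2.99, 85.54, 0.33),
    (1024, 2.25, 454.41, 0.44),
    (4096, 3.27, 1254.25, 0.31),
    (65536, 19.83, 3304.97, 0.05),
    (1048576, 237.12, 4422.23, 0.00),
]

for size, usec, reported_mb, reported_mx in rows:
    mb, mx = derived_columns(size, usec)
    print(f"{size:>8} bytes: {mb:8.2f} MB/sec (reported {reported_mb}), "
          f"{mx:.2f} Mxfers/sec (reported {reported_mx})")
```

The recomputed bandwidths differ slightly from the reported ones because usec/xfer is printed to only two decimal places.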