[libfabric-users] Test results on some machines

Sung-Eun Choi sungeun at cray.com
Mon Feb 13 07:06:38 PST 2017


Hi John,

In order to launch the fabtests with the gni provider, you either need
to do it by hand or via CCM mode.  Please see our wiki for directions:

https://github.com/ofi-cray/libfabric-cray/wiki/Running-fabtests-with-the-GNI-provider

Results from head of our master (ofi-cray):

# --------------------------------------------------------------
# Total Pass                                                52
# Total Notrun                                              12
# Total Fail                                                 4
# Percentage of Pass                                        92
# --------------------------------------------------------------
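
For reference, the "Percentage of Pass" figure in these summaries appears to be pass / (pass + fail), truncated to an integer, with Notrun tests left out of the denominator. That is inferred from the totals quoted in this thread, not taken from the runfabtests.sh source, and the function name below is hypothetical. A minimal sketch of the arithmetic:

```python
def pass_percentage(passed: int, failed: int) -> int:
    """Integer pass rate over the tests that actually ran (Notrun excluded).

    Assumption: truncating division, inferred from the totals in this thread.
    """
    ran = passed + failed
    if ran == 0:
        return 0
    return passed * 100 // ran


print(pass_percentage(52, 4))   # totals from the ofi-cray master run
print(pass_percentage(8, 43))   # totals from the gni run quoted below
print(pass_percentage(30, 0))   # totals from the verbs runs quoted below
```

Under this reading, a run with many Notrun tests can still report 100 percent, which is why the verbs summaries below show 100 with only 30 of 59 tests executed.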

Let us know if there's somewhere else we can put this info to make it
clearer to people who want to run fabtests with the gni provider.

-- Sung

On Mon, Feb 13, 2017 at 10:42:58AM +0000, Biddiscombe, John A. wrote:
> I’m slightly troubled by the results I’ve got on 3 machines. The verbs provider seems to work, though the number of not-run tests is disturbing. The gni provider seems terrible, yet when I run fi_pingpong from the libfabric build (not one of the fabtests), I can get it working on gni, but not with verbs.
> 
> 
> 
> Oddly, the memory registration test on gni fails outright (when run by hand on a compute node):
> 
> 
> 
> /users/biddisco/apps/fabtests/bin/fi_mr_test
> 
> Testing MR on fabric gni
> 
> Running mr_reg [Test fi_mr_reg for various buffer sizes]...FAIL: fi_mr_reg failed: ret=12 (Cannot allocate memory)
> 
> Running mr_regv [Test fi_mr_regv]...FAIL: fi_mr_regv failed: ret=12 (Cannot allocate memory)
> 
> Running mr_regattr [Test fi_mr_regattr]...FAIL: fi_mr_regattr failed: ret=22 (Invalid argument)
> 
> Summary: 3 tests failed
> 
> 
> 
> I’ve no idea how the fi_pingpong test manages to work (see end of email for output) in light of that failure (I presume it uses memory registration as well).
> 
> 
> 
> I was hoping that I’d find at least one test that worked on all 3 machines so that I’d have confidence that I could use it as a template to work from. It seems not to be so easy.
> 
> 
> 
> Can anyone shed light on these results and possibly give advice on how to improve the gni behaviour?
> 
> NB. If I enable extra output using -vvv, the basic problem with gni seems to be that every failure is due to
> 
>     fi_getinfo(): common/shared.c:454, ret=-61 (No data available)
> 
> or
> 
>     fi_mr_reg(): util/pingpong.c:1317, ret=-12 (Cannot allocate memory)
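
The ret values in these messages match standard Linux errno numbers (61 for ENODATA, 12 for ENOMEM, 22 for EINVAL), and the message text matches the usual strerror strings. The sketch below decodes such a value with Python's standard library. It assumes the return codes are plain errno values with the sign flipped, and the helper name is hypothetical. On non-Linux systems some numbers (61 in particular) map to different names.

```python
import errno
import os


def describe_ret(ret: int) -> tuple[str, str]:
    """Map a return code such as -12 to (symbolic name, message)."""
    code = abs(ret)  # the verbose output reports errors as negative values
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


for ret in (-61, -12, 22):
    print(ret, *describe_ret(ret))
```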
> 
> 
> 
> NB2. I tried the gni tests with nid00411-style names instead of IP addresses just in case, but it did not help.
> 
> 
> 
> Thanks
> 
> 
> 
> JB
> 
> 
> 
> In each case I’ve allocated a couple of nodes and found the correct IP addresses for the fabric.
> 
> 
> 
> # --------------------------------------------------------------
> 
> Generic cluster with infiniband support
> 
> # --------------------------------------------------------------
> 
> 
> 
> $HOME/apps/fabtests/bin/runfabtests.sh -p $HOME/apps/fabtests/bin verbs 192.168.3.36 192.168.3.38
> 
> 
> 
> # Test                                                  Result
> 
> # --------------------------------------------------------------
> 
> fi_getinfo_test -p verbs:                               Pass
> 
> fi_av_test -g 192.168.10.1 -n 1 -s 192.168.3.36 -p verbs:      Pass
> 
> fi_dom_test -n 2 -p verbs:                              Pass
> 
> fi_eq_test -p verbs:                                    Pass
> 
> fi_cq_test -p verbs:                                    Pass
> 
> fi_mr_test -p verbs:                                    Pass
> 
> fi_size_left_test -p verbs:                             Pass
> 
> fi_dgram g00n13s -p verbs:                              Pass
> 
> fi_rdm g00n13s -p verbs:                                Pass
> 
> fi_msg g00n13s -p verbs:                                Pass
> 
> fi_cm_data -p verbs:                                    Pass
> 
> fi_cq_data -p verbs:                                    Pass
> 
> fi_dgram -p verbs:                                    Notrun
> 
> fi_dgram_waitset -p verbs:                            Notrun
> 
> fi_msg -p verbs:                                        Pass
> 
> fi_msg_epoll -p verbs:                                  Pass
> 
> fi_msg_sockets -p verbs:                                Pass
> 
> fi_poll -t queue -p verbs:                            Notrun
> 
> fi_poll -t counter -p verbs:                          Notrun
> 
> fi_rdm -p verbs:                                        Pass
> 
> fi_rdm_rma_simple -p verbs:                           Notrun
> 
> fi_rdm_rma_trigger -p verbs:                          Notrun
> 
> fi_shared_ctx -p verbs:                               Notrun
> 
> fi_shared_ctx --no-tx-shared-ctx -p verbs:            Notrun
> 
> fi_shared_ctx --no-rx-shared-ctx -p verbs:            Notrun
> 
> fi_shared_ctx -e msg -p verbs:                        Notrun
> 
> fi_shared_ctx -e msg --no-tx-shared-ctx -p verbs:       Pass
> 
> fi_shared_ctx -e msg --no-rx-shared-ctx -p verbs:     Notrun
> 
> fi_shared_ctx -e dgram -p verbs:                      Notrun
> 
> fi_shared_ctx -e dgram --no-tx-shared-ctx -p verbs:    Notrun
> 
> fi_shared_ctx -e dgram --no-rx-shared-ctx -p verbs:    Notrun
> 
> fi_rdm_tagged_peek -p verbs:                            Pass
> 
> fi_scalable_ep -p verbs:                              Notrun
> 
> fi_cmatose -p verbs:                                    Pass
> 
> fi_rdm_shared_av -p verbs:                            Notrun
> 
> fi_msg_pingpong -I 5 -p verbs:                          Pass
> 
> fi_msg_bw -I 5 -p verbs:                              Notrun
> 
> fi_rma_bw -e msg -o write -I 5 -p verbs:              Notrun
> 
> fi_rma_bw -e msg -o read -I 5 -p verbs:               Notrun
> 
> fi_rma_bw -e msg -o writedata -I 5 -p verbs:          Notrun
> 
> fi_rma_bw -e rdm -o write -I 5 -p verbs:              Notrun
> 
> fi_rma_bw -e rdm -o read -I 5 -p verbs:               Notrun
> 
> fi_rma_bw -e rdm -o writedata -I 5 -p verbs:          Notrun
> 
> fi_msg_rma -o write -I 5 -p verbs:                      Pass
> 
> fi_msg_rma -o read -I 5 -p verbs:                       Pass
> 
> fi_msg_rma -o writedata -I 5 -p verbs:                  Pass
> 
> fi_msg_stream -I 5 -p verbs:                            Pass
> 
> fi_rdm_atomic -I 5 -o all -p verbs:                   Notrun
> 
> fi_rdm_cntr_pingpong -I 5 -p verbs:                   Notrun
> 
> fi_rdm_multi_recv -I 5 -p verbs:                        Pass
> 
> fi_rdm_pingpong -I 5 -p verbs:                          Pass
> 
> fi_rdm_rma -o write -I 5 -p verbs:                    Notrun
> 
> fi_rdm_rma -o read -I 5 -p verbs:                     Notrun
> 
> fi_rdm_rma -o writedata -I 5 -p verbs:                Notrun
> 
> fi_rdm_tagged_pingpong -I 5 -p verbs:                   Pass
> 
> fi_rdm_tagged_bw -I 5 -p verbs:                         Pass
> 
> fi_dgram_pingpong -I 5 -p verbs:                      Notrun
> 
> fi_rc_pingpong -n 5 -p verbs:                           Pass
> 
> fi_rc_pingpong -n 5 -e -p verbs:                        Pass
> 
> # --------------------------------------------------------------
> 
> # Total Pass                                                30
> 
> # Total Notrun                                              29
> 
> # Total Fail                                                 0
> 
> # Percentage of Pass                                       100
> 
> # --------------------------------------------------------------
> 
> 
> 
> # --------------------------------------------------------------
> 
> cray xc40 with gni
> 
> # --------------------------------------------------------------
> 
> $HOME/apps/fabtests/bin/runfabtests.sh -p $HOME/apps/fabtests/bin gni 148.187.33.168 148.187.33.172
> 
> 
> 
> # Test                                                  Result
> 
> # --------------------------------------------------------------
> 
> fi_getinfo_test -p gni:                                 Pass
> 
> fi_av_test -g 192.168.10.1 -n 1 -s 148.187.33.168 -p gni:      Pass
> 
> fi_dom_test -n 2 -p gni:                                Pass
> 
> fi_eq_test -p gni:                                      Pass
> 
> fi_cq_test -p gni:                                      Pass
> 
> fi_mr_test -p gni:                                      Fail
> 
> fi_size_left_test -p gni:                               Fail
> 
> fi_dgram g00n13s -p gni:                                Pass
> 
> fi_rdm g00n13s -p gni:                                  Pass
> 
> fi_msg g00n13s -p gni:                                  Pass
> 
> fi_cm_data -p gni:                                      Fail
> 
> fi_cq_data -p gni:                                      Fail
> 
> fi_dgram -p gni:                                        Fail
> 
> fi_dgram_waitset -p gni:                                Fail
> 
> fi_msg -p gni:                                          Fail
> 
> fi_msg_epoll -p gni:                                    Fail
> 
> fi_msg_sockets -p gni:                                  Fail
> 
> fi_poll -t queue -p gni:                                Fail
> 
> fi_poll -t counter -p gni:                              Fail
> 
> fi_rdm -p gni:                                          Fail
> 
> fi_rdm_rma_simple -p gni:                             Notrun
> 
> fi_rdm_rma_trigger -p gni:                            Notrun
> 
> fi_shared_ctx -p gni:                                 Notrun
> 
> fi_shared_ctx --no-tx-shared-ctx -p gni:              Notrun
> 
> fi_shared_ctx --no-rx-shared-ctx -p gni:                Fail
> 
> fi_shared_ctx -e msg -p gni:                          Notrun
> 
> fi_shared_ctx -e msg --no-tx-shared-ctx -p gni:       Notrun
> 
> fi_shared_ctx -e msg --no-rx-shared-ctx -p gni:         Fail
> 
> fi_shared_ctx -e dgram -p gni:                        Notrun
> 
> fi_shared_ctx -e dgram --no-tx-shared-ctx -p gni:     Notrun
> 
> fi_shared_ctx -e dgram --no-rx-shared-ctx -p gni:       Fail
> 
> fi_rdm_tagged_peek -p gni:                              Fail
> 
> fi_scalable_ep -p gni:                                  Fail
> 
> fi_cmatose -p gni:                                      Fail
> 
> fi_rdm_shared_av -p gni:                                Fail
> 
> fi_msg_pingpong -I 5 -p gni:                            Fail
> 
> fi_msg_bw -I 5 -p gni:                                  Fail
> 
> fi_rma_bw -e msg -o write -I 5 -p gni:                  Fail
> 
> fi_rma_bw -e msg -o read -I 5 -p gni:                   Fail
> 
> fi_rma_bw -e msg -o writedata -I 5 -p gni:              Fail
> 
> fi_rma_bw -e rdm -o write -I 5 -p gni:                  Fail
> 
> fi_rma_bw -e rdm -o read -I 5 -p gni:                   Fail
> 
> fi_rma_bw -e rdm -o writedata -I 5 -p gni:              Fail
> 
> fi_msg_rma -o write -I 5 -p gni:                        Fail
> 
> fi_msg_rma -o read -I 5 -p gni:                         Fail
> 
> fi_msg_rma -o writedata -I 5 -p gni:                    Fail
> 
> fi_msg_stream -I 5 -p gni:                              Fail
> 
> fi_rdm_atomic -I 5 -o all -p gni:                       Fail
> 
> fi_rdm_cntr_pingpong -I 5 -p gni:                       Fail
> 
> fi_rdm_multi_recv -I 5 -p gni:                          Fail
> 
> fi_rdm_pingpong -I 5 -p gni:                            Fail
> 
> fi_rdm_rma -o write -I 5 -p gni:                        Fail
> 
> fi_rdm_rma -o read -I 5 -p gni:                         Fail
> 
> fi_rdm_rma -o writedata -I 5 -p gni:                    Fail
> 
> fi_rdm_tagged_pingpong -I 5 -p gni:                     Fail
> 
> fi_rdm_tagged_bw -I 5 -p gni:                           Fail
> 
> fi_dgram_pingpong -I 5 -p gni:                          Fail
> 
> fi_rc_pingpong -n 5 -p gni:                             Fail
> 
> fi_rc_pingpong -n 5 -e -p gni:                          Fail
> 
> # --------------------------------------------------------------
> 
> # Total Pass                                                 8
> 
> # Total Notrun                                               8
> 
> # Total Fail                                                43
> 
> # Percentage of Pass                                        15
> 
> # --------------------------------------------------------------
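
To compare runs like these without counting by hand, the result column can be tallied from the saved output. The sketch below assumes each result line ends in a colon followed by Pass, Notrun, or Fail, as in the listings here. The function name and the sample lines are illustrative only.

```python
from collections import Counter


def tally(lines):
    """Count Pass/Notrun/Fail results in runfabtests-style output lines."""
    counts = Counter()
    for line in lines:
        line = line.lstrip("> ").strip()  # drop mail quoting, if present
        if not line or line.startswith("#"):
            continue  # skip blanks, headers, and summary rows
        _, sep, result = line.rpartition(":")
        result = result.strip()
        if sep and result in ("Pass", "Notrun", "Fail"):
            counts[result] += 1
    return counts


sample = [
    "> fi_getinfo_test -p gni:                                 Pass",
    "> fi_mr_test -p gni:                                      Fail",
    "> fi_rdm_rma_simple -p gni:                             Notrun",
    "> # Total Pass                                                 8",
]
print(tally(sample))
```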
> 
> 
> 
> # --------------------------------------------------------------
> 
> cray with omnipath and verbs
> 
> # --------------------------------------------------------------
> 
> 
> 
> $HOME/apps/fabtests/bin/runfabtests.sh -p $HOME/apps/fabtests/bin verbs 192.168.18.65 192.168.18.66
> 
> 
> 
> # Test                                                  Result
> 
> # --------------------------------------------------------------
> 
> fi_getinfo_test -p verbs:                               Pass
> 
> fi_av_test -g 192.168.10.1 -n 1 -s 192.168.18.65 -p verbs:      Pass
> 
> fi_dom_test -n 2 -p verbs:                              Pass
> 
> fi_eq_test -p verbs:                                    Pass
> 
> fi_cq_test -p verbs:                                    Pass
> 
> fi_mr_test -p verbs:                                    Pass
> 
> fi_size_left_test -p verbs:                             Pass
> 
> fi_dgram g00n13s -p verbs:                              Pass
> 
> fi_rdm g00n13s -p verbs:                                Pass
> 
> fi_msg g00n13s -p verbs:                                Pass
> 
> fi_cm_data -p verbs:                                    Pass
> 
> fi_cq_data -p verbs:                                    Pass
> 
> fi_dgram -p verbs:                                    Notrun
> 
> fi_dgram_waitset -p verbs:                            Notrun
> 
> fi_msg -p verbs:                                        Pass
> 
> fi_msg_epoll -p verbs:                                  Pass
> 
> fi_msg_sockets -p verbs:                                Pass
> 
> fi_poll -t queue -p verbs:                            Notrun
> 
> fi_poll -t counter -p verbs:                          Notrun
> 
> fi_rdm -p verbs:                                        Pass
> 
> fi_rdm_rma_simple -p verbs:                           Notrun
> 
> fi_rdm_rma_trigger -p verbs:                          Notrun
> 
> fi_shared_ctx -p verbs:                               Notrun
> 
> fi_shared_ctx --no-tx-shared-ctx -p verbs:            Notrun
> 
> fi_shared_ctx --no-rx-shared-ctx -p verbs:            Notrun
> 
> fi_shared_ctx -e msg -p verbs:                        Notrun
> 
> fi_shared_ctx -e msg --no-tx-shared-ctx -p verbs:       Pass
> 
> fi_shared_ctx -e msg --no-rx-shared-ctx -p verbs:     Notrun
> 
> fi_shared_ctx -e dgram -p verbs:                      Notrun
> 
> fi_shared_ctx -e dgram --no-tx-shared-ctx -p verbs:    Notrun
> 
> fi_shared_ctx -e dgram --no-rx-shared-ctx -p verbs:    Notrun
> 
> fi_rdm_tagged_peek -p verbs:                            Pass
> 
> fi_scalable_ep -p verbs:                              Notrun
> 
> fi_cmatose -p verbs:                                    Pass
> 
> fi_rdm_shared_av -p verbs:                            Notrun
> 
> fi_msg_pingpong -I 5 -p verbs:                          Pass
> 
> fi_msg_bw -I 5 -p verbs:                              Notrun
> 
> fi_rma_bw -e msg -o write -I 5 -p verbs:              Notrun
> 
> fi_rma_bw -e msg -o read -I 5 -p verbs:               Notrun
> 
> fi_rma_bw -e msg -o writedata -I 5 -p verbs:          Notrun
> 
> fi_rma_bw -e rdm -o write -I 5 -p verbs:              Notrun
> 
> fi_rma_bw -e rdm -o read -I 5 -p verbs:               Notrun
> 
> fi_rma_bw -e rdm -o writedata -I 5 -p verbs:          Notrun
> 
> fi_msg_rma -o write -I 5 -p verbs:                      Pass
> 
> fi_msg_rma -o read -I 5 -p verbs:                       Pass
> 
> fi_msg_rma -o writedata -I 5 -p verbs:                  Pass
> 
> fi_msg_stream -I 5 -p verbs:                            Pass
> 
> fi_rdm_atomic -I 5 -o all -p verbs:                   Notrun
> 
> fi_rdm_cntr_pingpong -I 5 -p verbs:                   Notrun
> 
> fi_rdm_multi_recv -I 5 -p verbs:                        Pass
> 
> fi_rdm_pingpong -I 5 -p verbs:                          Pass
> 
> fi_rdm_rma -o write -I 5 -p verbs:                    Notrun
> 
> fi_rdm_rma -o read -I 5 -p verbs:                     Notrun
> 
> fi_rdm_rma -o writedata -I 5 -p verbs:                Notrun
> 
> fi_rdm_tagged_pingpong -I 5 -p verbs:                   Pass
> 
> fi_rdm_tagged_bw -I 5 -p verbs:                         Pass
> 
> fi_dgram_pingpong -I 5 -p verbs:                      Notrun
> 
> fi_rc_pingpong -n 5 -p verbs:                           Pass
> 
> fi_rc_pingpong -n 5 -e -p verbs:                        Pass
> 
> # --------------------------------------------------------------
> 
> # Total Pass                                                30
> 
> # Total Notrun                                              29
> 
> # Total Fail                                                 0
> 
> # Percentage of Pass                                       100
> 
> # --------------------------------------------------------------
> 
> 
> 
> # --------------------------------------------------------------
> 
> # ping pong test from libfabric on gni
> 
> # --------------------------------------------------------------
> 
> 
> 
> ./frun.sh /users/biddisco/apps/libfabric/bin/fi_pingpong
> 
> running /users/biddisco/apps/libfabric/bin/fi_pingpong on nid00[421,425]
> 
> nid00421 is 148.187.33.168
> 
> Generated command is  srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf
> 
> 0 /users/biddisco/apps/libfabric/bin/fi_pingpong -p gni
> 
> 1 /users/biddisco/apps/libfabric/bin/fi_pingpong -p gni nid00421
> 
> 
> 
> 1: bytes   #sent   #ack     total       time     MB/sec    usec/xfer   Mxfers/sec
> 
> 1: 64      10k     =10k     1.2m        0.05s     25.52       2.51       0.40
> 
> 1: 256     10k     =10k     4.8m        0.06s     85.54       2.99       0.33
> 
> 1: 1k      10k     =10k     19m         0.05s    454.41       2.25       0.44
> 
> 1: 4k      10k     =10k     78m         0.07s   1254.25       3.27       0.31
> 
> 1: 64k     1k      =1k      125m        0.04s   3304.97      19.83       0.05
> 
> 1: 1m      100     =100     200m        0.05s   4422.23     237.12       0.00
> 
> 0: bytes   #sent   #ack     total       time     MB/sec    usec/xfer   Mxfers/sec
> 
> 0: 64      10k     =10k     1.2m        0.05s     25.52       2.51       0.40
> 
> 0: 256     10k     =10k     4.8m        0.06s     85.53       2.99       0.33
> 
> 0: 1k      10k     =10k     19m         0.05s    454.35       2.25       0.44
> 
> 0: 4k      10k     =10k     78m         0.07s   1254.17       3.27       0.31
> 
> 0: 64k     1k      =1k      125m        0.04s   3304.56      19.83       0.05
> 
> 0: 1m      100     =100     200m        0.05s   4421.01     237.18       0.00
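
As a sanity check on the table above, the MB/sec column appears to be the message size divided by the usec/xfer column, with "k" and "m" read as 1024 and 1024*1024 bytes. Both points are inferred from the numbers shown, not from the fi_pingpong source, and the helper names are hypothetical. A minimal sketch:

```python
SIZES = {"k": 1024, "m": 1024 * 1024}


def parse_size(text: str) -> int:
    """Convert a size label such as '64', '4k', or '1m' to bytes."""
    suffix = text[-1].lower()
    if suffix in SIZES:
        return int(text[:-1]) * SIZES[suffix]
    return int(text)


def mb_per_sec(size_label: str, usec_per_xfer: float) -> float:
    """Bytes per microsecond, which is numerically MB/sec (10^6 bytes)."""
    return parse_size(size_label) / usec_per_xfer


# (size, usec/xfer, reported MB/sec) copied from the rank 1 rows above
rows = [("64", 2.51, 25.52), ("4k", 3.27, 1254.25), ("1m", 237.12, 4422.23)]
for size, usec, reported in rows:
    print(size, round(mb_per_sec(size, usec), 2), reported)
```

The recomputed values agree with the reported ones to within rounding of the usec/xfer column.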
