[libfabric-users] Fabtest questions

Stefan Oesterreich soesterreich at iol.unh.edu
Thu Mar 29 10:20:27 PDT 2018


Hello,

My name is Stefan Oesterreich and I am the Systems Administrator of the
UNH-IOL OFA cluster. The OFIWG would like to include running fabtests as
part of our OFED and vendor device/firmware validation testing. I have very
limited knowledge of fabtests, so I am looking for some guidance on a
comprehensive test command. We test InfiniBand, iWARP, and RoCE, and we are
looking to test the verbs provider. The command I have thus far is as
follows:

runfabtests.sh -t all -g $server_transport_ip_addr \
    -s $server_transport_hostname -c $client_transport_hostname \
    verbs $server_mgmt_hostname $client_mgmt_hostname

Here is a filled-in example:
runfabtests.sh -t all -g 10.1.0.3 -s titan-ib.ofa -c phoebe-ib.ofa \
    verbs titan.ofa phoebe.ofa

When I run the above command on one of my InfiniBand nodes, I get the
following output:

# Test                                                  Result
# --------------------------------------------------------------
fi_getinfo_test -p "verbs":                             Pass
fi_av_test -g 10.1.0.3 -n 1 -s titan-ib.ofa -p "verbs":      Pass
fi_dom_test -n 2 -p "verbs":                            Pass
fi_eq_test -p "verbs":                                  Pass
fi_cq_test -p "verbs":                                  Pass
fi_mr_test -p "verbs":                                  Pass
fi_cntr_test -p "verbs":                                Pass
fi_dgram g00n13s -p "verbs":                            Pass
fi_rdm g00n13s -p "verbs":                              Pass
fi_msg g00n13s -p "verbs":                              Pass
fi_cm_data -p "verbs":                                  Pass
fi_cq_data -p "verbs":                                  Fail
fi_dgram -p "verbs":                                  Notrun
fi_dgram_waitset -p "verbs":                          Notrun
fi_msg -p "verbs":                                      Pass
fi_msg_epoll -p "verbs":                                Pass
fi_msg_sockets -p "verbs":                              Pass
fi_poll -t queue -p "verbs":                          Notrun
fi_poll -t counter -p "verbs":                        Notrun
fi_rdm -p "verbs":                                      Pass
fi_rdm_rma_simple -p "verbs":                         Notrun
fi_rdm_rma_trigger -p "verbs":                        Notrun
fi_shared_ctx -p "verbs":                             Notrun
fi_shared_ctx --no-tx-shared-ctx -p "verbs":          Notrun
fi_shared_ctx --no-rx-shared-ctx -p "verbs":          Notrun
fi_shared_ctx -e msg -p "verbs":                      Notrun
fi_shared_ctx -e msg --no-tx-shared-ctx -p "verbs":      Pass
fi_shared_ctx -e msg --no-rx-shared-ctx -p "verbs":    Notrun
fi_shared_ctx -e dgram -p "verbs":                    Notrun
fi_shared_ctx -e dgram --no-tx-shared-ctx -p "verbs":    Notrun
fi_shared_ctx -e dgram --no-rx-shared-ctx -p "verbs":    Notrun
fi_rdm_tagged_peek -p "verbs":                          Pass
fi_scalable_ep -p "verbs":                            Notrun
fi_cmatose -p "verbs":                                  Pass
fi_rdm_shared_av -p "verbs":                          Notrun
fi_multi_mr -e msg -V -p "verbs":                     Notrun
fi_multi_mr -e rdm -V -p "verbs":                     Notrun
fi_recv_cancel -e rdm -V -p "verbs":                  Notrun
fi_unexpected_msg -e msg -i 10 -p "verbs":            Notrun
fi_unexpected_msg -e rdm -i 10 -p "verbs":            Notrun
fi_unexpected_msg -e dgram -i 10 -p "verbs":          Notrun
fi_unexpected_msg -e msg -S -i 10 -p "verbs":         Notrun
fi_unexpected_msg -e rdm -S -i 10 -p "verbs":         Notrun
fi_unexpected_msg -e dgram -S -i 10 -p "verbs":       Notrun
fi_msg_pingpong -p "verbs":                             Pass
fi_msg_pingpong -v -p "verbs":                          Pass
fi_msg_pingpong -k -p "verbs":                        Notrun
fi_msg_pingpong -k -v -p "verbs":                     Notrun
fi_msg_bw -p "verbs":                                   Pass
fi_msg_bw -v -p "verbs":                                Pass
fi_rma_bw -e msg -o write -p "verbs":                   Pass
fi_rma_bw -e msg -o read -p "verbs":                    Pass
fi_rma_bw -e msg -o writedata -p "verbs":               Pass
fi_rma_bw -e rdm -o write -p "verbs":                   Pass
fi_rma_bw -e rdm -o read -p "verbs":                    Pass
fi_rma_bw -e rdm -o writedata -p "verbs":               Fail
fi_msg_rma -o write -p "verbs":                         Pass
fi_msg_rma -o read -p "verbs":                          Pass
fi_msg_rma -o writedata -p "verbs":                     Pass
fi_msg_stream -p "verbs":                               Pass
fi_rdm_atomic -o all -I 1000 -p "verbs":              Notrun
fi_rdm_cntr_pingpong -p "verbs":                      Notrun
fi_rdm_multi_recv -p "verbs":                           Fail
fi_rdm_pingpong -p "verbs":                             Pass
fi_rdm_pingpong -v -p "verbs":                          Pass
fi_rdm_pingpong -k -p "verbs":                        Notrun
fi_rdm_pingpong -k -v -p "verbs":                     Notrun
fi_rdm_rma -o write -p "verbs":                         Fail
fi_rdm_rma -o read -p "verbs":                          Fail
fi_rdm_rma -o writedata -p "verbs":                     Fail
fi_rdm_tagged_pingpong -p "verbs":                      Pass
fi_rdm_tagged_pingpong -v -p "verbs":                   Pass
fi_rdm_tagged_bw -p "verbs":                            Pass
fi_rdm_tagged_bw -v -p "verbs":                         Pass
fi_dgram_pingpong -p "verbs":                         Notrun
fi_dgram_pingpong -k -p "verbs":                      Notrun
fi_rc_pingpong -p "verbs":                              Pass
fi_ubertest:            Server returns 124, client returns 124
fi_ubertest:                                        Fail [/]
# --------------------------------------------------------------
# Total Pass                                                38
# Total Notrun                                              33
# Total Fail                                                 7
# Percentage of Pass                                        84
# --------------------------------------------------------------
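For cross-checking a summary like the one above against a saved log, here is a minimal sketch that re-counts the result column. It assumes each result line has the form `<test command>:   <Result>` as in the output shown, and it skips lines starting with '#'. The function name tally_results is hypothetical and is not part of fabtests. The percentage is computed as Pass / (Pass + Fail), which is only an inference from the numbers above (38 / (38 + 7) is about 84%), not something confirmed by the script's documentation.

```shell
#!/bin/sh
# Re-count Pass/Notrun/Fail from saved runfabtests.sh output.
# Hypothetical helper; reads the named files, or stdin if none are given.
tally_results() {
    awk '
        /^#/ { next }    # skip the header, separators, and summary lines
        {
            # Take the first Pass/Notrun/Fail that follows a colon, so a
            # trailing marker such as "[/]" after the result is ignored.
            if (match($0, /:[ \t]+(Pass|Notrun|Fail)([ \t]|$)/)) {
                s = substr($0, RSTART, RLENGTH)
                sub(/^:[ \t]+/, "", s)
                sub(/[ \t]+$/, "", s)
                count[s]++
            }
        }
        END {
            p = count["Pass"] + 0
            n = count["Notrun"] + 0
            f = count["Fail"] + 0
            printf "Pass=%d Notrun=%d Fail=%d\n", p, n, f
            # Assumption: Notrun is excluded from the percentage.
            if (p + f > 0) printf "PassPct=%d\n", int(100 * p / (p + f))
        }
    ' "$@"
}
```

Usage would look like `tally_results fabtests.log`, where fabtests.log is a hypothetical file holding the captured output.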


My questions are:

   - Is the above command comprehensive enough for all three transports (IB,
   iWARP, RoCE)?
   - What test mode should I be using (all, quick, unit, simple, standard,
   short, complex)? This is the first time running through this testing, so
   I don't know if "all" is appropriate here. Time is also a consideration:
   it seems to take about 13 minutes to complete one server-client pair, and
   we have 6 nodes, so there are quite a few permutations.
   - What makes a test result "Notrun" vs "Fail"? When I use -vv to see
   output, I am seeing a lot of "fi_getinfo(): common/shared.c:540, ret=-61
   (No data available)" and "fi_poll_open(): simple/poll.c:55, ret=-38
   (Function not implemented)". Is this normal?
   - I am also seeing a lot of "Killed by signal 15", which I believe means
   that the timeout was hit and the run was killed. Should I be increasing my
   timeout? I would expect the default timeout to be good enough, but I am
   unsure.
   - As you can see from the output above, there are a few failures. Does
   this indicate a bug in fabtests or in the OFED/vendor drivers, or simply
   that I am not running the correct fabtests command?
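
To put the timing concern in numbers, here is a minimal sketch of the arithmetic. It assumes runs are executed serially, that every node is paired with every other node, and that the 13-minute figure holds for each pair. The function name estimate_minutes is hypothetical. "ordered" counts each node acting as server for every other node; "unordered" counts each pair once. The result is per transport, so it would be multiplied by the number of transports tested on those nodes.

```shell
#!/bin/sh
# Rough serial-runtime estimate, in minutes, for pairwise test runs.
# Usage: estimate_minutes NODES MINUTES_PER_PAIR [ordered|unordered]
estimate_minutes() {
    nodes=$1
    per_pair=$2
    mode=${3:-ordered}
    # N*(N-1) server/client combinations when direction matters.
    runs=$(( nodes * (nodes - 1) ))
    if [ "$mode" = "unordered" ]; then
        # Each pair only once, regardless of which side is the server.
        runs=$(( runs / 2 ))
    fi
    echo $(( runs * per_pair ))
}

estimate_minutes 6 13 ordered      # prints 390 (6.5 hours)
estimate_minutes 6 13 unordered    # prints 195 (3.25 hours)
```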

Thanks in advance, I really appreciate any assistance that you guys can
provide.
-- 


Cheers,
Stefan Oesterreich
High Performance Computing
UNH InterOperability Laboratory