<div dir="ltr"><div class="gmail_default" style="font-size:large">Hello,<br><br></div><div class="gmail_default" style="font-size:large">My name is Stefan Oesterreich and I am the Systems Administrator of the UNH-IOL OFA cluster. The OFIWG would like to include fabtests runs in our OFED and vendor device/firmware validation testing. I have very limited knowledge of fabtests, so I am looking for guidance on a comprehensive test command. We test InfiniBand, iWARP, and RoCE, and we want to test the verbs provider. The command I have so far is:<br><br>runfabtests.sh -t all -g $server_transport_ip_addr -s $server_transport_hostname -c $client_transport_hostname verbs $server_mgmt_hostname $client_mgmt_hostname<br><br></div><div class="gmail_default" style="font-size:large">Here is a filled-in example:<br>runfabtests.sh -t all -g 10.1.0.3 -s titan-ib.ofa -c phoebe-ib.ofa verbs titan.ofa phoebe.ofa<br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">When I run the above command on one of my InfiniBand nodes, I get the following output:<br><br># Test Result<br># --------------------------------------------------------------<br>fi_getinfo_test -p "verbs": Pass<br>fi_av_test -g 10.1.0.3 -n 1 -s titan-ib.ofa -p "verbs": Pass<br>fi_dom_test -n 2 -p "verbs": Pass<br>fi_eq_test -p "verbs": Pass<br>fi_cq_test -p "verbs": Pass<br>fi_mr_test -p "verbs": Pass<br>fi_cntr_test -p "verbs": Pass<br>fi_dgram g00n13s -p "verbs": Pass<br>fi_rdm g00n13s -p "verbs": Pass<br>fi_msg g00n13s -p "verbs": Pass<br>fi_cm_data -p "verbs": Pass<br>fi_cq_data -p "verbs": Fail<br>fi_dgram -p "verbs": Notrun<br>fi_dgram_waitset -p "verbs": Notrun<br>fi_msg -p "verbs": Pass<br>fi_msg_epoll -p "verbs": Pass<br>fi_msg_sockets -p "verbs": Pass<br>fi_poll -t queue -p "verbs": Notrun<br>fi_poll -t counter -p "verbs": Notrun<br>fi_rdm -p "verbs": Pass<br>fi_rdm_rma_simple -p "verbs": 
Notrun<br>fi_rdm_rma_trigger -p "verbs": Notrun<br>fi_shared_ctx -p "verbs": Notrun<br>fi_shared_ctx --no-tx-shared-ctx -p "verbs": Notrun<br>fi_shared_ctx --no-rx-shared-ctx -p "verbs": Notrun<br>fi_shared_ctx -e msg -p "verbs": Notrun<br>fi_shared_ctx -e msg --no-tx-shared-ctx -p "verbs": Pass<br>fi_shared_ctx -e msg --no-rx-shared-ctx -p "verbs": Notrun<br>fi_shared_ctx -e dgram -p "verbs": Notrun<br>fi_shared_ctx -e dgram --no-tx-shared-ctx -p "verbs": Notrun<br>fi_shared_ctx -e dgram --no-rx-shared-ctx -p "verbs": Notrun<br>fi_rdm_tagged_peek -p "verbs": Pass<br>fi_scalable_ep -p "verbs": Notrun<br>fi_cmatose -p "verbs": Pass<br>fi_rdm_shared_av -p "verbs": Notrun<br>fi_multi_mr -e msg -V -p "verbs": Notrun<br>fi_multi_mr -e rdm -V -p "verbs": Notrun<br>fi_recv_cancel -e rdm -V -p "verbs": Notrun<br>fi_unexpected_msg -e msg -i 10 -p "verbs": Notrun<br>fi_unexpected_msg -e rdm -i 10 -p "verbs": Notrun<br>fi_unexpected_msg -e dgram -i 10 -p "verbs": Notrun<br>fi_unexpected_msg -e msg -S -i 10 -p "verbs": Notrun<br>fi_unexpected_msg -e rdm -S -i 10 -p "verbs": Notrun<br>fi_unexpected_msg -e dgram -S -i 10 -p "verbs": Notrun<br>fi_msg_pingpong -p "verbs": Pass<br>fi_msg_pingpong -v -p "verbs": Pass<br>fi_msg_pingpong -k -p "verbs": Notrun<br>fi_msg_pingpong -k -v -p "verbs": Notrun<br>fi_msg_bw -p "verbs": Pass<br>fi_msg_bw -v -p "verbs": Pass<br>fi_rma_bw -e msg -o write -p "verbs": Pass<br>fi_rma_bw -e msg -o read -p "verbs": Pass<br>fi_rma_bw -e msg -o writedata -p "verbs": Pass<br>fi_rma_bw -e rdm -o write -p "verbs": Pass<br>fi_rma_bw -e rdm -o read -p "verbs": Pass<br>fi_rma_bw -e rdm -o writedata -p "verbs": Fail<br>fi_msg_rma -o write -p "verbs": Pass<br>fi_msg_rma -o read -p "verbs": Pass<br>fi_msg_rma -o writedata -p "verbs": Pass<br>fi_msg_stream -p "verbs": Pass<br>fi_rdm_atomic -o all -I 1000 -p "verbs": Notrun<br>fi_rdm_cntr_pingpong -p "verbs": Notrun<br>fi_rdm_multi_recv -p "verbs": Fail<br>fi_rdm_pingpong -p "verbs": Pass<br>fi_rdm_pingpong -v -p 
"verbs": Pass<br>fi_rdm_pingpong -k -p "verbs": Notrun<br>fi_rdm_pingpong -k -v -p "verbs": Notrun<br>fi_rdm_rma -o write -p "verbs": Fail<br>fi_rdm_rma -o read -p "verbs": Fail<br>fi_rdm_rma -o writedata -p "verbs": Fail<br>fi_rdm_tagged_pingpong -p "verbs": Pass<br>fi_rdm_tagged_pingpong -v -p "verbs": Pass<br>fi_rdm_tagged_bw -p "verbs": Pass<br>fi_rdm_tagged_bw -v -p "verbs": Pass<br>fi_dgram_pingpong -p "verbs": Notrun<br>fi_dgram_pingpong -k -p "verbs": Notrun<br>fi_rc_pingpong -p "verbs": Pass<br>fi_ubertest: Server returns 124, client returns 124<br>fi_ubertest: Fail [/]<br># --------------------------------------------------------------<br># Total Pass 38<br># Total Notrun 33<br># Total Fail 7<br># Percentage of Pass 84<br># --------------------------------------------------------------<br><br><br></div><div class="gmail_default" style="font-size:large">My questions are:<br><ul><li>Is the above command comprehensive enough for all 3 transports (IB, iWARP, RoCE)?</li><li>What test mode should I be using (all, quick, unit, simple, standard, short, complex)? This is my first time running these tests, so I don't know whether "all" is appropriate here. Time is also a consideration: one server-client pair takes about 13 minutes to complete, and we have 6 nodes, so there are quite a few permutations. <br></li><li>What makes a test result "Notrun" vs "Fail"? When I use -vv to see the output, I see a lot of messages such as "fi_getinfo(): common/shared.c:540, ret=-61 (No data available)" and "fi_poll_open(): simple/poll.c:55, ret=-38 (Function not implemented)". Is this normal?</li><li>I am also seeing a lot of "Killed by signal 15", which I believe means that the timeout was hit and the run was killed. Should I be increasing my timeout? I would expect the default timeout to be good enough, but I am unsure.<br></li><li>As you can see from the output above, there are a few failures. 
Does this indicate a bug in fabtests or in the OFED/vendor drivers, or simply that I am not running the correct fabtests command?</li></ul><p>Thanks in advance. I really appreciate any assistance that you guys can provide.<br></p></div><div class="gmail_default" style="font-size:large"></div>-- <br><div class="gmail-m_-3073522824143776680gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><b>------------------------------<wbr>-----------------<br>Cheers,<br>Stefan Oesterreich</b><div><b>High Performance Computing</b></div><div><b>UNH InterOperability Laboratory<br>------------------------------<wbr>------------------<br></b></div></div></div></div></div></div></div></div></div>
</div>