[libfabric-users] GNI provider

Sung-Eun Choi sungeun at cray.com
Fri Mar 18 16:20:39 PDT 2016


Hi Greg,

Thanks for your mail.  The launching mechanism used on Cori and other
XCs is not compatible with the way most of the fabtests need to be
started.  That said, we're working on getting the gni provider working
in Cray Cluster Mode, which should make it easier to run the fabtests
(hopefully almost a no-op).

In the meantime, there are a couple of options.  We have a large
number of unit tests in our libfabric directory under prov/gni/test.
There's a script there called run_gnitest that will do the right thing
for slurm and aprun, though you have to get yourself a single node
allocation.  Note that these tests are not distributed in the release.
As another option, you can also use our forked version of fabtests
(https://github.com/ofi-cray/fabtests-cray), which includes a script
called cray_runall.sh.  This runs as many of the original fabtests as
possible (using srun or aprun) and, if you build with PMI, a few
extras.  There are also a couple of files in the scripts directory that
list the expected and intermittent failures
(cray_runall_expected_failures and cray_runall_intermitten_failures).
Again, you'll have to obtain your own allocation for this script.
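
As a rough illustration of how the expected-failure lists can be put to
use, here is a minimal Python sketch that picks out failures not already
known.  It assumes the result lines look like the runfabtests.sh output
quoted below ("command:   Pass|Fail|Notrun"); whether cray_runall.sh
prints the same layout is an assumption, as is the idea that an
expected-failures file holds one test name per line.  The function names
and the sample "expected" set are made up for the example.

```python
import re

# Matches result lines such as "fi_dom_test -n 2 -f gni:    Pass".
RESULT_RE = re.compile(r"^(?P<cmd>.+?):\s+(?P<result>Pass|Fail|Notrun)\s*$")


def parse_results(text):
    """Return (command, result) pairs from runfabtests-style output."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# ..." header/summary lines.
        if not line or line.startswith("#"):
            continue
        match = RESULT_RE.match(line)
        if match:
            pairs.append((match.group("cmd"), match.group("result")))
    return pairs


def unexpected_failures(pairs, expected):
    """Return failed commands whose test name is not in `expected`.

    The test name is taken to be the first word of the command line
    (an assumption about how the failure lists are keyed).
    """
    return [cmd for cmd, result in pairs
            if result == "Fail" and cmd.split()[0] not in expected]


if __name__ == "__main__":
    sample = """\
# Test                                                  Result
fi_dom_test -n 2 -f gni:                                Pass
fi_eq_test -f gni:                                      Fail
fi_size_left_test -f gni:                               Fail
fi_msg -f gni:                                        Notrun
"""
    # Hypothetical expected-failure set, for illustration only.
    expected = {"fi_eq_test"}
    print(unexpected_failures(parse_results(sample), expected))
```

With the hypothetical set above, only fi_size_left_test would be reported
as an unexpected failure; the real lists in the scripts directory should
be read from their files instead.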

While we're on the topic, how could we have saved you the trouble of
trying all this stuff?  Documentation in the man page?  Info on our
wiki?  Something else?

Thanks.

-- Sung

On Fri, Mar 18, 2016 at 10:52:38PM +0000, Eisenhauer, Greg S wrote:
> Hi Folks,
> 
> I was trying to work with libfabric on Cori with the gni provider and have run into an early roadblock just trying to exercise runfabtests.sh in order to make sure everything was working. Have tried both v1.2 and current GIT HEAD for libfabric and fabtests.  Am I missing something basic?  A requirement to set SERVER, CLIENT or GOOD_ADDR?    Output from fi_info, trying to run an individual test manually (other tests show similar output), and output from ‘runfabtests.sh gni’ are included below.  I may be doing something stupid, but browsing various wikis, email lists and other resources for a while hasn’t produced any answers, so I thought I’d ask here.
> 
> thanks,
> greg
> 
> 
> 
> eisen at nid00050:~/fabtests-1.2.0/scripts> fi_info
> gni: gni
>     version: 1.0
>     type: FI_EP_RDM
>     protocol: FI_PROTO_GNI
> UDP: UDP-IP
>     version: 1.0
>     type: FI_EP_DGRAM
>     protocol: FI_PROTO_UDP
> sockets: IP
>     version: 1.0
>     type: FI_EP_MSG
>     protocol: FI_PROTO_SOCK_TCP
> sockets: IP
>     version: 1.0
>     type: FI_EP_DGRAM
>     protocol: FI_PROTO_SOCK_TCP
> sockets: IP
>     version: 1.0
>     type: FI_EP_RDM
>     protocol: FI_PROTO_SOCK_TCP
> eisen at nid00050:~/fabtests-1.2.0/scripts> fi_msg_rma -o write -I 5 -f gni
> fi_getinfo(): common/shared.c:340, ret=-61 (No data available)
> eisen at nid00050:~/fabtests-1.2.0/scripts> ./runfabtests.sh gni
> # Test                                                  Result
> # --------------------------------------------------------------
> fi_av_test -d 192.168.10.1 -n 1 -s 127.0.0.1 -f gni:    Notrun
> fi_dom_test -n 2 -f gni:                                Pass
> fi_eq_test -f gni:                                      Fail
> fi_size_left_test -f gni:                               Fail
> fi_cq_data -f gni:                                    Notrun
> fi_dgram -f gni:                                      Notrun
> fi_dgram_waitset -f gni:                              Notrun
> fi_msg -f gni:                                        Notrun
> fi_msg_epoll -f gni:                                  Notrun
> fi_msg_sockets -f gni:                                Notrun
> fi_poll -f gni:                                       Notrun
> fi_rdm -f gni:                                        Notrun
> fi_rdm_rma_simple -f gni:                             Notrun
> fi_rdm_rma_trigger -f gni:                            Notrun
> fi_rdm_shared_ctx -f gni:                             Notrun
> fi_rdm_tagged_peek -f gni:                            Notrun
> fi_scalable_ep -f gni:                                Notrun
> fi_cmatose -f gni:                                    Notrun
> fi_msg_pingpong -I 5 -f gni:                          Notrun
> fi_msg_rma -o write -I 5 -f gni:                      Notrun
> fi_msg_rma -o read -I 5 -f gni:                       Notrun
> fi_msg_rma -o writedata -I 5 -f gni:                  Notrun
> fi_rdm_atomic -I 5 -o all -f gni:                     Notrun
> fi_rdm_cntr_pingpong -I 5 -f gni:                     Notrun
> fi_rdm_inject_pingpong -I 5 -f gni:                   Notrun
> fi_rdm_multi_recv -I 5 -f gni:                        Notrun
> fi_rdm_pingpong -I 5 -f gni:                          Notrun
> fi_rdm_rma -o write -I 5 -f gni:                      Notrun
> fi_rdm_rma -o read -I 5 -f gni:                       Notrun
> fi_rdm_rma -o writedata -I 5 -f gni:                  Notrun
> fi_rdm_tagged_pingpong -I 5 -f gni:                   Notrun
> fi_ud_pingpong -I 5 -f gni:                           Notrun
> fi_rc_pingpong -n 5 -f gni:                           Notrun
> fi_rc_pingpong -n 5 -e -f gni:                        Notrun
> # --------------------------------------------------------------
> # Total Pass                                                 1
> # Total Notrun                                              31
> # Total Fail                                                 2
> # Percentage of Pass                                        33
> # --------------------------------------------------------------
> eisen at nid00050:~/fabtests-1.2.0/scripts>
> 
> 
> --------------------------
> Greg Eisenhauer
> eisen at cc.gatech.edu
> 

> _______________________________________________
> Libfabric-users mailing list
> Libfabric-users at lists.openfabrics.org
> http://lists.openfabrics.org/mailman/listinfo/libfabric-users



