[libfabric-users] GNI: "invalid argument" when more than one client on node

Latham, Robert J. robl at mcs.anl.gov
Thu Sep 12 18:59:13 PDT 2019


I have a distributed service using libfabric on our Cray that seems to
work OK as long as there is just one client. If I have two servers and
two clients, I get an error about invalid flags passed to
gnix_mr_reg. The trace (the last few lines of which are included below)
ends at this check of several parameters, but I don't yet know which one
was invalid. (I'll patch in some debugging and find out more in the
morning.)


https://github.com/ofiwg/libfabric/blob/master/prov/gni/src/gnix_mr.c#L230

We got into this path from a call to 'fi_enable', which only takes one
argument and doesn't seem like the kind of routine I can call
incorrectly.

Any suggestions as to what I'm doing wrong here?

libfabric:134822:gni:fabric:_gnix_resolve_gni_ep_name():120<trace> [134822:1]
libfabric:134822:gni:ep_ctrl:_gnix_cm_nic_alloc():628<info> [134822:1] creating cm_nic for 219/0x44710000/15360001
libfabric:134822:gni:ep_ctrl:gnix_nic_alloc():954<trace> [134822:1]
libfabric:149618:gni:ep_ctrl:gnix_ep_bind():1813<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:gnix_ep_bind():1813<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:gnix_ep_control():1529<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_vc_cm_init():2217<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_cm_nic_reg_recv_fn():505<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_cm_nic_enable():523<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_dgram_alloc():244<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_dgram_wc_post():312<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_dgram_alloc():244<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_dgram_wc_post():312<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_dgram_alloc():244<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_dgram_wc_post():312<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_dgram_alloc():244<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_dgram_wc_post():312<trace> [149618:1]
libfabric:149618:gni:ep_ctrl:_gnix_prog_obj_add():101<info> [149618:1] Added obj(0xbe4170) to set(0xbb91c8)
libfabric:149618:gni:ep_ctrl:_gnix_prog_obj_add():101<info> [149618:1] Added obj(0xbc1e40) to set(0xbb91c8)
libfabric:149618:gni:mr:_gnix_mr_reg():222<trace> [149618:1]
libfabric:149618:gni:mr:_gnix_mr_reg():224<info> [149618:1] reg: buf=0x1e83620 len=8192
libfabric:149618:gni:mr:__udreg_init():824<warn> [149618:1] Could not initialize udreg application cache, urc=2
libfabric:149618:gni:ep_data:_gnix_ep_int_tx_pool_grow():112<warn> [149618:1] gnix_mr_req returned: Invalid argument
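
The trace interleaves output from two processes (PIDs 134822 and 149618), so when digging further it may help to pull out just the warnings and tag each with the process that emitted it. Below is a minimal C sketch of such a filter. It assumes every log line starts with the prefix seen above, libfabric:<pid>:<provider>:<subsystem>:<function>():<line><level>; the names fi_log_line, parse_fi_log_line and print_warnings are hypothetical helpers for illustration, not part of libfabric.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one libfabric log line, ASSUMING the prefix format seen in
 * the trace: libfabric:<pid>:<provider>:<subsystem>:<function>():<line><level> */
struct fi_log_line {
    int pid;
    char prov[32];
    char subsys[32];
    char func[64];
    int line;
    char level[16];
};

/* Hypothetical helper: returns 1 and fills *out if s starts with that
 * prefix, else 0. Mail-wrapped continuation lines such as "[149618:1]"
 * do not match and return 0. */
int parse_fi_log_line(const char *s, struct fi_log_line *out)
{
    int n = sscanf(s, "libfabric:%d:%31[^:]:%31[^:]:%63[^(]():%d<%15[^>]>",
                   &out->pid, out->prov, out->subsys, out->func,
                   &out->line, out->level);
    return n == 6;
}

/* Hypothetical helper: copies every <warn> line from in to out, prefixed
 * with the emitting PID so that output from different processes can be
 * told apart. Lines longer than the buffer are split by fgets. */
void print_warnings(FILE *in, FILE *out)
{
    char buf[1024];
    struct fi_log_line ll;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (parse_fi_log_line(buf, &ll) && strcmp(ll.level, "warn") == 0)
            fprintf(out, "pid %d: %s", ll.pid, buf);
    }
}
```

Calling print_warnings(stdin, stdout) from a small main and feeding it the trace above would keep only the two <warn> lines (__udreg_init and _gnix_ep_int_tx_pool_grow), both from PID 149618. Parsing with sscanf scan sets keeps the sketch dependency-free; a regex library would be more forgiving if the prefix format differs between libfabric versions.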