[libfabric-users] GNI: "invalid argument" when more than one client on node

Latham, Robert J. robl at mcs.anl.gov
Fri Sep 13 13:12:14 PDT 2019


On Fri, 2019-09-13 at 01:59 +0000, Latham, Robert J. via Libfabric-users
wrote:
> I have a distributed service using libfabric on our Cray that seems to
> work OK as long as there is just one client. If I have two servers and
> two clients, I get an error about invalid flags passed to gnix_mr_reg.
> The trace (last few lines of which I have included below) ends at this
> check for several parameters, but I don't know yet which one was
> invalid. I'll patch in some debugging and find out more in the morning.

OK, that check does return FI_EINVAL, but it is not what is failing
here. Instead, UDREG_CacheCreate is returning UDREG_RC_ERROR_RESOURCE.
That's as far as I've gotten. More information below.

Again, here's my logging (plus a little bit extra I added) before
fi_enable fails:

libfabric:179713:gni:ep_ctrl:_gnix_prog_obj_add():101<info> [179713:1] Added obj(0xcc94a0) to set(0xd8e028)
libfabric:179713:gni:mr:_gnix_mr_reg():222<trace> [179713:1]
libfabric:179713:gni:mr:_gnix_mr_reg():224<info> [179713:1] reg: buf=0x1e88ab0 len=8192
libfabric:179713:gni:mr:_gnix_mr_reg():226<info> [179713:1] reg: buf=0x1e88ab0 len=8192 offset=0, mr_o=0x7fffffff2b58, access=0x300 fid->class: 2
libfabric:179713:gni:mr:__udreg_init():826<warn> [179713:1] Could not initialize udreg application cache, urc=2
libfabric:179713:gni:ep_data:_gnix_ep_int_tx_pool_grow():112<warn> [179713:1] gnix_mr_req returned: Invalid argument

Right, there's a call to __udreg_init just before the error, and a
'urc' of 2 corresponds to UDREG_RC_ERROR_RESOURCE. I'm out of my depth
here. What resource might I be running out of?
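
In case it helps anyone comparing logs from several clients, here is a
rough sketch for pulling just the warn-level entries out of output like
the above. The prefix pattern is inferred from the lines in this message
rather than taken from any libfabric documentation, and the names
(ENTRY, parse_entries, warnings) are made up for illustration. It also
rejoins lines that a mail client has wrapped, which is a guess at how
such a log usually gets mangled.

```python
import re

# Assumed prefix format, inferred from the log lines in this message:
#   libfabric:<pid>:<provider>:<subsystem>:<function>():<line><level> <text>
ENTRY = re.compile(
    r"^libfabric:(?P<pid>\d+):(?P<prov>\w+):(?P<subsys>\w+):"
    r"(?P<func>\w+)\(\):(?P<line>\d+)<(?P<level>\w+)>\s*(?P<text>.*)$"
)


def parse_entries(raw):
    """Rejoin wrapped continuation lines, then parse each log entry."""
    joined = []
    for line in raw.splitlines():
        if line.startswith("libfabric:") or not joined:
            # A new entry starts here.
            joined.append(line.strip())
        else:
            # Anything else is treated as a continuation of the previous entry.
            joined[-1] += " " + line.strip()
    entries = []
    for item in joined:
        match = ENTRY.match(item)
        if match:
            entries.append(match.groupdict())
    return entries


def warnings(raw):
    """Return only the entries logged at warn level."""
    return [e for e in parse_entries(raw) if e["level"] == "warn"]


# Sample copied from the (mail-wrapped) log excerpt in this message.
SAMPLE = """\
libfabric:179713:gni:mr:_gnix_mr_reg():222<trace> [179713:1]
libfabric:179713:gni:mr:__udreg_init():826<warn> [179713:1] Could not
initialize udreg application cache, urc=2
libfabric:179713:gni:ep_data:_gnix_ep_int_tx_pool_grow():112<warn>
[179713:1] gnix_mr_req returned: Invalid argument
"""

for entry in warnings(SAMPLE):
    print(entry["func"], "->", entry["text"])
```

Rejoining with a single space is a simplification: a token that was
split mid-word by the mail client (such as "fid->class" above) would
come back with a stray space in it.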

Thanks
==rob


