[libfabric-users] GNI: "invalid argument" when more than one client on node
Latham, Robert J.
robl at mcs.anl.gov
Fri Sep 13 13:12:14 PDT 2019
On Fri, 2019-09-13 at 01:59 +0000, Latham, Robert J. via Libfabric-users wrote:
> I have a distributed service using libfabric on our Cray that seems to
> work ok as long as it is just one client. If I have two servers and
> two clients, I get an error about invalid flags passed to
> gnix_mr_reg. The trace (last few lines of which I have included below)
> ends at this check for several parameters but I don't know which one
> was invalid (yet. I'll patch in some debugging and find out more in
> the morning)
OK, that check does return FI_EINVAL, but that is not where the error
comes from. Instead, UDREG_CacheCreate is returning
UDREG_RC_ERROR_RESOURCE. That's as far as I've gotten. More
information below.
Again, here's my logging (plus a little bit extra I added) before
fi_enable fails:
libfabric:179713:gni:ep_ctrl:_gnix_prog_obj_add():101<info> [179713:1] Added obj(0xcc94a0) to set(0xd8e028)
libfabric:179713:gni:mr:_gnix_mr_reg():222<trace> [179713:1]
libfabric:179713:gni:mr:_gnix_mr_reg():224<info> [179713:1] reg: buf=0x1e88ab0 len=8192
libfabric:179713:gni:mr:_gnix_mr_reg():226<info> [179713:1] reg: buf=0x1e88ab0 len=8192 offset=0, mr_o=0x7fffffff2b58, access=0x300 fid->class: 2
libfabric:179713:gni:mr:__udreg_init():826<warn> [179713:1] Could not initialize udreg application cache, urc=2
libfabric:179713:gni:ep_data:_gnix_ep_int_tx_pool_grow():112<warn> [179713:1] gnix_mr_req returned: Invalid argument
Right, there's a call to __udreg_init before the error. A 'urc' of 2
corresponds to UDREG_RC_ERROR_RESOURCE. I'm out of my depth
here. What resource might I be running out of?
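
To make the pattern in the log concrete, here is a minimal sketch of how a
resource failure in a lower layer could surface to the caller as "Invalid
argument". It is not the GNI provider's code: every name below is
hypothetical, the slot counter is only a placeholder for whatever resource
is actually being exhausted, and the only value taken from the log is the
2 for the resource error.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the registration-cache return codes.
 * Only the value 2 for the resource error comes from the log above;
 * the other names and values are illustrative assumptions. */
enum sketch_cache_rc {
    SKETCH_RC_SUCCESS = 0,
    SKETCH_RC_INVALID_PARAM = 1,
    SKETCH_RC_ERROR_RESOURCE = 2
};

/* Hypothetical cache creation: consumes one slot from a limited pool.
 * The pool is a placeholder; which resource really runs out is the
 * open question in this message. */
int sketch_cache_create(int *slots_left)
{
    if (slots_left == NULL)
        return SKETCH_RC_INVALID_PARAM;
    if (*slots_left <= 0)
        return SKETCH_RC_ERROR_RESOURCE;
    (*slots_left)--;
    return SKETCH_RC_SUCCESS;
}

/* Hypothetical registration path: logs the lower-level code, then
 * collapses every cache failure into -EINVAL, so the caller can no
 * longer tell a bad parameter from an exhausted resource. */
int sketch_mr_reg(int *slots_left)
{
    int urc = sketch_cache_create(slots_left);

    if (urc != SKETCH_RC_SUCCESS) {
        fprintf(stderr, "could not initialize cache, urc=%d\n", urc);
        return -EINVAL;
    }
    return 0;
}
```

With a pool of one slot, the first call to sketch_mr_reg succeeds and the
second returns -EINVAL, which matches the shape of the failure above: the
warning carries the real code (urc=2) while the caller only sees "Invalid
argument".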
Thanks
==rob