[libfabric-users] gni provider and FI_REMOTE_CQ_DATA
D'Alessandro, Luke K
ldalessa at iu.edu
Thu Sep 3 16:07:11 PDT 2020
Hi All,
I have some test code that depends on FI_REMOTE_CQ_DATA which I’ve debugged using the UDP;ofi_rxd provider.
I am trying to run that code using the gni provider on an XC40, but I never see the remote CQ events there. Is there something special I need to set up to get remote CQ events with gni?
I request:
fi_info *hints = fi_allocinfo();
hints->caps = FI_RMA | FI_REMOTE_WRITE | FI_RMA_EVENT;
hints->mode = FI_CONTEXT | FI_CONTEXT2;
hints->domain_attr->mr_mode = FI_MR_BASIC;
hints->ep_attr->type = FI_EP_RDM;
hints->tx_attr->msg_order = FI_ORDER_WAW | FI_ORDER_RMA_WAW;
hints->rx_attr->msg_order = FI_ORDER_WAW | FI_ORDER_RMA_WAW;
hints->rx_attr->caps = FI_RMA | FI_REMOTE_WRITE | FI_RMA_EVENT;
And I successfully receive:
0: # Provider     Fabric  Domain                Version  EP_TYPE    Protocol
0: # gni          gni     /sys/class/gni/kgni0  1.1      FI_EP_RDM  FI_EP_RDM
0: # gni;ofi_rxd  gni     /sys/class/gni/kgni0  111.0    FI_EP_RDM  FI_EP_RDM
...
1: # Provider     Fabric  Domain                Version  EP_TYPE    Protocol
1: # gni          gni     /sys/class/gni/kgni0  1.1      FI_EP_RDM  FI_EP_RDM
1: # gni;ofi_rxd  gni     /sys/class/gni/kgni0  111.0    FI_EP_RDM  FI_EP_RDM
I move through a sequence of initialization calls that, as far as I can tell, is standard, resulting in an endpoint that is enabled successfully.
static fi_context ep_ctx[2];
check(fi_endpoint, domain, info, &ep, ep_ctx);
check(fi_ep_bind, ep, &tx->fid, FI_TRANSMIT | FI_SELECTIVE_COMPLETION);
check(fi_ep_bind, ep, &rx->fid, FI_RECV);
check(fi_ep_bind, ep, &av->fid, 0);
check(fi_enable, ep);
Messages are sent with fi_writemsg and FI_REMOTE_CQ_DATA, and the calls neither fail nor return -FI_EAGAIN. This is a little alarming, as I have a tx/rx size of 500 and I send more than that through the endpoint; I guess they just vanish into the ether.
int e = fi_writemsg(ep, &msg, FI_REMOTE_CQ_DATA);
if (likely(!e)) {
    return true;
}
if (likely(e == -FI_EAGAIN)) {
    return false;
}
fmt::print(stderr, "[{}] has unhandled tx error {}: {}\n", mpi::rank(), -e, fi_strerror(-e));
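The three outcomes handled above (posted, retry later, hard error) can be sketched in isolation as follows. This is only an illustration, not the actual test code: PostResult, classify, and post_with_retry are hypothetical names, a stand-in callable replaces the real fi_writemsg call so the sketch compiles without libfabric, and it assumes FI_EAGAIN has the same value as the system EAGAIN.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical names for illustration; none of these come from libfabric.
enum class PostResult { Posted, Retry, Error };

// Classify the return value of a post call such as fi_writemsg.
// Assumption: libfabric's FI_EAGAIN has the same value as EAGAIN.
inline PostResult classify(int rc) {
    if (rc == 0) return PostResult::Posted;
    if (rc == -EAGAIN) return PostResult::Retry;
    return PostResult::Error;
}

// Call `post` until it stops reporting -EAGAIN or `max_attempts` is reached.
// `progress` stands in for whatever drives the provider forward between
// attempts (for example, reading the transmit CQ).
inline PostResult post_with_retry(const std::function<int()>& post,
                                  const std::function<void()>& progress,
                                  int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        PostResult r = classify(post());
        if (r != PostResult::Retry) return r;
        progress();
    }
    return PostResult::Retry;  // still busy after max_attempts tries
}
```

With the real call, `post` would be a lambda wrapping fi_writemsg(ep, &msg, FI_REMOTE_CQ_DATA). Bounding the number of attempts keeps a stalled endpoint from turning into an infinite loop.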
Unfortunately I never see any completions on the target rank (unlike UDP;ofi_rxd where things are fine).
Is there some magic that I need with gni to make FI_REMOTE_CQ_DATA work?
Thanks,
Luke