<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
I just found this in the log:
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">libfabric:31689:gni:ep_data:gnix_ops_allowed():887<debug> [31689:2] flags:0x2220204, FI_REMOTE_CQ_DATA, FI_FENCE, FI_INJECT<br class="">
libfabric:31689:gni:ep_data:gnix_ops_allowed():889<debug> [31689:2] peer_caps:0x118000000312004, FI_MULTI_RECV, FI_TRIGGER, FI_FENCE<br class="">
libfabric:31689:gni:ep_data:gnix_ops_allowed():891<debug> [31689:2] caps:0x118000000312004, FI_RMA, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_TRIGGER, FI_FENCE, FI_LOCAL_COMM, FI_REMOTE_COMM, FI_RMA_EVENT<br class="">
libfabric:31689:gni:cq:_gnix_cq_add_error():325<info> [31689:2] creating error event entry<br class="">
</blockquote>
<div><br class="">
</div>
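<div class="">The masks in those lines can be decoded by hand. Here is a minimal sketch, assuming the bit positions in the table below match rdma/fabric.h on the system. I copied them by hand for illustration, so treat them as assumptions; decode() and kBits are hypothetical names, not libfabric API.</div>
<pre class="">
```cpp
#include <cstdint>
#include <string>

// Assumed bit positions, copied by hand for illustration only.
// Verify them against rdma/fabric.h before trusting the output.
struct Bit {
    std::uint64_t value;
    const char*   name;
};

static const Bit kBits[] = {
    {1ULL << 2,  "FI_RMA"},
    {1ULL << 9,  "FI_WRITE"},
    {1ULL << 13, "FI_REMOTE_WRITE"},
    {1ULL << 16, "FI_MULTI_RECV"},
    {1ULL << 17, "FI_REMOTE_CQ_DATA"},
    {1ULL << 20, "FI_TRIGGER"},
    {1ULL << 21, "FI_FENCE"},
    {1ULL << 25, "FI_INJECT"},
    {1ULL << 51, "FI_REMOTE_COMM"},
    {1ULL << 52, "FI_LOCAL_COMM"},
    {1ULL << 56, "FI_RMA_EVENT"},
};

// Returns the names of every known bit set in mask, joined with " | ".
// Bits that are not in kBits are silently ignored.
std::string decode(std::uint64_t mask) {
    std::string out;
    for (const Bit& b : kBits) {
        if (mask & b.value) {
            if (!out.empty()) {
                out += " | ";
            }
            out += b.name;
        }
    }
    return out;
}
```
</pre>
<div class="">Under those assumptions, decode(0x2220204) names FI_RMA and FI_WRITE in addition to the three flags the log prints, while decode(0x118000000312004) gives the same eight capabilities as the caps line and does not include FI_WRITE. Whether that difference is what gnix_ops_allowed() rejects is a guess I have not confirmed.</div>
<div><br class="">
</div>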
<div>Some hunting in a debug build shows that the operation fails at <a href="https://github.com/ofiwg/libfabric/blob/master/prov/gni/src/gnix_rma.c#L1224" class="">https://github.com/ofiwg/libfabric/blob/master/prov/gni/src/gnix_rma.c#L1224</a>.</div>
<div><br class="">
</div>
<div>I guess that I haven’t set up the endpoint/CQ correctly, so I’ll keep poking at that to see where I went wrong.</div>
<div><br class="">
</div>
<div>Thanks,</div>
<div>Luke</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Sep 3, 2020, at 4:07 PM, D'Alessandro, Luke K <<a href="mailto:ldalessa@iu.edu" class="">ldalessa@iu.edu</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<br class="">
<div class="">Hi All,</div>
<div class=""><br class="">
</div>
I have some test code that depends on FI_REMOTE_CQ_DATA, which I’ve debugged using the UDP;ofi_rxd provider.
<div class=""><br class="">
</div>
<div class="">I am trying to run that code using the gni provider on an XC40 but I’m not ever seeing the remote CQ events there. Is there something special I need to set up to get remote CQ events with gni?
<div class=""><br class="">
</div>
<div class="">I request:</div>
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">fi_info *hints = fi_allocinfo();<br class="">
hints->caps = FI_RMA | FI_REMOTE_WRITE | FI_RMA_EVENT;<br class="">
hints->mode = FI_CONTEXT | FI_CONTEXT2;<br class="">
hints->domain_attr->mr_mode = FI_MR_BASIC;<br class="">
hints->ep_attr->type = FI_EP_RDM;<br class="">
hints->tx_attr->msg_order = FI_ORDER_WAW | FI_ORDER_RMA_WAW;<br class="">
hints->rx_attr->msg_order = FI_ORDER_WAW | FI_ORDER_RMA_WAW;<br class="">
hints->rx_attr->caps = FI_RMA | FI_REMOTE_WRITE | FI_RMA_EVENT;</blockquote>
<br class="">
</div>
<div class="">And I successfully receive:</div>
<div class=""><br class="">
</div>
<div class=""></div>
<blockquote type="cite" class="">
<div class="">0: # Provider Fabric Domain Version EP_TYPE Protocol<br class="">
0: # gni gni /sys/class/gni/kgni0 1.1 FI_EP_RDM FI_EP_RDM<br class="">
0: # gni;ofi_rxd gni /sys/class/gni/kgni0 111.0 FI_EP_RDM FI_EP_RDM</div>
</blockquote>
<blockquote type="cite" class="">...<br class="">
</blockquote>
<blockquote type="cite" class="">
<div class="">1: # Provider Fabric Domain Version EP_TYPE Protocol<br class="">
1: # gni gni /sys/class/gni/kgni0 1.1 FI_EP_RDM FI_EP_RDM<br class="">
1: # gni;ofi_rxd gni /sys/class/gni/kgni0 111.0 FI_EP_RDM FI_EP_RDM</div>
</blockquote>
<br class="">
<div class="">I move through sequence of initialization calls that seem to be standard from what I can tell, resulting in an endpoint that is enabled successfully.</div>
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">static fi_context ep_ctx[2];<br class="">
check(fi_endpoint, domain, info, &ep, ep_ctx);<br class="">
check(fi_ep_bind, ep, &tx->fid, FI_TRANSMIT | FI_SELECTIVE_COMPLETION);<br class="">
check(fi_ep_bind, ep, &rx->fid, FI_RECV);<br class="">
check(fi_ep_bind, ep, &av->fid, 0);<br class="">
check(fi_enable, ep);</blockquote>
<br class="">
</div>
<div class="">Messages are sent with fi_writemsg and FI_REMOTE_CQ_DATA, and neither fail nor signal <span style="caret-color: rgb(0, 0, 0);" class="">FI_EAGAIN (this is a little alarming as I have tx/rx size of 500 and I send more than that through the endpoint,
I guess they just vanish into the ether).</span></div>
<div class=""><br class="">
<blockquote type="cite" class="">int e = fi_writemsg(ep, &msg, FI_REMOTE_CQ_DATA);<br class="">
<br class="">
if (likely(!e)) {<br class="">
return true;<br class="">
}<br class="">
<br class="">
if (likely(e == -FI_EAGAIN)) {<br class="">
return false;<br class="">
}<br class="">
<br class="">
fmt::print(stderr, "[{}] has unhandled tx error {}: {}\n", mpi::rank(), -e, fi_strerror(-e));<br class="">
</blockquote>
</div>
<div class=""><br class="">
</div>
<div class="">Unfortunately I never see any completions on the target rank (unlike <span style="caret-color: rgb(0, 0, 0);" class="">UDP;ofi_rxd where things are fine).</span></div>
<div class=""><span style="caret-color: rgb(0, 0, 0);" class=""><br class="">
</span></div>
<div class=""><font class="">Is there some magic that I need with gni to make</font> <span style="caret-color: rgb(0, 0, 0);" class="">FI_REMOTE_CQ_DATA</span><span style="" class=""> work?</span></div>
<div class=""><font class=""><br class="">
</font></div>
<div class=""><font class=""><span style="caret-color: rgb(0, 0, 0);" class="">Thanks,</span></font></div>
<div class=""><font class=""><span style="caret-color: rgb(0, 0, 0);" class="">Luke</span></font></div>
</div>
</div>
_______________________________________________<br class="">
Libfabric-users mailing list<br class="">
<a href="mailto:Libfabric-users@lists.openfabrics.org" class="">Libfabric-users@lists.openfabrics.org</a><br class="">
https://lists.openfabrics.org/mailman/listinfo/libfabric-users<br class="">
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>