[libfabric-users] [External] Detecting errors/flow control with FI_SELECTIVE_COMPLETION
D'Alessandro, Luke K
ldalessa at iu.edu
Mon Aug 24 17:28:59 PDT 2020
On Aug 24, 2020, at 2:02 PM, D'Alessandro, Luke K <ldalessa at iu.edu> wrote:
On Aug 24, 2020, at 1:45 PM, Hefty, Sean <sean.hefty at intel.com> wrote:
I’m trying to use libfabric to get some baseline performance numbers for some research
that we’re doing.
The functionality that I need is simply to transfer (address, value) pairs via remote
completion in an all-to-all setting.
I have sequential ranks and have co-opted RDM/sockets/fi_inject_writedata with 0-length
messages to do this, and it implements the required semantics correctly.
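For concreteness, a rough sketch of that pattern (not the exact code from this thread; ep, dest, remote_addr, and rkey stand in for state set up elsewhere): the write is zero-length, so nothing is actually deposited in remote memory, and the value rides in the remote CQ data field alongside the targeted address.

#include <rdma/fi_rma.h>

/* Zero-length RDMA write whose remote CQ data carries the value; the
 * (address, value) pair is encoded as (remote_addr, data). */
static int send_pair(struct fid_ep *ep, fi_addr_t dest,
                     uint64_t remote_addr, uint64_t value, uint64_t rkey)
{
    return (int) fi_inject_writedata(ep, NULL, 0,   /* no payload */
                                     value,          /* delivered as remote CQ data */
                                     dest, remote_addr, rkey);
}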
The problem is that I can’t figure out how to implement flow control in this setting.
Obviously I can run into resource issues in both the local and remote endpoints, and
I’m able to slow down and/or buffer on the TX side if need be.
If you're using sock_stream underneath, then the TCP layer is already handling flow control between peers. The buffering will end up being done in the local kernel.
I am using a TX CQ bound with FI_SELECTIVE_COMPLETION, an RX CQ, and a TX counter. I definitely
see <warn> messages if I spam TX or don’t complete RX fast enough, but I can’t seem to
detect any errors/failures on the TX side that would let me slow down (neither the CQ
nor the counter ever reports an error).
Slowdowns on the Tx side should result in the transmit operation returning -FI_EAGAIN. I need to check the code to be sure, but I believe sockets will dynamically grow the CQ if needed.
Ohhh, so the fact that I’m not ever seeing any -FI_EAGAIN from fi_inject_writedata (or the fi_writemsg equivalent) might just be an artifact of the sockets provider that I’m testing with. If I move over to a different provider I might be able to see these occur? I’ll give that a shot.
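A sketch of one way to handle that once a provider does return it, assuming the transmit context is bound to a CQ (txcq here) that has to be read to drive manual progress; the names are illustrative:

#include <rdma/fi_domain.h>
#include <rdma/fi_errno.h>
#include <rdma/fi_rma.h>

/* Retry a transmit on -FI_EAGAIN, reading the TX CQ between attempts so the
 * provider can make progress and free queue slots. */
static ssize_t inject_with_retry(struct fid_ep *ep, struct fid_cq *txcq,
                                 uint64_t value, fi_addr_t dest,
                                 uint64_t remote_addr, uint64_t rkey)
{
    struct fi_cq_data_entry comp;
    ssize_t rc;

    for (;;) {
        rc = fi_inject_writedata(ep, NULL, 0, value, dest, remote_addr, rkey);
        if (rc != -FI_EAGAIN)
            return rc;                 /* 0 on success, or a real error */

        rc = fi_cq_read(txcq, &comp, 1);
        if (rc < 0 && rc != -FI_EAGAIN)
            return rc;                 /* CQ error; details via fi_cq_readerr */
    }
}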
The “UDP;ofi_rxd” provider supports the behavior that I need and responds the way that I expected.
Thanks,
Luke
libfabric:290798:sockets:cq:_sock_cq_write():181<warn> Not enough space in CQ
rank 0 posting 4279 to 0
rank [0] fi error: Resource temporarily unavailable
libfabric:290798:sockets:ep_data:sock_rx_new_buffered_entry():109<warn> Exceeded buffered recv limit
[portland:290798] *** Process received signal ***
libfabric:290798:sockets:ep_data:sock_pe_new_tx_entry():2251<warn> Invalid operation type
libfabric:290798:sockets:ep_data:sock_pe_progress_tx_ctx():2546<warn> failed to progress TX ctx
[portland:290798] Signal: Aborted (6)
[portland:290798] Signal code: (-6)
libfabric:290798:sockets:ep_data:sock_pe_progress_thread():2650<warn> failed to progress TX
Any ideas on what I can do here to discover these <warn>s eagerly at the user level?
I can easily set up hard limits on the number of outstanding TX operations
(computed via the TX counter), but I don’t know where to find out what the right number
for that would be. Also, given the all-to-all nature of the communication, I can do
dedicated point-to-point remote RX accounting if that’s something I need to handle manually.
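For reference, a rough sketch of that kind of per-peer accounting: each rank grants every peer a fixed number of RX slots and the sender stalls when a peer's credits run out. The credit-return path is left abstract, and NPEERS, CREDITS_PER_PEER, and credits[] are illustrative names, not from the actual code.

#include <stdint.h>

#define NPEERS            64     /* ranks in the all-to-all */
#define CREDITS_PER_PEER  128    /* e.g. receiver's rx_attr->size / NPEERS */

static uint32_t credits[NPEERS]; /* initialized to CREDITS_PER_PEER at startup */

/* Sender side: take one credit before posting to `peer`; on failure the
 * caller should progress its CQs and retry later. */
static int try_acquire_credit(int peer)
{
    if (credits[peer] == 0)
        return 0;
    credits[peer]--;
    return 1;
}

/* Called when `peer` reports it has drained `n` of our messages
 * (the return path -- e.g. a small ack message -- is not shown). */
static void return_credits(int peer, uint32_t n)
{
    credits[peer] += n;
}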
The Tx/Rx sizes are set through the fi_info attributes. You can specify/retrieve them through fi_getinfo().
Okay, I’ve been using them to initialize TX/RX CQ sizes, but they didn’t seem to directly correspond to any changes in behavior that I could observe from the sockets provider.
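A sketch of how those attributes could feed a self-imposed cap, assuming the TX counter is bound to the transmit context and posted tracks operations issued; the hints and names here are illustrative, not from the code in this thread:

#include <rdma/fabric.h>
#include <rdma/fi_domain.h>

/* Read the provider's advertised transmit queue depth. */
static size_t query_tx_depth(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;
    size_t depth = 0;

    hints->ep_attr->type = FI_EP_RDM;
    hints->caps = FI_RMA;

    if (fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info) == 0) {
        depth = info->tx_attr->size;    /* max outstanding TX operations */
        fi_freeinfo(info);
    }
    fi_freeinfo(hints);
    return depth;
}

/* Stall until fewer than `depth` operations are outstanding, where `posted`
 * is the number issued so far and `txcntr` counts completions. */
static void throttle_tx(struct fid_cntr *txcntr, uint64_t posted, size_t depth)
{
    while (posted - fi_cntr_read(txcntr) >= depth)
        ;   /* spin, or block with fi_cntr_wait(txcntr, posted - depth + 1, timeout_ms) */
}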
The providers typically buffer unexpected messages (or at least the message headers) at the receiver, while there is sufficient memory. Flow control across multiple peers is difficult to achieve without introducing the possibility of application deadlock.
- Sean
_______________________________________________
Libfabric-users mailing list
Libfabric-users at lists.openfabrics.org
https://lists.openfabrics.org/mailman/listinfo/libfabric-users