[libfabric-users] Three basic concept questions on libfabric
Shane Miller
Shane.Miller at ice.com
Mon May 27 14:55:51 PDT 2024
We're looking at alternative network I/O APIs including libfabric. The objective is to support multiple hardware vendors without sacrificing latency or throughput.
While working my way through the OFI docs, I've come up with a few basic questions whose answers don't quite fall out into my lap, at least to my eye.
The first issue pertains to basic API roles and responsibilities with respect to application messages, or what OFI seems to call "message boundaries." For example, APIs like HOMA, eRPC --- granted experimental --- explicitly guarantee message delivery (at least or exactly once). DPDK is fundamentally packet oriented and, as a practical matter, makes no guarantees about application messages.
The OFI situation --- correct me if I'm wrong --- seems to be:
* Regardless of the provider, full application messages will be sent and received exactly once, provided the endpoint is configured as one of FI_EP_DGRAM, FI_EP_MSG, or FI_EP_RDM
* FI_EP_DGRAM has a caveat about maximum message size
* While not explicitly stated<https://ofiwg.github.io/libfabric/v1.21.0/man/fi_endpoint.3.html> the guarantee seems to be whole messages are sent and delivered, but there's no guarantee about message order
* Regardless of the provider, no guarantees are made if the endpoint is configured with FI_EP_DGRAM. To make it reliable in this mode, programmers must:
    * Define a message reliability policy
    * Use OFI send/recv calls to implement a protocol whereby senders and receivers exchange information until each side is convinced the policy is satisfied. This will likely leverage the usual suspects of ACKs, Grants, NACKs, CRC checks and so on
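As a rough illustration of that last point, the sketch below shows the kind of per-message bookkeeping a stop-and-wait scheme over an unreliable datagram endpoint might need: a sequence number, a CRC over the payload, and a receiver-side decision to ACK, re-ACK, or NACK. It is plain C with no libfabric calls. The names `struct frame`, `crc32_calc`, `frame_pack`, `frame_check`, and the `MAX_PAYLOAD` value are all hypothetical and chosen only for illustration; a real limit would come from the provider's reported maximum message size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PAYLOAD 1024 /* assumed cap, not a libfabric constant */

/* Hypothetical wire frame: sequence number, payload length, CRC, payload. */
struct frame {
    uint32_t seq;
    uint32_t len;
    uint32_t crc; /* CRC-32 over the payload bytes only */
    uint8_t  payload[MAX_PAYLOAD];
};

/* What the receiver should do with an incoming frame. */
enum verdict { ACCEPT_AND_ACK, REACK_STALE, CORRUPT_NACK };

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); slow but dependency-free. */
static uint32_t crc32_calc(const uint8_t *buf, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Sender side: fill in a frame. Returns 0 on success, -1 if data is too large. */
static int frame_pack(struct frame *f, uint32_t seq, const void *data, size_t len)
{
    assert(f != NULL && data != NULL);
    if (len > MAX_PAYLOAD)
        return -1;
    f->seq = seq;
    f->len = (uint32_t)len;
    memcpy(f->payload, data, len);
    f->crc = crc32_calc(f->payload, len);
    return 0;
}

/* Receiver side: validate a frame against the next expected sequence number.
 * On acceptance the expected sequence number advances by one. */
static enum verdict frame_check(const struct frame *f, uint32_t *expected_seq)
{
    if (f->len > MAX_PAYLOAD || crc32_calc(f->payload, f->len) != f->crc)
        return CORRUPT_NACK;  /* damaged in flight: ask for a resend */
    if (f->seq != *expected_seq)
        return REACK_STALE;   /* not the frame we are waiting for: repeat the ACK */
    (*expected_seq)++;
    return ACCEPT_AND_ACK;
}
```

In this sketch the sender would keep retransmitting a frame until it sees the matching ACK, which is the simplest policy that gives at-least-once delivery; the receiver's sequence check is what turns that into exactly-once delivery to the application. Windowing, grants, and timeouts are deliberately left out.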
The second issue is multicast. Some of our messaging architecture leverages pub-sub through InfiniBand multicast. Now, when I look at provider details, for example,
* Verbs<https://ofiwg.github.io/libfabric/v1.21.0/man/fi_verbs.7.html>
* UDP<https://ofiwg.github.io/libfabric/v1.21.0/man/fi_udp.7.html>
I had hoped to see a blurb mentioning whether or not multicast is supported. Where can I find more details on this?
Finally, OFI's support for Direct<https://ofiwg.github.io/libfabric/v1.21.0/man/fi_direct.7.html> caught my eye. Here again provider pages are not tipping their hand as to whether there's a direct implementation. How can I pursue this detail?
Thank you,
Shane Miller