[ofiwg] MPI_Barrier hang with sockets provider
sean.hefty at intel.com
Fri Mar 22 09:43:18 PDT 2019
> I was under the impression that the sockets provider implemented most or all
> of libfabric’s feature set and therefore is a good reference and starting
> point. Maybe that was true in the past, but these days not so much?
> What would you recommend as an example implementation for someone who wants to
> understand how to develop a provider?
The sockets provider was the original attempt to implement the full API set, primarily for application development purposes. However, many middleware packages decided to remove their own socket implementations and rely on libfabric to provide socket support instead. As a result, libfabric had to rethink its support for running over standard NICs and add performance and scalability as objectives. The tcp and udp providers were created to replace sockets. They are much simpler and make better use of shared code.
The tcp and udp providers would be the best starting points for someone looking to develop a provider. Coincidentally, at the OFA workshop this week, we heard the exact same request. All of the documentation that exists is geared toward users of the API. There is nothing to help guide provider developers.
The OFIWG will see what can be done to address that. If there are specific suggestions along those lines, please submit them or bring them up for discussion at the next OFIWG call.