[libfabric-users] feature requests

Jeff Hammond jeff.science at gmail.com
Fri Jun 2 19:31:14 PDT 2017

In my non-expert opinion, OFI is already providing the right abstraction
for multi-rail situations in the form of domains:

"Domains usually map to a specific local network interface adapter. A
domain may either refer to the entire NIC, a port on a multi-port NIC, or a
virtual device exposed by a NIC. From the viewpoint of the application, a
domain identifies a set of resources that may be used together."

From this, MPI libraries and the like would then need to support multiple
domains.
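
As a hedged sketch of what "supporting multiple domains" could mean in
practice (plain Python with hypothetical record names, not the libfabric C
API): enumerate the discovered interfaces, group them by domain name, and
stripe traffic round-robin across the domains, treating each domain as one
rail:

```python
# Hypothetical sketch: each dict below stands in for one discovered
# interface record (in libfabric terms, roughly one fi_info entry, where
# "domain" plays the role of the domain name).  All names are illustrative.

from itertools import cycle

def group_by_domain(infos):
    """Collect discovered interfaces into per-domain rail lists."""
    rails = {}
    for info in infos:
        rails.setdefault(info["domain"], []).append(info)
    return rails

def stripe(messages, rails):
    """Assign messages round-robin across domains (one 'rail' per domain)."""
    ring = cycle(sorted(rails))
    return [(next(ring), msg) for msg in messages]

infos = [
    {"domain": "mlx5_0", "port": 1},
    {"domain": "mlx5_1", "port": 1},
]
rails = group_by_domain(infos)
plan = stripe(["m0", "m1", "m2", "m3"], rails)
# plan alternates messages between the two domains.
```

The point of the sketch is only that the grouping key (the domain) is
already exposed by OFI, so the striping policy could live either in the MPI
or, as suggested below, inside OFI itself.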


On Fri, Jun 2, 2017 at 12:21 PM, Hefty, Sean <sean.hefty at intel.com> wrote:
> Copying libfabric-users mailing list on this message.
> Daniel, would you be able to join an ofiwg call to discuss these in more
> detail?  The calls are every other Tuesday from 9-10 PST, with the next
> call on Tuesday the 6th.
> - Sean
> > We work with HPC systems that deploy multiple network adapters of the
> > same type (including Intel Omni-Path and Mellanox InfiniBand adapters)
> > on compute nodes.
> >
> > Over time, we encountered two issues which we believe can be addressed
> > by the OFI library.
> >
> > First, a number of MPI implementations assume a homogeneous SW/HW setup
> > on all compute nodes.  For example, consider nodes with 2 adapters and
> > 2 separate networks.  Some MPI implementations assume that network
> > adapter A resides on CPU socket 0 on all nodes and connects to network
> > 0, and that network adapter B resides on CPU socket 1 and connects to
> > network 1.  Unfortunately that is not always the case.  There are
> > systems where some nodes use adapter A to connect to network 0 and
> > others use adapter B to connect to network 0.  The same holds for
> > network 1, where we have mixed (crossed) adapters connected to the same
> > network.  In such cases, MPI and lower layers cannot establish a
> > peer-to-peer connection.  The best way to solve this is to use the
> > network subnet ID to establish connections between pairs.  When there
> > are multiple networks and subnet IDs, mpirun would specify a network ID
> > (Platform MPI does this) and the software can then figure out from the
> > subnet ID which adapter each node is using to connect to that network.
> > Instead of implementing this logic in each MPI, it would be great if
> > OFI implemented it, since OFI is a one-stop shop over all network
> > devices and providers.
> >
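
The subnet-ID matching described above can be sketched as follows
(hypothetical adapter names and subnet IDs, plain Python, not an OFI API):
each node advertises which subnet each of its adapters sits on, and a pair
of peers then matches adapters by shared subnet ID rather than by adapter
ordering, which handles the "crossed" case:

```python
def pair_by_subnet(local, remote):
    """Pair local and remote adapters that sit on the same subnet.

    local/remote: dicts mapping adapter name -> subnet ID.
    Returns a list of (local_adapter, remote_adapter, subnet) tuples.
    """
    by_subnet = {subnet: adapter for adapter, subnet in remote.items()}
    pairs = []
    for adapter, subnet in sorted(local.items()):
        if subnet in by_subnet:
            pairs.append((adapter, by_subnet[subnet], subnet))
    return pairs

# Node X uses its first adapter for network 0; node Y uses its second
# adapter for network 0 (the crossed case).  Matching on subnet ID still
# finds the right pairing.  The names and IDs below are made up.
node_x = {"hfi1_0": "0xfe80:0", "hfi1_1": "0xfe80:1"}
node_y = {"hfi1_0": "0xfe80:1", "hfi1_1": "0xfe80:0"}
print(pair_by_subnet(node_x, node_y))
# [('hfi1_0', 'hfi1_1', '0xfe80:0'), ('hfi1_1', 'hfi1_0', '0xfe80:1')]
```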
> > Second, multirail support is hit-or-miss across MPI
> > implementations.  Intel's Omni-Path PSM2 library actually did a great
> > job here by implementing multirail support at the PSM2 level.  This
> > means all layers above it, like MPI, get this functionality for free.
> > Again, given that many MPI implementations can be built on top of OFI,
> > it would also be great if OFI had multirail support.
> >
> > Thank you
> > Daniel Faraj
> _______________________________________________
> Libfabric-users mailing list
> Libfabric-users at lists.openfabrics.org
> http://lists.openfabrics.org/mailman/listinfo/libfabric-users

Jeff Hammond
jeff.science at gmail.com