[ofiwg] getting hardware details from libfabric
Brice.Goglin at inria.fr
Mon May 7 13:47:58 PDT 2018
For hwloc, we indeed need something to help applications bind tasks
and/or memory buffers near the network device. On Linux, the "name" of
the kernel object ("hfi1_0", "usnic2", etc) is often enough because we
can walk from it to PCI objects under sysfs, and then get locality
information from there. But it's not that easy:
* Some proprietary drivers don't expose anything in sysfs, hence there's
no such name. I assume you'll have at least some sort of "device index".
* It's not clear we'll get such a name on non-Linux systems.
Another solution is to give us the PCI bus ID, which should work for all
physical hardware, except bonded devices, etc.
Having both the Linux kernel name and the PCI bus ID would be nice.
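
As a rough illustration of the sysfs walk described above, here is a minimal Python sketch. It is not how hwloc implements this. It assumes a Linux system where the driver registers the object under /sys/class/<subsystem>/<name> with a "device" symlink to the PCI device (true for many drivers, not all), and that the PCI device directory exposes the usual "numa_node" and "local_cpulist" attributes. The function names, the default "infiniband" subsystem, and the sysfs parameter are hypothetical, chosen for this example only.

```python
import os


def pci_dir_from_kernel_name(name, subsystem="infiniband", sysfs="/sys"):
    """Resolve a kernel object name (e.g. "hfi1_0") to its PCI device directory.

    Assumption: the object appears as <sysfs>/class/<subsystem>/<name> and has
    a "device" symlink. Returns None when that is not the case.
    """
    link = os.path.join(sysfs, "class", subsystem, name, "device")
    if not os.path.exists(link):
        return None
    return os.path.realpath(link)


def pci_dir_from_bus_id(bus_id, sysfs="/sys"):
    """Resolve a PCI bus ID (e.g. "0000:05:00.0") to its PCI device directory."""
    path = os.path.join(sysfs, "bus", "pci", "devices", bus_id)
    if not os.path.exists(path):
        return None
    return os.path.realpath(path)


def read_locality(pci_dir):
    """Read locality attributes from a PCI device directory.

    The kernel reports numa_node as -1 when it does not know the node;
    that value is passed through unchanged.
    """
    def read(attr):
        try:
            with open(os.path.join(pci_dir, attr)) as f:
                return f.read().strip()
        except OSError:
            return None

    numa = read("numa_node")
    return {
        "bus_id": os.path.basename(pci_dir),
        "numa_node": int(numa) if numa else None,
        "local_cpulist": read("local_cpulist"),
    }
```

Either entry point lands on the same PCI directory, so a caller that has the kernel name, the PCI bus ID, or both can try whichever is available and then call read_locality() on the result. The two cases listed above (no sysfs entry, non-Linux systems) show up here as a None return.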
On 04/05/2018 at 14:28, Jeff Squyres (jsquyres) wrote:
> 2. Another topic that comes up not infrequently is the ability to correlate a fabric/domain/endpoint to some other corresponding Linux entity, such as an IP interface and/or PCI device (if relevant). This obviously doesn't work for fabrics/domains/endpoints that represent emulation devices, may be tricky for bonded devices, ...etc. But there are many providers that create fabrics/domains/endpoints that directly correlate with a specific Linux device. Tools like hwloc (and therefore Open MPI) could definitely use this information for determining locality, especially where short message latency matters.
> Some sort of optional fabric/domain/endpoint correlation to a Linux device would be genuinely useful.
> I honestly haven't given a ton of thought to either of these other than "that would be useful"; apologies if this is somewhat half-baked.
>> On May 3, 2018, at 4:45 PM, Hefty, Sean <sean.hefty at intel.com> wrote:
>> There has been a long outstanding set of requests to obtain HW specific data from libfabric. A side discussion brought this topic up again, so I'd like to at least put it on the agenda as a possible feature for 1.7. As a point of reference, Cisco has implemented a set of provider specific ops to retrieve device specific data. It's fairly simple, and details are here:
>> This feature would obviously only apply to providers that are directly associated with some sort of HW device.
>> What I would like to start to collect is a list of what sort of attributes would be desirable to report, or what applications or users could make use of.
>> - Sean