[ofiwg] getting hardware details from libfabric
Jeff Squyres (jsquyres)
jsquyres at cisco.com
Fri May 4 05:28:04 PDT 2018
Thanks for bringing this up, Sean. A little more rationale:
1. We basically modeled usnic_devinfo(1) on ibv_devinfo(1); it's primarily aimed at sysadmins/IT as a way to verify that our usNIC stack is functioning properly. It currently uses usnic provider extensions to get this information (see the "Fabric extensions: netinfo" section in https://ofiwg.github.io/libfabric/v1.6.0/man/fi_usnic.7.html), but it would be nice if this kind of information were standardized somehow. Here's output from usnic_devinfo:
$ /opt/cisco/usnic/bin/usnic_devinfo -d usnic_0
MAC Address: 24:57:20:06:20:00
IP Address: 10.10.0.6
Prefix len: 16
Link State: UP
Bandwidth: 10 Gb/s
Device ID: UCSC-PCIE-CSC-02 [VIC 1225] [0x0085]
Vendor ID: 4407
Vendor Part ID: 207
CQ per VF: 4
QP per VF: 6
Interrupts per VF: 4
Max CQ: 256
Max CQ Entries: 65535
Max QP: 384
Max Send Credits: 4095
Max Recv Credits: 4095
Map per res: yes
PIO sends: no
CQ interrupts: no
I'm quite sure a lot of this information is specific to our device; of course, I don't think these exact fields are worth standardizing.
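Because the exact fields differ per device, one possible shape for a standardized report is a generic list of name/value pairs that each provider fills however it likes, which a tool like usnic_devinfo could then print uniformly. The sketch below is purely illustrative: `struct hw_attr`, `hw_attr_format()`, `hw_attr_lookup()` and `hw_attr_print_all()` are hypothetical names, not part of libfabric or of the usnic extensions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical provider-neutral attribute record (NOT a libfabric API).
 * A provider would fill in whatever device-specific fields it has. */
struct hw_attr {
    const char *name;   /* e.g. "Link State" */
    const char *value;  /* already formatted as text, e.g. "UP" */
};

/* Format one attribute as "Name: value" into buf (truncated if too long).
 * Returns the length the full string would have, as snprintf does. */
int hw_attr_format(const struct hw_attr *a, char *buf, size_t len)
{
    return snprintf(buf, len, "%s: %s", a->name, a->value);
}

/* Return the value of the named attribute, or NULL if the provider
 * did not report it (fields are optional and device-specific). */
const char *hw_attr_lookup(const struct hw_attr *attrs, size_t count,
                           const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(attrs[i].name, name) == 0)
            return attrs[i].value;
    }
    return NULL;
}

/* Print every attribute, one per line, in the style of usnic_devinfo. */
void hw_attr_print_all(const struct hw_attr *attrs, size_t count, FILE *out)
{
    char line[256];
    for (size_t i = 0; i < count; i++) {
        hw_attr_format(&attrs[i], line, sizeof(line));
        fprintf(out, "%s\n", line);
    }
}
```

With this shape, only the container would need standardizing; the attribute names themselves could stay provider-defined.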
2. Another topic that comes up fairly often is the ability to correlate a fabric/domain/endpoint with a corresponding Linux entity, such as an IP interface and/or a PCI device (if relevant). This obviously doesn't work for fabrics/domains/endpoints that represent emulated devices, and may be tricky for bonded devices, etc. But many providers create fabrics/domains/endpoints that correlate directly with a specific Linux device. Tools like hwloc (and therefore Open MPI) could definitely use this information to determine locality, especially where short-message latency matters.
Some sort of optional mapping from a fabric/domain/endpoint to a Linux device would be genuinely useful.
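As a rough illustration of the kind of correlation meant here: on Linux, if a provider knows the network interface name behind a domain, it could resolve the PCI address by reading the `/sys/class/net/<ifname>/device` symlink (whose last path component is the PCI address in `DDDD:BB:DD.F` form) and the NUMA node from `/sys/class/net/<ifname>/device/numa_node`. This is a minimal sketch assuming Linux sysfs; `struct pci_addr`, `parse_pci_addr()`, `ifname_to_pci()` and `ifname_numa_node()` are hypothetical helpers, not libfabric calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical PCI address record (NOT a libfabric type). */
struct pci_addr {
    unsigned domain, bus, device, function;
};

/* Parse the last component of a sysfs device link, e.g.
 * "../../../0000:3b:00.0". Returns 0 on success, -1 if it is
 * not a PCI address (e.g. a virtual interface). */
int parse_pci_addr(const char *path, struct pci_addr *out)
{
    const char *base = strrchr(path, '/');
    unsigned d, b, dev, fn;
    char extra;

    base = base ? base + 1 : path;
    /* Exactly four hex fields and nothing trailing. */
    if (sscanf(base, "%x:%x:%x.%x%c", &d, &b, &dev, &fn, &extra) != 4)
        return -1;
    out->domain = d;
    out->bus = b;
    out->device = dev;
    out->function = fn;
    return 0;
}

/* Resolve a Linux interface name to its PCI address via sysfs.
 * Returns -1 when there is no backing device (lo, bonds, emulation). */
int ifname_to_pci(const char *ifname, struct pci_addr *out)
{
    char link[256], target[4096];
    ssize_t n;

    snprintf(link, sizeof(link), "/sys/class/net/%s/device", ifname);
    n = readlink(link, target, sizeof(target) - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';
    return parse_pci_addr(target, out);
}

/* NUMA node of the interface's device, or -1 if unknown. */
int ifname_numa_node(const char *ifname)
{
    char path[256];
    int node = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/sys/class/net/%s/device/numa_node", ifname);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &node) != 1)
        node = -1;
    fclose(f);
    return node;
}
```

The failure paths line up with the caveats above: virtual and bonded interfaces have no single `device` link, so the lookup simply reports "unknown" rather than guessing.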
I honestly haven't given a ton of thought to either of these beyond "that would be useful"; apologies if this is somewhat half-baked.
> On May 3, 2018, at 4:45 PM, Hefty, Sean <sean.hefty at intel.com> wrote:
> There has been a long-outstanding set of requests to obtain HW-specific data from libfabric. A side discussion brought this topic up again, so I'd like to at least put it on the agenda as a possible feature for 1.7. As a point of reference, Cisco has implemented a set of provider-specific ops to retrieve device-specific data. It's fairly simple, and details are here:
> This feature would obviously only apply to providers that are directly associated with some sort of HW device.
> What I would like to start to collect is a list of what sort of attributes would be desirable to report, or what applications or users could make use of.
> - Sean
> ofiwg mailing list
> ofiwg at lists.openfabrics.org