[ofiwg] getting hardware details from libfabric

Jeff Squyres (jsquyres) jsquyres at cisco.com
Fri May 4 05:28:04 PDT 2018


Thanks for bringing this up, Sean.  A little more rationale:

1. We basically modeled usnic_devinfo(1) off ibv_devinfo(1); it's primarily aimed at sysadmins/IT as part of verification that our usNIC stack is functioning properly.  It's currently using usnic provider extensions to get this information (look at the "Fabric extensions: netinfo" section in https://ofiwg.github.io/libfabric/v1.6.0/man/fi_usnic.7.html), but it would be nice if this kind of stuff was standardized somehow.  Here's output from usnic_devinfo:

$ /opt/cisco/usnic/bin/usnic_devinfo -d usnic_0
usnic_0:
        Interface:               vic20
        MAC Address:             24:57:20:06:20:00
        IP Address:              10.10.0.6
        Netmask:                 255.255.0.0
        Prefix len:              16
        MTU:                     9000
        Link State:              UP
        Bandwidth:               10 Gb/s
        Device ID:               UCSC-PCIE-CSC-02 [VIC 1225] [0x0085]
        Vendor ID:               4407
        Vendor Part ID:          207
        Firmware:                4.1(1d)
        VFs:                     64
        CQ per VF:               4
        QP per VF:               6
        Interrupts per VF:       4
        Max CQ:                  256
        Max CQ Entries:          65535
        Max QP:                  384
        Max Send Credits:        4095
        Max Recv Credits:        4095
        Capabilities:
          Map per res:           yes
          PIO sends:             no
          CQ interrupts:         no

I'm quite sure a lot of this specific information is unique to our device; I don't think that these exact fields are worth standardizing, of course.

2. Another topic that comes up not infrequently is the ability to correlate a fabric/domain/endpoint to some other corresponding Linux entity, such as an IP interface and/or PCI device (if relevant).  This obviously doesn't work for fabrics/domains/endpoints that represent emulation devices, may be tricky for bonded devices, ...etc.  But there are many providers that create fabrics/domains/endpoints that directly correlate with a specific Linux device.  Tools like hwloc (and therefore Open MPI) could definitely use this information for determining locality, especially where short message latency matters.

Some sort of optional fabric/domain/endpoint correlation to a Linux device would be genuinely useful.
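
To make the kind of correlation I mean a bit more concrete, here is a rough sketch (not anything libfabric provides today) of what a tool can already dig out of Linux sysfs once it knows which IP interface a domain maps to. The function name describe_netdev and the returned dictionary keys are made up for illustration; the sysfs paths (/sys/class/net/<ifname>/device and its numa_node file) are the standard Linux ones. The missing piece is the first step: getting the interface name (e.g., "vic20") from the fabric/domain in a provider-neutral way.

```python
import os


def describe_netdev(ifname, sysfs_root="/sys"):
    """Look up the backing device and NUMA node of a Linux network interface.

    Hypothetical helper for illustration only. For a PCI NIC, "device" is
    the PCI address (e.g. "0000:03:00.0"); for virtual or emulated
    interfaces there is no backing device and both fields are None.
    """
    result = {"ifname": ifname, "device": None, "numa_node": None}

    # /sys/class/net/<ifname>/device is a symlink to the backing device
    # directory; it is absent for purely virtual interfaces such as "lo".
    dev_link = os.path.join(sysfs_root, "class", "net", ifname, "device")
    if not os.path.exists(dev_link):
        return result

    dev_path = os.path.realpath(dev_link)
    result["device"] = os.path.basename(dev_path)

    # numa_node holds the NUMA node the device is attached to; the kernel
    # reports -1 when it does not know. The file may be missing entirely
    # for non-PCI devices, so treat any read/parse failure as "unknown".
    try:
        with open(os.path.join(dev_path, "numa_node")) as f:
            result["numa_node"] = int(f.read().strip())
    except (OSError, ValueError):
        pass

    return result
```

Calling describe_netdev("vic20") on a machine with that interface would return the device name and NUMA node that something like hwloc could then use for locality decisions. As noted above, this falls apart for emulated or bonded devices, which is why any such correlation would have to be optional.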

I honestly haven't given a ton of thought to either of these other than "that would be useful"; apologies if this is somewhat half-baked.


> On May 3, 2018, at 4:45 PM, Hefty, Sean <sean.hefty at intel.com> wrote:
> 
> There has been a long outstanding set of requests to obtain HW specific data from libfabric.  A side discussion brought this topic up again, so I'd like to at least put it on the agenda as a possible feature for 1.7.  As a point of reference, Cisco has implemented a set of provider specific ops to retrieve device specific data.  It's fairly simple, and details are here: 
> 
> https://github.com/cisco/usnic_tools/blob/master/usnic_devinfo.c
> 
> This feature would obviously only apply to providers that are directly associated with some sort of HW device.
> 
> What I would like to start to collect is a list of what sort of attributes would be desirable to report, or what applications or users could make use of.
> 
> - Sean
> _______________________________________________
> ofiwg mailing list
> ofiwg at lists.openfabrics.org
> http://lists.openfabrics.org/mailman/listinfo/ofiwg


-- 
Jeff Squyres
jsquyres at cisco.com



