[libfabric-users] Libfabric questions

Niyaz Murshed Niyaz.Murshed at arm.com
Wed May 8 05:59:27 PDT 2024


Thanks, Jianxin, for the reply.

I can see that psm3 is not being enabled on the Arm platform; I will look into that. On the Arm machine (config.h):
/* psm3 provider is built */
#define HAVE_PSM3 0

/* psm3 provider is built as DSO */
#define HAVE_PSM3_DL 0

/* PSM3 source is built-in */
#define HAVE_PSM3_SRC 1


On the x86 machine:
/* psm3 provider is built */
#define HAVE_PSM3 1

/* psm3 provider is built as DSO */
#define HAVE_PSM3_DL 0

/* PSM3 source is built-in */
#define HAVE_PSM3_SRC 1

However, I have a question about the x86 machine, which has both Intel and Mellanox NICs:

Root#/libfabric# lshw -c net -businfo -numeric
Bus info          Device          Class          Description
============================================================
pci@0000:01:00.0  eno1            network        I210 Gigabit Network Connection [8086:1533]
pci@0000:40:00.0  ens2f0np0       network        MT2910 Family [ConnectX-7] [15B3:1021] (Mellanox Technologies MT2910 Family [ConnectX-7])



But when I run fi_info, psm3 is only enabled for the Intel NIC:

root#fi_info -p psm3
provider: psm3
    fabric: TCP-10.118.91.0/24
    domain: eno1
    version: 306.10
    type: FI_EP_RDM
    protocol: FI_PROTO_PSMX3
provider: psm3
    fabric: TCP-10.118.91.0/24
    domain: eno1
    version: 306.10
    type: FI_EP_RDM
    protocol: FI_PROTO_PSMX3
provider: psm3
    fabric: TCP-10.118.91.0/24
    domain: eno1
    version: 306.10
    type: FI_EP_RDM
    protocol: FI_PROTO_PSMX3

root # lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Platinum 8480+

Is this expected? Shouldn’t psm3 show up for the Mellanox device too?

root # ibv_devinfo
hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         28.39.3004
        node_guid:                      946d:ae03:007d:a230
        sys_image_guid:                 946d:ae03:007d:a230
        vendor_id:                      0x02c9
        vendor_part_id:                 4129
        hw_ver:                         0x0
        board_id:                       MT_0000000834
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet



Regards,
Niyaz


From: Xiong, Jianxin <jianxin.xiong at intel.com>
Date: Wednesday, May 8, 2024 at 12:33 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Cc: nd <nd at arm.com>
Subject: RE: Libfabric questions
Hi Niyaz,

The mlx provider is only available as part of the Intel MPI package, which is only available for x86.

If you build libfabric from source, you can use the ucx provider instead. Similar to the mlx provider, the ucx provider runs on top of UCX.

The verbs provider runs on top of IB Verbs. The domain name “mlx5_0” is the device name of Mellanox NIC. It has nothing to do with the mlx provider, nor UCX.

The psm3 provider supports Intel NICs as well as Ethernet/IB/RoCE NICs from other vendors. Whether it is present on a machine depends on which package is installed. For example, if it is installed as part of the Intel Ethernet Fabric Suite package, then it is only available for x86_64. You may try to build it from libfabric source on other platforms by adding “--enable-psm3” to the libfabric “configure” command line. If configure fails, check “config.log” to see what is missing. This is untested territory, so I can’t guarantee that it will work.

The third question has been answered off-line, so I won’t repeat here.

-Jianxin

From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> On Behalf Of Niyaz Murshed
Sent: Monday, May 6, 2024 6:16 PM
To: libfabric-users at lists.openfabrics.org
Cc: nd <nd at arm.com>
Subject: Re: [libfabric-users] Libfabric questions

Hello,

Can someone please share some knowledge on this? 😊

Regards,
Niyaz

From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> on behalf of Niyaz Murshed <Niyaz.Murshed at arm.com>
Date: Tuesday, April 30, 2024 at 10:16 AM
To: libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: [libfabric-users] Libfabric questions
Hello all,

I am trying to learn libfabric and have some basic questions. Please point me in the right direction.

I am running Ubuntu 22.04 with Mellanox ConnectX-7 NICs on an Arm platform (N1).

Question 1:
Is the fi_mlx provider only available on x86?

Steps:
- Installed UCX, as required by fi_mlx (https://ofiwg.github.io/libfabric/v1.8.0/man/fi_mlx.7.html):
  # Library version: 1.18.0
- Installed libfabric.

According to https://www.intel.com/content/www/us/en/developer/articles/technical/mpi-library-2019-over-libfabric.html, this is only available for InfiniBand devices.
However, one of the presentations shows that mlx also supports RoCE (https://ibb.co/bPhm0dp).

Could you please confirm whether the mlx provider works over RoCE? Other than UCX, do we need to install anything else to enable the mlx provider?

In my current setup, I only see the verbs provider, with domain mlx5_0:
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_0
    version: 121.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC

Is verbs->mlx a verbs API wrapper over the mlx provider?

Question 2:
Is PSM3 only available for Intel NICs, and only on the x86 platform?
I have the same installation on x86 and Arm; however, on the x86 platform I see the following enabled for the Intel NIC:

provider: psm3
    fabric: TCP-10.118.91.0/24
    domain: eno1
    version: 306.10
    type: FI_EP_RDM
    protocol: FI_PROTO_PSMX3

The same is not present on the Arm platform.

Question 3:

My goal is to run an RDMA test between two applications (RoCE).
I have two NICs on the server, one on NUMA node 0 and the other on NUMA node 1, with a loop cable from NIC 1 to NIC 2.
How can I make sure that the data is transferred via the loop cable? Can this be done with the fi_rma_bw app?



Regards,
Niyaz


