[libfabric-users] FI_HMEM support for NVIDIA and Intel GPUs
sean.hefty at intel.com
Fri Apr 30 17:15:38 PDT 2021
> We have tested MPICH using OFI on NVIDIA DGX100 nodes as well as on the latest Intel
> GPU testbeds. Both nodes use Mellanox networks, and we were able to run MPICH
> over `verbs;ofi_rxm`. However, we could not see the `FI_HMEM` capability with the
> `verbs` provider, although we did see `FI_HMEM` with the `shm` provider. We do see code
> in OFI for `FI_HMEM` support at least with ZE (Intel GPU) but could not see that
> capability listed with our libfabric build. Is this what we are expected to see as
`FI_HMEM` in the `verbs` provider needs very recent versions of libibverbs and the Linux kernel. It relies on a feature referred to as dmabuf, which is only a couple of months old.
I also doubt that NVIDIA GPUs support dmabuf yet.
The shm provider does not rely on dmabuf, because there's no NIC involved.