[ofiwg] 10/29/2024 OFIWG meeting notes
Ingerson, Alexia
alexia.ingerson at intel.com
Tue Oct 29 13:02:42 PDT 2024
10/29/2024
Participants:
Alexia Ingerson (Intel)
Jianxin Xiong (Intel)
Alex McKinley (Intel)
Ben Lynam (Cornelis)
Charles Shereda (Cornelis)
Howard Pritchard (LANL)
Ian Ziemba (HPE)
Juee Desai (Intel)
Nikhil Nanal (Intel)
Peinan Zhang (Intel)
Shi Jin (AWS)
Stephen Oost (Intel)
Steve Welch (HPE)
Zach Dworkin (Intel)
Summary:
Libfabric 2.0 beta was released on 10/25; RC1 for the 2.0 GA is planned for 11/22/2024. Please fix Coverity issues and double check provider man pages to make sure everything is up to date.
The two implementations of GPU RDMA with CUDA and verbs were discussed: peer memory (available only in MOFED, which carries an extra API) and dmabuf (where the application needs to provide the address as well as the file descriptor in order to register memory with the RDMA driver).
Notes:
2.0 status and plan:
* 2.0 beta went out Friday (10/25)
* Planned new features all in
* RC1 for 2.0 GA is planned for Fri., November 22
* Focus on testing and fixing bugs
* Coverity: lots of issues in opx, lpp, lnx, psm3
* Double check man pages, especially provider man pages, make sure everything is up to date
GPU RDMA with CUDA and verbs:
* 2 different mechanisms: use dmabuf, use peer memory (old, part of MOFED, has extra API called peer mem)
* Old way: peer memory
* Driver will query all clients to see who owns the address, driver will do address translation for NIC - only available with MOFED, not available with distro kernel driver
* New way: use dmabuf between the GPU driver and the RDMA driver
* Application needs to get the file descriptor as well as the address. The client uses a different API to register memory, which interacts with the DMA driver (see the registration sketch below)
Q: What if the app only passes the address, not the fd?
A: Currently not available, but something we can add
efa tried to do it but had to revert; it can only use dmabuf when the NCCL dmabuf API is used
As long as the app registers the address with the correct iface, libfabric can query this
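For context, a minimal sketch (not the provider's exact code) of what that registration can look like from the application side, using libfabric's FI_MR_DMABUF support (fi_mr_regattr with struct fi_mr_dmabuf, available since libfabric 1.20). The domain, buffer, length, and fd arguments are placeholders supplied by the caller:

/*
 * Sketch: register a dmabuf-backed GPU buffer through libfabric.
 * "domain", "base", "len" and "fd" are placeholders supplied by the caller;
 * the fd is the dmabuf file descriptor exported by the GPU driver.
 */
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>

static int register_dmabuf_mr(struct fid_domain *domain, void *base,
                              size_t len, int fd, struct fid_mr **mr)
{
        struct fi_mr_dmabuf dmabuf = {
                .fd = fd,               /* dmabuf fd from the GPU driver */
                .offset = 0,            /* offset of the region within the dmabuf */
                .len = len,
                .base_addr = base,      /* device VA of the allocation */
        };
        struct fi_mr_attr attr = {
                .dmabuf = &dmabuf,
                .iov_count = 1,
                .access = FI_SEND | FI_RECV | FI_READ | FI_WRITE |
                          FI_REMOTE_READ | FI_REMOTE_WRITE,
                .iface = FI_HMEM_CUDA,  /* memory lives on a CUDA device */
                .device.cuda = 0,       /* CUDA device ordinal */
        };

        /* FI_MR_DMABUF tells fi_mr_regattr to read attr.dmabuf, not attr.mr_iov */
        return fi_mr_regattr(domain, &attr, FI_MR_DMABUF, mr);
}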
In order to enable dmabuf for CUDA (see the sketch after this list):
* Requires Linux kernel 5.12 or later
* Requires CUDA 11.7 or later
* NVIDIA driver installed with the "-m=kernel-open" flag (open kernel modules)
* Check CUDA dmabuf support: cuDeviceGetAttribute()
* Get dmabuf fd for CUDA allocated buffer: cuMemGetHandleForAddressRange()
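A rough, self-contained sketch of those two CUDA driver API calls (error handling is trimmed, and a real implementation should also make sure the range passed to cuMemGetHandleForAddressRange is page aligned):

/* Sketch: detect dmabuf support and export a dmabuf fd for a CUDA
 * allocation using the CUDA driver API (CUDA >= 11.7). */
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        CUdevice dev;
        CUcontext ctx;
        CUdeviceptr buf;
        size_t len = 1 << 20;   /* 1 MiB, a multiple of the host page size */
        int supported = 0, fd = -1;

        cuInit(0);
        cuDeviceGet(&dev, 0);

        /* Check whether the device/driver support dmabuf export */
        cuDeviceGetAttribute(&supported, CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, dev);
        if (!supported) {
                fprintf(stderr, "dmabuf not supported by this device/driver\n");
                return EXIT_FAILURE;
        }

        cuDevicePrimaryCtxRetain(&ctx, dev);
        cuCtxSetCurrent(ctx);
        cuMemAlloc(&buf, len);

        /* Export a dmabuf fd covering the allocation */
        if (cuMemGetHandleForAddressRange(&fd, buf, len,
                        CU_MEM_RANGE_HANDLE_TYPE_DMA_BUF_FD, 0) != CUDA_SUCCESS) {
                fprintf(stderr, "cuMemGetHandleForAddressRange failed\n");
                return EXIT_FAILURE;
        }
        printf("dmabuf fd = %d\n", fd);

        /* The fd (plus the device address) can now be handed to the RDMA
         * driver, e.g. via ibv_reg_dmabuf_mr or libfabric's FI_MR_DMABUF */
        cuMemFree(buf);
        return EXIT_SUCCESS;
}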
The verbs provider will look at the flags, kernel support for dmabuf, the iface, etc., and will go through either ibv_reg_mr or ibv_reg_dmabuf_mr
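As an illustration only (not the provider's actual code), that choice boils down to something like the following, where use_dmabuf and dmabuf_fd stand in for the provider's checks:

#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified illustration of the ibv_reg_dmabuf_mr vs. ibv_reg_mr choice. */
static struct ibv_mr *reg_gpu_mr(struct ibv_pd *pd, void *addr, size_t len,
                                 int use_dmabuf, int dmabuf_fd)
{
        int access = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ |
                     IBV_ACCESS_REMOTE_WRITE;

        if (use_dmabuf && dmabuf_fd >= 0) {
                /* dmabuf path: register through the fd; the device VA is
                 * passed as the iova so RDMA operations can target it */
                return ibv_reg_dmabuf_mr(pd, 0 /* offset */, len,
                                         (uintptr_t)addr /* iova */,
                                         dmabuf_fd, access);
        }

        /* Fallback: plain registration; for GPU memory this relies on the
         * peer memory (MOFED) path in the kernel */
        return ibv_reg_mr(pd, addr, len, access);
}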
Q: How do you detect kernel support for peer memory?
A: Also check the kernel symbol: ib_register_peer_memory_client
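One way to do that check from user space (shown purely as an illustration; not necessarily the exact mechanism the provider uses) is to scan /proc/kallsyms for the symbol:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustration: report whether the running kernel exposes the
 * ib_register_peer_memory_client symbol (i.e. peer memory support). */
static bool kernel_has_peer_mem(void)
{
        FILE *f = fopen("/proc/kallsyms", "r");
        char line[512];
        bool found = false;

        if (!f)
                return false;

        while (fgets(line, sizeof(line), f)) {
                if (strstr(line, "ib_register_peer_memory_client")) {
                        found = true;
                        break;
                }
        }
        fclose(f);
        return found;
}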
Q: NIC and peer memory registration can fail, so doing a kernel check is not sufficient; it ends up being trial and error
A: verbs takes the simple approach: check for the call and fall back if registration fails later
Comment: efa tries registering on every iface to see if it is available
Currently, if you want to use DMA support for CUDA with verbs you have to use the peer memory approach. dmabuf is supported for CUDA but ofi doesn't check for it; propose adding CUDA to the dmabuf ifaces
Fabtests has an option to allocate device memory ("-D cuda") and an option to register with the FI_MR_DMABUF flag ("-R"); see common/shared.c
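For example, a client-side run might look like the following (hypothetical invocation; check the fabtests documentation for the exact options each test supports):
    fi_rdm_pingpong -p verbs -D cuda -R <server_addr>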
Q: There is no way for an application to detect whether peer memory or dmabuf is available. Would it be good to have an API to report that?
A: It can be included in logging.
Follow-up: We already do; a global API (in fi_info) was tried but rejected.