[libfabric-users] PD/MR sharing by multiple NIC

Xiong, Jianxin jianxin.xiong at intel.com
Wed Aug 21 10:21:42 PDT 2024


That’s right in general, especially w.r.t. verbs.

However, in a more general case that may not fit your specific scenario, if the provider allows user-supplied memory keys, it’s possible to have two registrations using the same key. Keys only need to be unique within the same domain, so they can be reused in different domains.
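
As a rough illustration of the key-scope rule, here is a toy Python model (not libfabric code). The `Domain` class and its `register` method are hypothetical stand-ins for a domain that accepts user-supplied keys; the key value is made up. The sketch only shows that key collisions are checked per domain, so one key can name a registration in each of two domains.

```python
class Domain:
    """Toy stand-in for a domain that accepts user-supplied MR keys."""

    def __init__(self, name):
        self.name = name
        self.registrations = {}  # key -> (buffer id, length)

    def register(self, buf, key):
        # Keys must be unique only within this domain.
        if key in self.registrations:
            raise ValueError(f"key {key:#x} already in use in {self.name}")
        self.registrations[key] = (id(buf), len(buf))
        return key


buf = bytearray(4096)
dom_a = Domain("domain-nic0")
dom_b = Domain("domain-nic1")

# Same buffer, same user-chosen key, one registration per domain.
key_a = dom_a.register(buf, 0x1234)
key_b = dom_b.register(buf, 0x1234)
```

In libfabric itself the user-chosen key is the `requested_key` argument of `fi_mr_reg`; providers that require `FI_MR_PROV_KEY` pick the key themselves, in which case this approach does not apply.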

From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Wednesday, August 21, 2024 10:07 AM
To: Xiong, Jianxin <jianxin.xiong at intel.com>; libfabric-users at lists.openfabrics.org
Subject: Re: PD/MR sharing by multiple NIC

Thanks Jianxin. That means the same memory can be shared by multiple domains.

Let’s say we have 1 large memory region and 2 NICs used by the process on both server and client.
This would mean the remote node needs to know which NIC it is sending the write out from and must use the appropriate Rkey for that Write message.

We cannot have a common Rkey per rank (process) for this region, so that the remote node doesn’t have to worry about which NIC is receiving it.
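
The bookkeeping this implies on the remote side can be sketched as a lookup keyed by rank and receiving NIC. This is a toy Python model under assumed names: `pick_rkey`, the table layout, and the key values are all hypothetical, and how the keys get exchanged is left out.

```python
def pick_rkey(rkey_table, peer_rank, target_nic):
    """Return the rkey registered with the NIC that will receive the write."""
    return rkey_table[(peer_rank, target_nic)]


# (peer rank, receiving NIC index) -> rkey; values are made up for illustration.
rkey_table = {
    (0, 0): 0xA001,  # rank 0, region registered with NIC 0
    (0, 1): 0xB002,  # rank 0, same region registered with NIC 1
}

rkey_nic0 = pick_rkey(rkey_table, peer_rank=0, target_nic=0)
rkey_nic1 = pick_rkey(rkey_table, peer_rank=0, target_nic=1)
```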




From: Xiong, Jianxin <jianxin.xiong at intel.com>
Date: Wednesday, August 21, 2024 at 11:51 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, libfabric-users at lists.openfabrics.org
Subject: RE: PD/MR sharing by multiple NIC
There is a caching mechanism for the registration.

In the general case (not psm3), you could pre-register the same buffer with each domain, and choose the right MR to use depending on which domain the endpoint in use belongs to.
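
A minimal sketch of that pattern, as a toy Python model rather than libfabric code: `Endpoint`, `preregister`, and `mr_for` are hypothetical names, and the MR handle is just a string standing in for whatever a real registration would return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    domain: str  # name of the domain this endpoint was created in


def preregister(buf, domains):
    """Register the same buffer once per domain; returns domain -> MR handle."""
    return {dom: f"mr({dom}, {len(buf)} bytes)" for dom in domains}


def mr_for(endpoint, mr_by_domain):
    """Pick the MR that belongs to the endpoint's own domain."""
    return mr_by_domain[endpoint.domain]


buf = bytearray(1 << 20)
mr_by_domain = preregister(buf, ["domain-nic0", "domain-nic1"])

ep0 = Endpoint("ep0", "domain-nic0")
ep1 = Endpoint("ep1", "domain-nic1")
```

Registration happens once at init, and each operation only does a dictionary lookup, which avoids registering on every read/write.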

From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Wednesday, August 21, 2024 9:42 AM
To: Xiong, Jianxin <jianxin.xiong at intel.com>; libfabric-users at lists.openfabrics.org
Subject: Re: PD/MR sharing by multiple NIC

Thank you Jianxin. Really appreciate the explanation.

“An MR only needs to register with the NIC when needed.”
I was wondering about the cost of registration on every read/write.

I will check the details of the “rv” module to see what is happening.
I was thinking of avoiding the cost of registration on read/write by pre-registering a large memory region during init. However, that doesn’t work if multiple NICs cannot share the same pre-registered memory.



From: Xiong, Jianxin <jianxin.xiong at intel.com>
Date: Wednesday, August 21, 2024 at 11:34 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, libfabric-users at lists.openfabrics.org
Subject: RE: PD/MR sharing by multiple NIC
For psm3, the domain is purely a software object. It’s the endpoint that maps to a specific NIC (and ibv PD). An MR only needs to register with the NIC when needed.

With RoCE, PSM3 can work over UD or RC. When using UD, user data is always copied through pre-registered internal buffers that are allocated per endpoint. The user MR doesn’t need to be registered with the NIC.

When using RC, registration with the NIC is only needed for RDMA read/write operations. This is done via the optional “rv” kernel module.


From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Wednesday, August 21, 2024 8:54 AM
To: Xiong, Jianxin <jianxin.xiong at intel.com>; libfabric-users at lists.openfabrics.org
Subject: Re: PD/MR sharing by multiple NIC

I assumed PSM3 uses rdma-core underneath for the RoCE protocol.
Slide 8 from https://www.openfabrics.org/wp-content/uploads/2021-workshop-presentations/405_Rimmer_PSM3-Architecture.pdf

So, does PSM3 have its own implementation of the RoCE protocol?

From: Xiong, Jianxin <jianxin.xiong at intel.com>
Date: Wednesday, August 21, 2024 at 10:34 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, libfabric-users at lists.openfabrics.org
Subject: RE: PD/MR sharing by multiple NIC
As I said, you can’t do that with the verbs provider.

From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Wednesday, August 21, 2024 8:28 AM
To: Xiong, Jianxin <jianxin.xiong at intel.com>; libfabric-users at lists.openfabrics.org
Subject: Re: PD/MR sharing by multiple NIC

Thank you Jianxin.

ibv_alloc_pd takes an ibv_context, which comes from opening a particular device.
How can we have the same PD linked to multiple devices?

Do we have different PDs but same MR?

From: Xiong, Jianxin <jianxin.xiong at intel.com>
Date: Wednesday, August 21, 2024 at 10:16 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, libfabric-users at lists.openfabrics.org
Subject: RE: PD/MR sharing by multiple NIC
It depends on the provider. Since an MR is a domain-level object, the endpoints must share the same domain in order to share the MR. Some providers (e.g. verbs) map a domain to a specific NIC and thus can’t share an MR among NICs. Other providers (e.g. psm3) treat the domain as a pure software entity and can have multiple NICs under the same domain.

Jianxin

From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> On Behalf Of Niyaz Murshed
Sent: Wednesday, August 21, 2024 8:00 AM
To: libfabric-users at lists.openfabrics.org
Subject: [libfabric-users] PD/MR sharing by multiple NIC

Hello,

Is it possible for multiple NICs to share the same PD/MR?
i.e., can I register a memory region that can be used by multiple NICs?

Regards,
Niyaz
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.