[ofiwg] Clarification of definition of FI_THREAD_DOMAIN

Byrne, John (Labs) john.l.byrne at hpe.com
Wed Jun 5 16:28:38 PDT 2024


The sentence beginning "Since uncontended locks FI_THREAD_SAFE..." should read "Since uncontended locks aren't very expensive, FI_THREAD_SAFE..."

From: ofiwg <ofiwg-bounces at lists.openfabrics.org> On Behalf Of Byrne, John (Labs)
Sent: Wednesday, June 5, 2024 4:24 PM
To: Xiong, Jianxin <jianxin.xiong at intel.com>; ofiwg at lists.openfabrics.org
Subject: Re: [ofiwg] Clarification of definition of FI_THREAD_DOMAIN

Thanks. I couldn't find a statement to that effect and every time I read about FI_THREAD_DOMAIN, I just kept assuming a single domain with multiple endpoints and thinking what a horrible idea that would be. Since uncontended locks FI_THREAD_SAFE is certainly the simplest way to go assuming the only frequently used locks are on the endpoint and completion paths. If you have an MR cache and are actively using it, then its locking gets annoying as things scale up, though.

John

From: Xiong, Jianxin <jianxin.xiong at intel.com>
Sent: Wednesday, June 5, 2024 3:35 PM
To: Byrne, John (Labs) <john.l.byrne at hpe.com>; ofiwg at lists.openfabrics.org
Subject: RE: Clarification of definition of FI_THREAD_DOMAIN

Your understanding is correct.  The recommendation is based on how feasible it is to have a lockless implementation in the providers. That also matches what middleware like MPI is doing today.

Does FI_THREAD_COMPLETION fit better with the multi-threaded RMA use case? It is recommended for scalable endpoints because that's when this threading model is more likely to be supported by the provider.  But using it with regular endpoints is totally fine if available.

For simplicity at the user end, maybe just go with FI_THREAD_SAFE.

-Jianxin

From: ofiwg <ofiwg-bounces at lists.openfabrics.org> On Behalf Of Byrne, John (Labs)
Sent: Wednesday, June 5, 2024 10:10 AM
To: ofiwg at lists.openfabrics.org
Subject: [ofiwg] Clarification of definition of FI_THREAD_DOMAIN

FI_THREAD_DOMAIN
A domain serialization model requires applications to serialize access to all objects belonging to a domain.

My immediate take on this definition is that if I am multi-threading I have to have a single lock that I use to access any object belonging to a fi_domain instance, which seems like a terrible idea for multi-threading. However, in Jianxin's 2.0 API update at the workshop https://www.openfabrics.org/wp-content/uploads/2024-workshop/2024-workshop-presentations/session-1.pdf, it says: "Recommend FI_THREAD_DOMAIN for multi-thread app with regular endpoint."  If my interpretation of the meaning of FI_THREAD_DOMAIN is correct, then the only way this makes sense to me is for the expectation to be that a unique fi_domain instance and endpoint be created for each thread. Is this correct or is there something I'm misunderstanding? If it is correct, then there are some painful implications for multi-threading RMA.

Thanks,

John Byrne
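
The two readings of FI_THREAD_DOMAIN discussed in the message above can be modeled with plain pthreads. This is only a sketch under stated assumptions: struct model_domain, struct worker, run and run_model are hypothetical names invented for illustration, not libfabric types or calls, and a simple counter stands in for whatever state the objects of a domain share. With one shared domain, every thread must take the same application-owned lock; with one domain per thread, no lock is needed because no domain is touched by more than one thread.

```c
#include <pthread.h>

#define NTHREADS 4
#define NOPS 100000

/* Hypothetical stand-in for a domain; NOT a libfabric type. */
struct model_domain {
    pthread_mutex_t lock; /* application-owned lock covering the whole domain */
    long ops;             /* stands in for state shared by the domain's objects */
};

/* Hypothetical per-thread context. */
struct worker {
    struct model_domain *dom;
    int need_lock; /* 1 when the domain is shared between threads */
};

static void *run(void *arg)
{
    struct worker *w = arg;

    for (int i = 0; i < NOPS; i++) {
        /* Shared domain: the application serializes every access itself. */
        if (w->need_lock)
            pthread_mutex_lock(&w->dom->lock);
        w->dom->ops++;
        if (w->need_lock)
            pthread_mutex_unlock(&w->dom->lock);
    }
    return NULL;
}

/*
 * per_thread_domain == 0: all threads share one domain and one lock.
 * per_thread_domain == 1: each thread owns its own domain, no locking.
 * Returns the total number of operations recorded across all domains.
 */
static long run_model(int per_thread_domain)
{
    struct model_domain doms[NTHREADS];
    struct worker workers[NTHREADS];
    pthread_t tids[NTHREADS];
    int ndoms = per_thread_domain ? NTHREADS : 1;
    long total = 0;

    for (int i = 0; i < ndoms; i++) {
        pthread_mutex_init(&doms[i].lock, NULL);
        doms[i].ops = 0;
    }
    for (int i = 0; i < NTHREADS; i++) {
        workers[i].dom = &doms[per_thread_domain ? i : 0];
        workers[i].need_lock = !per_thread_domain;
        pthread_create(&tids[i], NULL, run, &workers[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    for (int i = 0; i < ndoms; i++) {
        total += doms[i].ops;
        pthread_mutex_destroy(&doms[i].lock);
    }
    return total;
}
```

Both arrangements record every operation; the difference is that the shared-domain case funnels all threads through a single lock, which is the contention concern raised in the question, while the per-thread case multiplies the number of domain instances instead.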

