[libfabric-users] Scalable endpoint issue

Biddiscombe, John A. john.biddiscombe at cscs.ch
Thu Dec 9 23:22:28 PST 2021


Sean


Interesting, thanks.

I tried an experiment: create a scalable endpoint, but with only one rx context, and use a single thread that uses this rx context. It works. Once I increase the number of threads past one, I have the problem that if (say) tagged recvs are posted such that tags 0,4,8,12 are on ctx0, 1,5,9,13 are on ctx1, 2,6,10,14 are on ctx2 and so on, then I start missing receives. Clearly some messages are arriving on contexts that are listening for something else. This would seem to make the shareable receive contexts basically useless for my use case.
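To make the failure mode concrete, here is a small stand-alone model of the posting scheme above. It does not call libfabric at all: rx_ctx_model, post_round_robin and matches_on are hypothetical names invented for this sketch, and a "context" is reduced to the list of tags posted to it. The only assumption it encodes is the one under discussion, namely that each rx context matches tags independently of the others.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_POSTED 16

/* Hypothetical stand-in for one rx context: only the tags posted to it. */
struct rx_ctx_model {
    uint64_t posted[MAX_POSTED];
    size_t   count;
};

/* Post tags first_tag .. first_tag+ntags-1, tag t going to context t % nctx
 * (0,4,8,12 on ctx0, 1,5,9,13 on ctx1, ... when nctx == 4). */
void post_round_robin(struct rx_ctx_model *ctx, size_t nctx,
                      uint64_t first_tag, uint64_t ntags)
{
    assert(nctx > 0);
    for (uint64_t t = first_tag; t < first_tag + ntags; t++) {
        struct rx_ctx_model *c = &ctx[t % nctx];
        assert(c->count < MAX_POSTED);
        c->posted[c->count++] = t;
    }
}

/* A message carrying `tag` lands on context `landed`. Because contexts are
 * modelled as independent, it matches only if that same context holds the
 * tag; otherwise it would be treated as unexpected there. */
bool matches_on(const struct rx_ctx_model *ctx, size_t landed, uint64_t tag)
{
    for (size_t i = 0; i < ctx[landed].count; i++) {
        if (ctx[landed].posted[i] == tag)
            return true;
    }
    return false;
}
```

With four contexts and tags 0..15 posted this way, matches_on(ctx, 1, 5) is true but matches_on(ctx, 0, 5) is false, so a tag-5 message that happens to land on ctx0 finds nothing to match, which is consistent with the missing receives I see.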


The problem I had before was that I was creating N contexts based on the number of threads I planned to use, but only posting receives to a subset of them initially. The cqs of the unused ones were probably being filled before the ones I was using, and hence I did not see the first receives ...


Question: what is the expected use case of shared receive contexts? If I can't use them one per thread (at least for tagged messages), then when would I use them? If I had only unexpected messages, then one per thread might work, but you'd need to be very careful to ensure all threads were polling right up to the end, otherwise some rx's might end up on cq's that were idle ...


Thanks as usual for info. (It troubles me a bit that there's only one of you who actually knows all this stuff ...)


JB


________________________________
From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> on behalf of Hefty, Sean <sean.hefty at intel.com>
Sent: 09 December 2021 22:37:06
To: Biddiscombe, John A.; libfabric-users at lists.openfabrics.org
Subject: Re: [libfabric-users] Scalable endpoint issue

> When receiving, suppose I have two Rx contexts and I post a receive on both and a
> message arrives - if they are tagged messages, then presumably whichever Rx has the
> right tag will receive it - if untagged, is it just round robin, or chance/race to see
> which cq is given the receive?

If the provider has not exposed some sort of steering flow mechanism (provider specific definition), then the Rx context that receives a message is non-deterministic from the viewpoint of the app.  You're at the whim of the provider implementation, and it could also depend on traffic from other peers targeting the same endpoint.  E.g. round robin at the receiver may not look like round robin at the sender.

This is true even for tagged messages.  The Rx contexts are independent, and if a tagged message is received on one, but the matching tag was posted to another, then the message will be treated as unexpected.

Named Rx contexts are defined for sender directed traffic.
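As a rough illustration of sender-directed traffic, the sketch below shows how a sender could pick the destination Rx context from the tag, using the same tag % nctx rule the receiver used when posting. It is stand-alone C and does not include libfabric headers: model_addr_t, model_rx_addr, dest_for_tag and rx_index_of are hypothetical names, and the bit packing (Rx index in the top rx_ctx_bits of a 64-bit address) is an assumption meant to mirror what fi_rx_addr() does, so check it against the fi_av man page before relying on it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t model_addr_t;   /* hypothetical stand-in for fi_addr_t */

/* Assumed encoding: place rx_index in the top rx_ctx_bits of the address. */
model_addr_t model_rx_addr(model_addr_t base, int rx_index, int rx_ctx_bits)
{
    assert(rx_ctx_bits > 0 && rx_ctx_bits < 64);
    assert(rx_index >= 0 && ((uint64_t)rx_index >> rx_ctx_bits) == 0);
    return ((uint64_t)rx_index << (64 - rx_ctx_bits)) | base;
}

/* The sender mirrors the receiver's posting rule: tag t goes to the
 * Rx context with index t % nctx. */
model_addr_t dest_for_tag(model_addr_t peer, uint64_t tag,
                          int nctx, int rx_ctx_bits)
{
    assert(nctx > 0);
    return model_rx_addr(peer, (int)(tag % (uint64_t)nctx), rx_ctx_bits);
}

/* Recover the Rx index from an address built by model_rx_addr. */
int rx_index_of(model_addr_t addr, int rx_ctx_bits)
{
    assert(rx_ctx_bits > 0 && rx_ctx_bits < 64);
    return (int)(addr >> (64 - rx_ctx_bits));
}
```

In a real application the equivalent of dest_for_tag() would wrap fi_rx_addr(), and its result would be passed as the destination address of fi_tsend(), with rx_ctx_bits matching the value set in the address vector attributes. Both sides then agree on which context handles which tag, instead of leaving the choice to the provider.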

- Sean