[ofiwg] shared recvs

Reese Faucette (rfaucett) rfaucett at cisco.com
Tue Oct 28 14:55:14 PDT 2014


I had figured that each new "endpoint" configured via fi_accept() was backed by the same underlying transport address.  The only reason cep1 and cep2 exist at all (in this example) is so that the app can send directly to remote 1 and remote 2 without needing to do a sendto() - i.e. the remote destination is implicit in the endpoint.
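To be concrete about "the remote destination is implicit in the endpoint", something like the following is what I have in mind (argument lists abbreviated in the same style as the sketch further down; fi_sendto() is just shorthand for whatever the explicit-address send ends up being):

fi_send(cep1, buf, len);                  // connected EP: data goes to remote 1, no address argument
fi_send(cep2, buf, len);                  // connected EP: data goes to remote 2
fi_sendto(rdm_ep, buf, len, remote1_addr);  // RDM-style EP: destination named explicitly on every send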

Based on conversations with Sean, my understanding is that, fundamentally, an "endpoint" is simply a software construct on top of underlying hardware resources, and that rx/tx_context are more closely tied to the HW resources.  Thus, cep1 and cep2 would have the same underlying HW resource and transport address, but would be independent "connections" across that resource.

Was just asking in case someone else already had a clear understanding of how the various flavors of reliable messaging are intended to work and be accessed.

To me there are (at least?) 3 different flavors of reliable messaging semantics:

- Reliable Connection (e.g. Verbs RC or TCP sockets)  (FI_EP_MSG + ?)

  *  Explicit listen/connect (server/client) connection

  *  Posted receives will only be filled with data from a particular remote source

- Reliable Shared Connection (e.g. Verbs XRC / SRQ ? (not sure which))  (FI_EP_MSG + ?)

  *  Explicit listen/connect (server/client) connection

  *  Posted receives may be filled with data from any connected remote source (how does the app determine the source?)

- Reliable Datagram (FI_EP_RDM)

  *  NO explicit connection

  *  Posted receives may be filled with data from any remote source holding our address

Does this sound like the right set of modes?  Are there more?  (Scalable endpoints, a la RSS, would be orthogonal to this.)  I intentionally did not specify the API, i.e. whether the code presents access to a connection as a unique "endpoint" or as a combination of "endpoint" and "remote address", since that is really a presentation detail independent of the semantics.
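To make the "FI_EP_MSG + ?" vs. FI_EP_RDM distinction concrete, here is roughly how I'd expect an app to ask for each flavor at fi_getinfo() time.  This is only a sketch - I'm assuming the ep_type field as it appears in the draft headers I have (it may live elsewhere in other revisions), and the "+ ?" caps bits are exactly the part I'm asking about:

struct fi_info hints = { 0 };
struct fi_info *info;
hints.ep_type = FI_EP_MSG;       // connected flavors; FI_EP_RDM for the reliable-datagram flavor
// hints.caps |= ???;            // whatever bit(s), if any, select shared vs. per-connection receives
fi_getinfo(version, node, service, 0, &hints, &info);   // version/node/service as usual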

Thoughts, comments?


From: Sur, Sayantan [mailto:sayantan.sur at intel.com]
Sent: Tuesday, October 28, 2014 1:12 PM
To: Reese Faucette (rfaucett); ofiwg at lists.openfabrics.org
Subject: RE: shared recvs

The manpage on fi_endpoint says that scalable endpoints have only one transport-level address. It seems to me that in your example using passive endpoints, you would end up with two transport addresses. I'm not sure that the scalable endpoint is what you want here. Are you asking for the ability to use FI_EP_RDM with a passive endpoint?

I think in general, there might be an issue with sharing the same receive queue (i.e. FI_EP_RDM-style behavior) when using passive endpoints. It seems to me from the APIs that every time you accept a new "connection" you end up with a new endpoint, like in a connection-oriented API. Since every endpoint has its own receive queue, there isn't a good way to share that receive queue.
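In other words (argument lists abbreviated), the connection-oriented flow appears to force one posting per accepted endpoint, rather than one posting that any peer can fill:

fi_recv(cep1, buf1, len);   // can only be filled by the peer behind cep1
fi_recv(cep2, buf2, len);   // can only be filled by the peer behind cep2
// desired: a single posting that data from either accepted connection can land in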

I'm not sure what the solution is here, other than to wait for Sean :) It could be that the fi_cm APIs need a mechanism to "reuse" an open RDM-type endpoint, vs. creating a new one each time.

From: ofiwg-bounces at lists.openfabrics.org [mailto:ofiwg-bounces at lists.openfabrics.org] On Behalf Of Reese Faucette (rfaucett)
Sent: Tuesday, October 28, 2014 12:25 PM
To: ofiwg at lists.openfabrics.org
Subject: [ofiwg] shared recvs

If an app wishes to create a passive endpoint, accept connections on it, and then post receives that will be filled with data coming from any remote connection on that endpoint, how exactly is that accomplished?

My best guess is that, by using fi_rx_context, we can post a "shared" receive buffer via:

fi_pendpoint(&pep);  // open a passive (listening) endpoint
fi_bind(pep, cmeq);  // bind the connection-management event queue
fi_listen(pep);      // start listening for connection requests

fi_eq_sread(cmeq); // wait for CONNREQ 1
fi_endpoint(&cep1);  // remote-specific EP
fi_accept(cep1);
fi_eq_sread(cmeq); // wait for COMPLETE 1

fi_eq_sread(cmeq); // wait for CONNREQ 2
fi_endpoint(&cep2);  // remote-specific EP
fi_accept(cep2);
fi_eq_sread(cmeq); // wait for COMPLETE 2

fi_rx_context(cep1, 0, &rxep);   // get common EP for RX
fi_recv(rxep, buf, len);

Now, a send from the remote endpoint associated with either cep1 or cep2 will land in buf, yes?

I'm sure there are cases where support for this mode of operation is desired and others where it is not; what are the endpoint flags that would control whether this approach will work?  I imagine that an endpoint that supports the above would NOT support posting a receive directly to cep1 or cep2, and that an endpoint that expects receives posted to cep1 to be filled only by remote 1 and receives posted to cep2 only by remote 2 (e.g. a traditional Verbs RC endpoint) would not support this shared mode of operation.

Assuming I am not too far off in the woods here, how is this shared/non-shared approach to receives communicated in the API?
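Purely as a strawman for the kind of answer I'm hoping for (the flag name below is made up - I can't find anything like it in the current headers), I could imagine it being surfaced as a capability bit negotiated through the hints:

hints.caps |= FI_SHARED_RECV;   // hypothetical bit: "posted receives may be filled by any accepted connection"
fi_getinfo(version, node, service, 0, &hints, &info);
// if info->caps comes back with the bit set, the fi_rx_context() approach above would be legal;
// otherwise receives must be posted per-endpoint (cep1 for remote 1, cep2 for remote 2)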
Thanks,
-reese
