[libfabric-users] Connectionless RDM question

Biddiscombe, John A. biddisco at cscs.ch
Mon Mar 6 11:50:48 PST 2017


I’ve converted my code to use RDM endpoints instead of MSG ones and have a conceptual problem.

Node A knows the address of Node B at startup due to some out-of-band exchange.
Node A adds the address of B to an address vector and gets an fi_addr_t which it then uses to send a message to B.
The message contains some RDMA info to tell B “grab this chunk of data from me at this address”.
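To make the “grab this chunk” message concrete, here is a minimal sketch of what such a request might carry. All names here (rdma_request, rdma_request_pack, the 24-byte little-endian layout) are hypothetical and chosen for illustration only; none of them come from libfabric.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format for the "grab this chunk" message.
 * These names are illustrative only and are not part of libfabric. */
struct rdma_request {
    uint64_t remote_addr; /* address (or offset) of the chunk on the sender */
    uint64_t rkey;        /* key of the sender's registered memory region */
    uint64_t length;      /* number of bytes the receiver should read */
};

enum { RDMA_REQUEST_WIRE_SIZE = 24 }; /* three 64-bit fields */

/* Store a 64-bit value as 8 little-endian bytes. */
void put_u64(unsigned char *out, uint64_t v)
{
    for (int i = 0; i < 8; ++i)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Load a 64-bit value from 8 little-endian bytes. */
uint64_t get_u64(const unsigned char *in)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}

/* Serialize the request into buf (at least RDMA_REQUEST_WIRE_SIZE bytes). */
void rdma_request_pack(const struct rdma_request *req, unsigned char *buf)
{
    assert(req != NULL && buf != NULL);
    put_u64(buf, req->remote_addr);
    put_u64(buf + 8, req->rkey);
    put_u64(buf + 16, req->length);
}

/* Rebuild the request from a buffer produced by rdma_request_pack. */
void rdma_request_unpack(const unsigned char *buf, struct rdma_request *req)
{
    assert(req != NULL && buf != NULL);
    req->remote_addr = get_u64(buf);
    req->rkey = get_u64(buf + 8);
    req->length = get_u64(buf + 16);
}
```

Note that nothing in this payload identifies the sender, which is exactly the gap the rest of this mail is about.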

With a connection-oriented mode, Node B has a connection to A and uses the ‘A endpoint’ it created after FI_CONNECT to do an fi_read from A.

With connectionless, Node B has no real idea node A exists and receives the message saying “grab this data” (from where?).

What is the recommended way for B to get the data from A? B has not added the address of A to its address vector, so it has no fi_addr_t for A when it receives a message from A. How does it know who A is?

Does A have to send (on first connection/message) a warning message to B so that B can add A to its AV first? Is this how it is normally done? (This would mimic the connect request/connected model used for MSG endpoints.)
Or … should every node add every other node to its AV at startup, so that when B gets a pull request from A it at least has an fi_addr_t for A and only needs to know which node to pull from? (If I have thousands of nodes communicating, but only in a nearest-neighbour pattern, it seems a waste to add N^2 addresses to the AVs at startup.)

When I receive a message on an RDM endpoint, the same endpoint handles all peers, so now I don’t know how to get the address of the message origin again. Do I have to send the address with every RDMA request so that the receiver knows where to read from?
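As a sketch of the “send the address with every request” option, assuming each request carries the sender’s raw endpoint address (the kind of blob fi_getname returns), the receiver could resolve it lazily and only insert peers that actually contact it. Everything named here (peer_table, peer_handle_t, fake_av_insert, the size limits) is hypothetical; the insert callback is a stand-in for where a real fi_av_insert call would go, and peer_handle_t stands in for fi_addr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ADDR_LEN 64 /* assumed upper bound on a raw endpoint address */
#define MAX_PEERS    16 /* small: only peers that actually contact us */

typedef uint64_t peer_handle_t; /* hypothetical stand-in for fi_addr_t */

/* Stand-in for the real AV insertion; returns 0 on success. */
typedef int (*av_insert_fn)(const void *raw, size_t len, peer_handle_t *out);

struct peer_entry {
    unsigned char raw[MAX_ADDR_LEN]; /* raw address bytes sent by the peer */
    size_t len;
    peer_handle_t handle;            /* handle obtained on first contact */
};

struct peer_table {
    struct peer_entry entries[MAX_PEERS];
    size_t count;
    av_insert_fn insert;
};

void peer_table_init(struct peer_table *t, av_insert_fn insert)
{
    assert(t != NULL && insert != NULL);
    t->count = 0;
    t->insert = insert;
}

/* Look up the sender's raw address; insert it only the first time it is seen.
 * Returns 0 and sets *out on success, -1 on failure. */
int peer_table_resolve(struct peer_table *t, const void *raw, size_t len,
                       peer_handle_t *out)
{
    assert(t != NULL && raw != NULL && out != NULL);
    if (len == 0 || len > MAX_ADDR_LEN)
        return -1;
    for (size_t i = 0; i < t->count; ++i) {
        if (t->entries[i].len == len &&
            memcmp(t->entries[i].raw, raw, len) == 0) {
            *out = t->entries[i].handle; /* already known: no insertion */
            return 0;
        }
    }
    if (t->count == MAX_PEERS)
        return -1;
    peer_handle_t h;
    if (t->insert(raw, len, &h) != 0)
        return -1;
    struct peer_entry *e = &t->entries[t->count++];
    memcpy(e->raw, raw, len);
    e->len = len;
    e->handle = h;
    *out = h;
    return 0;
}

/* Fake inserter for experimenting without a fabric: hands out 0, 1, 2, ... */
int fake_insert_calls = 0;
int fake_av_insert(const void *raw, size_t len, peer_handle_t *out)
{
    (void)raw;
    (void)len;
    *out = (peer_handle_t)fake_insert_calls++;
    return 0;
}
```

With this shape, a node in a nearest-neighbour pattern ends up with only as many entries as it has neighbours, at the cost of extra address bytes in every request and a linear lookup per message.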

PS. Extra question. If Node A registers node C in its AV and node B registers node C in its AV, will A and B have the same fi_addr_t for C? (Can they be shared? That would imply that I could send the fi_addr_t as a handle with my RDMA read requests, so that B would know what address to use when getting data.)

Sorry for so many questions, and thanks in advance for any advice.

JB

--
John Biddiscombe,                        email:biddisco @.at.@ cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Trevano 131, 6900 Lugano, Switzerland   | Fax:  +41 (91) 610.82.82
