[libfabric-users] trouble with FI_SOURCE use - revisited

Kevan Rehm krehm at cray.com
Sat Apr 27 12:18:13 PDT 2019


Sean,

Yes, this was the sockets provider.   I have repeated the test using the gni provider, and it works as one would hope.

The client’s initial hello message to the server arrives with FI_ADDR_NOTAVAIL, as expected.  The server extracts the client’s raw address from the message and adds it to its AV table.   The next message to arrive from the client is also FI_ADDR_NOTAVAIL, but the gni provider does a reverse lookup in the AV table, finds the fi_addr_t, and updates the VC with that address; fi_cq_readfrom() then returns the correct fi_addr_t for the client.   Subsequent messages from the client use the VC address directly; no further reverse lookups in the AV table are needed.
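
For anyone following along, here is a minimal sketch of that server-side flow in C against the libfabric API.  The hello_msg layout and the error handling are my own assumptions for illustration, not provider code:

#include <rdma/fabric.h>
#include <rdma/fi_eq.h>
#include <rdma/fi_domain.h>

struct hello_msg {              /* assumed wire format: name + raw address */
	char   name[64];
	size_t addrlen;
	char   addr[128];       /* raw endpoint address from fi_getname() */
};

void poll_once(struct fid_cq *cq, struct fid_av *av)
{
	struct fi_cq_data_entry comp;
	fi_addr_t src = FI_ADDR_NOTAVAIL;

	if (fi_cq_readfrom(cq, &comp, 1, &src) != 1)
		return;

	if (src == FI_ADDR_NOTAVAIL) {
		/* First message from an unknown peer: treat it as the
		 * hello, pull the raw address out of the payload, and
		 * add it to the AV. */
		struct hello_msg *hello = comp.buf;
		fi_addr_t fi_addr;

		if (fi_av_insert(av, hello->addr, 1, &fi_addr, 0, NULL) != 1)
			return;         /* insert failed */
		/* With gni, fi_cq_readfrom() reports fi_addr as the
		 * source for this peer from the next message on. */
	}
}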

I believe the sockets provider should behave the same way.    Want me to open an issue?

Kevan

From: "Hefty, Sean" <sean.hefty at intel.com>
Date: Friday, April 19, 2019 at 10:39 AM
To: Kevan Rehm <krehm at cray.com>, "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>
Subject: RE: trouble with FI_SOURCE use - revisited

Is this running over the ‘sockets’ provider?  If so, can you test with the ‘tcp’ provider instead?

From: Libfabric-users [mailto:libfabric-users-bounces at lists.openfabrics.org] On Behalf Of Kevan Rehm
Sent: Friday, April 19, 2019 7:49 AM
To: libfabric-users at lists.openfabrics.org
Subject: [libfabric-users] trouble with FI_SOURCE use - revisited

Greetings,

I noticed this recent message from someone having the same problem as I am with the sockets provider.

I believe I understand the problem now.

My original implementation was correct, but when the first message arrives, the receiving node does not yet have the sender's address in its address vector, and so it reports FI_ADDR_NOTAVAIL.
I'm converting our bootstrap routine, which used PMI on Cray, to work with sockets on other machines. I will first send an extra message containing only the needed socket information to the root node; it can then insert the correct address into the AV, and things can resume as before (hopefully).

JB
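
Concretely, the bootstrap JB describes amounts to fetching the main endpoint's raw address with fi_getname() and shipping it to the root node before anything else.  A minimal sketch, reusing the assumed hello_msg layout from above (this is my own illustration, not JB's code):

#include <stdio.h>
#include <rdma/fabric.h>
#include <rdma/fi_cm.h>
#include <rdma/fi_endpoint.h>

/* Send this endpoint's raw address to the root/server so it can be
 * fi_av_insert()ed there before any real traffic flows.
 * (Completion handling for the send is omitted.) */
int send_hello(struct fid_ep *ep, fi_addr_t server_addr, const char *name)
{
	struct hello_msg hello = { .addrlen = sizeof(hello.addr) };
	int ret;

	snprintf(hello.name, sizeof(hello.name), "%s", name);
	ret = fi_getname(&ep->fid, hello.addr, &hello.addrlen);
	if (ret)
		return ret;

	return (int) fi_send(ep, &hello, sizeof(hello), NULL,
			     server_addr, NULL);
}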

Having tried what JB suggests above using the v1.7.x branch of libfabric, I can report that it doesn’t work.   I tried sending a client “hello” libfabric message containing the client’s name together with its raw address to my server.   The server adds the client’s raw address to its address vector, then creates a hash-table entry whose key is the client’s fi_addr_t and whose value is a struct identifying the client.  I use FI_SOURCE so that the fi_addr_t from fi_cq_readfrom() identifies which client sent each message.   What happens is that every subsequent message from that client still reports FI_ADDR_NOTAVAIL, even though the client’s address is now in the server’s address vector.   Once a client sends a message on an endpoint, there is nothing in fi_av_insert() that goes back and updates the provider’s existing per-client structs when a new address is inserted into the address vector.
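
For reference, the server-side registration I described looks roughly like this (client_map_put()/client_map_get() stand in for whatever hash table the application uses; all names here are mine):

#include <stdlib.h>
#include <string.h>
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>

struct client {
	char      name[64];
	fi_addr_t addr;         /* key for the hash table */
};

/* hypothetical application hash table keyed by fi_addr_t */
void client_map_put(fi_addr_t key, struct client *val);

int register_client(struct fid_av *av, const struct hello_msg *hello)
{
	struct client *c = calloc(1, sizeof(*c));

	if (!c)
		return -1;
	strncpy(c->name, hello->name, sizeof(c->name) - 1);
	if (fi_av_insert(av, hello->addr, 1, &c->addr, 0, NULL) != 1) {
		free(c);
		return -1;
	}
	client_map_put(c->addr, c);
	return 0;
}

/* The intent with FI_SOURCE is that later completions identify the sender:
 *     struct client *c = client_map_get(src);
 * but with sockets, src stays FI_ADDR_NOTAVAIL, which is the problem. */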

This seems like a bug to me.   The person above had previously implemented PMI code to pass the client’s address to the server out-of-band, for insertion into the server’s address vector, before the client could initiate any libfabric communication.   The server had to be re-coded with PMI to receive and record those addresses.   This is not always possible.

In my case I have servers using libfabric that run persistently on Cray service nodes.   Client jobs start periodically and read a JSON file to get the addresses of those servers, together with a DRC token that gives them access to the servers’ communication domain.   Clients need to pass their names together with their raw addresses to the servers for insertion into the server address vectors so that FI_SOURCE works.  The servers are not part of any job; there is no PMI, MPI, or any other communication method available between clients and servers other than libfabric itself.

What I had to do as a workaround was to create a second endpoint in every client process, used only to send a “hello” message with the client’s name and the raw address of its main endpoint to the servers, so that those addresses were already in the server address vectors by the time the first real message was sent on the main endpoint.   This seems overly complex, just to get FI_SOURCE working for a client.
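
Sketched, the workaround looks like this (endpoint setup follows the usual libfabric pattern; the names and the reuse of hello_msg from the earlier sketches are my assumptions):

#include <rdma/fabric.h>
#include <rdma/fi_cm.h>
#include <rdma/fi_endpoint.h>

/* A throwaway "hello" endpoint whose only job is to advertise the
 * main endpoint's raw address before main_ep sends anything. */
int bootstrap_hello(struct fid_domain *domain, struct fi_info *info,
		    struct fid_av *av, struct fid_cq *cq,
		    struct fid_ep *main_ep, fi_addr_t server_addr)
{
	struct fid_ep *hello_ep;
	struct hello_msg hello = { .addrlen = sizeof(hello.addr) };
	int ret;

	ret = fi_endpoint(domain, info, &hello_ep, NULL);
	if (ret)
		return ret;
	fi_ep_bind(hello_ep, &av->fid, 0);
	fi_ep_bind(hello_ep, &cq->fid, FI_TRANSMIT | FI_RECV);
	fi_enable(hello_ep);

	/* Advertise the *main* endpoint's address, not hello_ep's. */
	ret = fi_getname(&main_ep->fid, hello.addr, &hello.addrlen);
	if (ret)
		return ret;

	return (int) fi_send(hello_ep, &hello, sizeof(hello), NULL,
			     server_addr, NULL);
}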

Here’s a thought that I had.   Whenever the server calls fi_av_insert() or fi_av_insertsvc(), the sockets code would walk all existing client structs and, for each struct whose address is FI_ADDR_NOTAVAIL, set a flag meaning “the address vector has been updated; maybe this client’s fi_addr_t is available now”.    The next time a message arrives from the client, this flag is checked.   If it is set, the code searches the address vector for a match, one time.   If a match is found, the fi_addr_t value replaces FI_ADDR_NOTAVAIL.   If no match is found, the flag is cleared, so that the search is not redone for every new message from that client.
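
In rough pseudo-C, the proposal would look something like this inside the provider (struct and function names are invented for illustration; I have not looked at how the sockets provider actually lays this out):

#include <stddef.h>
#include <rdma/fabric.h>

struct sock_conn {              /* stand-in for the provider's per-peer struct */
	fi_addr_t addr;         /* FI_ADDR_NOTAVAIL until resolved */
	int       av_dirty;     /* "the AV changed; retry the lookup once" */
};

/* Called from fi_av_insert()/fi_av_insertsvc(): flag unresolved peers. */
static void av_updated(struct sock_conn *conns, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (conns[i].addr == FI_ADDR_NOTAVAIL)
			conns[i].av_dirty = 1;
}

fi_addr_t av_reverse_lookup(const void *raw_addr);      /* hypothetical */

/* Called when a message arrives: one-time retry of the reverse lookup. */
static fi_addr_t resolve_src(struct sock_conn *c, const void *raw_addr)
{
	if (c->addr == FI_ADDR_NOTAVAIL && c->av_dirty) {
		c->addr = av_reverse_lookup(raw_addr);
		c->av_dirty = 0;        /* don't re-search on every message */
	}
	return c->addr;
}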

I haven’t tried this, but I suspect that if I call fi_av_remove() to remove an address from the server’s address vector, there is no code that walks the client structs, finds matches on the removed fi_addr_t, and sets the client’s address back to FI_ADDR_NOTAVAIL.   True?    If so, a loop similar to the one above could take care of this problem as well.
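
The removal side would be the mirror image, e.g. (same invented names as the sketch above):

/* Called from fi_av_remove(): put matching peers back to unresolved. */
static void av_removed(struct sock_conn *conns, size_t n, fi_addr_t removed)
{
	for (size_t i = 0; i < n; i++)
		if (conns[i].addr == removed)
			conns[i].addr = FI_ADDR_NOTAVAIL;
}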

Thoughts?

Thanks, Kevan