[libfabric-users] troubled by FI_SOURCE use

Biddiscombe, John A. biddisco at cscs.ch
Tue Mar 12 15:27:00 PDT 2019


Question:
Is it possible to set up a connection using only a connectionless endpoint at both ends?

Using a single node (for testing), I'm doing this:
process 0: creates an RDM connection on one end using src = 127.0.0.1:11111 and dst = null, and all appears fine.
process 1: creates a connection using dst = 127.0.0.1:11111 and src = null, and all appears fine. After creating the endpoint, I can see that this end has been assigned a port number of 22222.
process 1: creates an address vector containing its own address and the address of process 0 (127.0.0.1:11111). It sends its own address, 127.0.0.1:22222, to process 0, which receives it correctly (with the source reported as FI_ADDR_NOTAVAIL).
process 0: now adds the address 127.0.0.1:22222 to its address vector and sends a 'hello' message to process 1.
process 1 does not receive this message, but process 0 does receive it.

Why is the message going from process 0 back to process 0 and not to process 1?
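
One way to narrow this down, offered only as a debugging sketch, is to log the exact address each process hands to its address vector and compare it with the port the peer actually received (22222 here). The helper below uses only POSIX calls and assumes the addresses are IPv4 `sockaddr_in` structures; `format_sockaddr_in` is a hypothetical name, not a libfabric function.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <string>

// Hypothetical debugging helper (not part of libfabric): renders an IPv4
// socket address as "a.b.c.d:port" so that the value about to be inserted
// into the address vector can be written to a log and checked by eye.
std::string format_sockaddr_in(const sockaddr_in& sa)
{
    assert(sa.sin_family == AF_INET);   // this sketch handles IPv4 only

    char ip[INET_ADDRSTRLEN] = {0};
    if (inet_ntop(AF_INET, &sa.sin_addr, ip, sizeof(ip)) == nullptr)
        return "<invalid>";

    // sin_port is stored in network byte order; convert before printing.
    return std::string(ip) + ":" + std::to_string(ntohs(sa.sin_port));
}
```

Logging this on process 0 just before the insert, and on process 1 just after endpoint creation, would show whether both sides agree on 127.0.0.1:22222.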

Thanks

JB




________________________________
From: Libfabric-users [libfabric-users-bounces at lists.openfabrics.org] on behalf of Biddiscombe, John A. [biddisco at cscs.ch]
Sent: 10 March 2019 14:53
To: libfabric-users at lists.openfabrics.org
Subject: Re: [libfabric-users] trouble by FI_SOURCE use

I believe I understand the problem now.

My original implementation was correct, but when the first message arrives, the receiving node does not yet have the sender's address in its address vector, and so it reports FI_ADDR_NOTAVAIL.
I'm converting our bootstrap routine that used PMI on Cray to work with sockets on other machines. I shall first send an extra message to the root node containing only the socket information it needs, so that it can insert the correct address into the AV; then things can resume as before (hopefully).
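
The ordering issue described above can be illustrated with a toy model. This is not libfabric code: `ToyAddressVector` and `kAddrNotAvail` are hypothetical stand-ins for the address vector and for FI_ADDR_NOTAVAIL, defined locally so the sketch compiles on its own. It assumes only that a source can be reported for a peer once that peer has been inserted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Local stand-in for FI_ADDR_NOTAVAIL so the sketch needs no libfabric headers.
constexpr std::uint64_t kAddrNotAvail = ~std::uint64_t{0};

// Toy model of an address vector: maps a peer's "ip:port" string to an index.
class ToyAddressVector {
public:
    // Inserts a peer (idempotent) and returns the index assigned to it.
    std::uint64_t insert(const std::string& peer)
    {
        auto it = table_.find(peer);
        if (it != table_.end())
            return it->second;
        const std::uint64_t idx = table_.size();
        assert(idx != kAddrNotAvail);
        table_.emplace(peer, idx);
        return idx;
    }

    // Models what a completion would report as the source of a message
    // from `peer`: a known index, or "not available" if never inserted.
    std::uint64_t resolve_source(const std::string& peer) const
    {
        auto it = table_.find(peer);
        return it == table_.end() ? kAddrNotAvail : it->second;
    }

private:
    std::map<std::string, std::uint64_t> table_;
};
```

In this model, a message from 127.0.0.1:22222 resolves to `kAddrNotAvail` until the root has inserted that peer, which is why the extra bootstrap message has to come first.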

JB

________________________________
From: Libfabric-users [libfabric-users-bounces at lists.openfabrics.org] on behalf of Biddiscombe, John A. [biddisco at cscs.ch]
Sent: 10 March 2019 11:28
To: libfabric-users at lists.openfabrics.org
Subject: Re: [libfabric-users] trouble by FI_SOURCE use

I think I have the concept of source/dest the wrong way around and I should be using the dest field, since I want to receive on that port number.

Am I correct in thinking that when creating the endpoint, the source field is me if I'm sending, but the dest field if I'm receiving on a particular port?

thanks

JB
________________________________
From: Libfabric-users [libfabric-users-bounces at lists.openfabrics.org] on behalf of Biddiscombe, John A. [biddisco at cscs.ch]
Sent: 10 March 2019 10:21
To: libfabric-users at lists.openfabrics.org
Subject: [libfabric-users] trouble by FI_SOURCE use

Hello list

I need to use a particular port number when setting up a connection, so I have

fabric_hints_->src_addr    = socket_data;
fabric_hints_->addr_format = FI_SOCKADDR_IN;
fabric_hints_->src_addrlen = sizeof(struct sockaddr_in);
//
fabric_hints_->caps        = FI_MSG | FI_RMA | FI_SOURCE |
  FI_WRITE | FI_READ | FI_REMOTE_READ | FI_REMOTE_WRITE | FI_RMA_EVENT;

and all is well, but when a message comes in -
if (src_addr == FI_ADDR_NOTAVAIL)
{
    LOG_DEBUG_MSG("Source address not available...\n");
    std::terminate();
}

my check for the source fails. This is documented; however, the documentation/examples appear ambiguous:

"src_addr - source address
If specified, indicates the source address. This field will be ignored in hints if FI_SOURCE flag is set."

But the examples/tutorials use src_addr and still set FI_SOURCE, so I tried it too and the endpoint is created on the correct port number. I'm happy.
however,
"FI_SOURCE

Requests that the endpoint return source addressing data as part of its completion data."
So if I need the endpoint on a certain port number, I set src_addr, and the FI_SOURCE flag is then dropped. But I also want to know where messages are coming from. How do I get the source information with the message?
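
For reference, the `socket_data` assigned to `src_addr` in the hints above could be filled in along these lines. This is a minimal sketch assuming IPv4 and the FI_SOCKADDR_IN format already shown; `make_sockaddr_in` is a hypothetical helper name, and only POSIX calls are used.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of libfabric): fills `out` with an IPv4
// socket address for the given dotted-quad string and host-order port.
// Returns false if `ip` is not a valid IPv4 address string.
bool make_sockaddr_in(const char* ip, std::uint16_t port, sockaddr_in& out)
{
    assert(ip != nullptr);
    std::memset(&out, 0, sizeof(out));
    out.sin_family = AF_INET;
    out.sin_port   = htons(port);   // port must be in network byte order
    return inet_pton(AF_INET, ip, &out.sin_addr) == 1;
}
```

Checking the return value catches malformed strings (for example a stray colon in the address) before the structure is passed on as a source or destination address.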

Apologies if this question appears trivial; I've not looked at the code for a long time and have forgotten much of what I knew about libfabric.

yours

JB


--
Dr. John Biddiscombe,                    email:biddisco @.at.@ cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Trevano 131, 6900 Lugano, Switzerland   | Fax:  +41 (91) 610.82.82