[libfabric-users] trouble by FI_SOURCE use

Biddiscombe, John A. biddisco at cscs.ch
Sun Mar 10 06:53:13 PDT 2019


I believe I understand the problem now.

My original implementation was correct, but when the first message arrives, the receiving node does not yet have the sender's address in its address vector, so it reports FI_ADDR_NOTAVAIL.
I'm converting our bootstrap routine, which used PMI on Cray, to work with sockets on other machines. I will first send an extra message to the root node containing only the socket information it needs; the root can then insert the correct address into the AV, and things can resume as before (hopefully).
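
For anyone following the thread, here is a rough sketch of that bootstrap idea. It is a toy model only and makes no libfabric calls: toy_av, toy_addr, TOY_ADDR_NOTAVAIL, bootstrap_msg and the helper functions are hypothetical names invented for illustration, and the 6-byte wire format is an assumption. In real code the decoded sockaddr_in would presumably be handed to fi_av_insert on the root node.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <array>
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>

// Toy stand-ins for illustration only; these are NOT libfabric types.
using toy_addr = std::uint64_t;
constexpr toy_addr TOY_ADDR_NOTAVAIL = ~toy_addr(0);

// Assumed wire format: IPv4 address (4 bytes) + port (2 bytes),
// both left in network byte order as they sit in sockaddr_in.
using bootstrap_msg = std::array<std::uint8_t, 6>;

// Build a sockaddr_in from a dotted-quad string and a host-order port.
inline sockaddr_in make_address(const char *ip, std::uint16_t port)
{
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);
    return addr;
}

// Sender side: pack the locally bound address into the extra first message.
inline bootstrap_msg pack_address(const sockaddr_in &addr)
{
    bootstrap_msg msg{};
    std::memcpy(msg.data(),     &addr.sin_addr.s_addr, 4);
    std::memcpy(msg.data() + 4, &addr.sin_port,        2);
    return msg;
}

// Root side: rebuild the sockaddr_in carried by the bootstrap message.
inline sockaddr_in unpack_address(const bootstrap_msg &msg)
{
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    std::memcpy(&addr.sin_addr.s_addr, msg.data(),     4);
    std::memcpy(&addr.sin_port,        msg.data() + 4, 2);
    return addr;
}

// Toy address vector: maps (ip, port) to a small integer handle.
struct toy_av
{
    using key_t = std::pair<std::uint32_t, std::uint16_t>;
    std::map<key_t, toy_addr> table;

    static key_t key(const sockaddr_in &a)
    {
        return key_t(a.sin_addr.s_addr, a.sin_port);
    }

    // Insert a peer (idempotent) and return its handle.
    toy_addr insert(const sockaddr_in &a)
    {
        auto it = table.find(key(a));
        if (it != table.end()) return it->second;
        toy_addr handle = table.size();
        table.emplace(key(a), handle);
        return handle;
    }

    // Resolve a message source; unknown peers yield TOY_ADDR_NOTAVAIL,
    // mirroring what the receiver saw for the very first message.
    toy_addr lookup(const sockaddr_in &a) const
    {
        auto it = table.find(key(a));
        return it == table.end() ? TOY_ADDR_NOTAVAIL : it->second;
    }
};
```

In this model a lookup for a peer returns TOY_ADDR_NOTAVAIL until the root has unpacked that peer's bootstrap message and inserted the address; after that the same lookup returns the handle, which is the behaviour I am hoping to get from the real AV.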

JB

________________________________
From: Libfabric-users [libfabric-users-bounces at lists.openfabrics.org] on behalf of Biddiscombe, John A. [biddisco at cscs.ch]
Sent: 10 March 2019 11:28
To: libfabric-users at lists.openfabrics.org
Subject: Re: [libfabric-users] trouble by FI_SOURCE use

I think I have the concept of source/dest the wrong way around and I should be using the dest field, since I want to receive on that port number.

Am I correct in thinking that when creating the endpoint, the source field is me if I'm sending, but the dest field if I'm receiving on a particular port?

thanks

JB
________________________________
From: Libfabric-users [libfabric-users-bounces at lists.openfabrics.org] on behalf of Biddiscombe, John A. [biddisco at cscs.ch]
Sent: 10 March 2019 10:21
To: libfabric-users at lists.openfabrics.org
Subject: [libfabric-users] trouble by FI_SOURCE use

Hello list

I need to use a particular port number when setting up a connection, so I have

fabric_hints_->src_addr    = socket_data;
fabric_hints_->addr_format = FI_SOCKADDR_IN;
fabric_hints_->src_addrlen = sizeof(struct sockaddr_in);
//
fabric_hints_->caps        = FI_MSG | FI_RMA | FI_SOURCE |
  FI_WRITE | FI_READ | FI_REMOTE_READ | FI_REMOTE_WRITE | FI_RMA_EVENT;

and all is well, but when a message comes in -
if (src_addr == FI_ADDR_NOTAVAIL)
{
    LOG_DEBUG_MSG("Source address not available...\n");
    std::terminate();
}

my check for the source fails. This is documented; however, the documentation and examples appear ambiguous:

"src_addr - source address
If specified, indicates the source address. This field will be ignored in hints if FI_SOURCE flag is set."

But the examples/tutorials use src_addr and still set FI_SOURCE, so I tried it too and the endpoint is created on the correct port number. I'm happy.
However,
"FI_SOURCE

Requests that the endpoint return source addressing data as part of its completion data."
So if I need the endpoint on a certain port number and set src_addr, the FI_SOURCE flag has to be dropped, yet I also want to know where messages are coming from. How do I get the source information with the message?

Apologies if this question appears trivial, I've not looked at the code for a long time and forgotten much of what I knew about libfabric.

yours

JB


--
Dr. John Biddiscombe,                    email:biddisco @.at.@ cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Trevano 131, 6900 Lugano, Switzerland   | Fax:  +41 (91) 610.82.82