[libfabric-users] Endpoints revisited

Hefty, Sean sean.hefty at intel.com
Mon Jan 31 08:52:51 PST 2022


> I managed to break my code whilst trying to improve it - in a nutshell, I have one
> endpoint that is configured to receive and one that is for transmit. At startup all
> ranks share the Receive addresses so that they can fill the address vector, but when I
> use FI_SOURCE - a send from rank 0 appears to come from a different address than the
> receive address of rank 0 (because 2 endpoints not 1).
> 
> I had this problem before when I implemented threadlocal endpoints and somehow I got it
> working, but the latest version of the code appears munged.
> 
> Is there any way of having the transmit endpoint use the same address as the receive
> endpoint on each node - so that when a message is received on Node B from Node A, it
> will appear to come from the same address as Node B uses when sending to Node A?

Different endpoints will have different transport-level addresses.  To have the same address, the transmit and receive contexts need to belong to the same endpoint.  Is there a reason why you separated them?

I'm guessing you could carry a 16-bit source rank in the message, or maybe place it in the CQ data.  This adds a small amount of extra data to carry, but should be fairly easy as long as all ranks have mirrored AVs, i.e. every rank inserts the same addresses in the same order.
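
A minimal sketch of the CQ data option, assuming the low 16 bits of the 64-bit CQ data are reserved for the sender's rank.  The names SRC_RANK_MASK, pack_src_rank and unpack_src_rank are made up for illustration and are not part of libfabric.  The sender would pass the packed value as the data argument of fi_senddata(), and the receiver would read it back from the data field of a struct fi_cq_data_entry whose flags include FI_REMOTE_CQ_DATA.  This also assumes the provider reports a cq_data_size of at least 2 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 16 bits of the CQ data carry the source rank. */
#define SRC_RANK_MASK UINT64_C(0xFFFF)

/* Sender side: build the value to hand to fi_senddata() as `data`. */
static inline uint64_t pack_src_rank(int rank)
{
    /* The rank must fit in the 16 bits reserved for it. */
    assert(rank >= 0 && rank <= UINT16_MAX);
    return (uint64_t)rank & SRC_RANK_MASK;
}

/* Receiver side: recover the rank from the `data` field of a
 * completion entry that has FI_REMOTE_CQ_DATA set in its flags. */
static inline int unpack_src_rank(uint64_t cq_data)
{
    return (int)(cq_data & SRC_RANK_MASK);
}
```

If every rank inserts the receive addresses into an FI_AV_TABLE address vector in rank order, the unpacked rank should double as the fi_addr_t index of the sender's receive endpoint, so no FI_SOURCE lookup is needed.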

- Sean