[libfabric-users] Not receiving messages from other ranks

Hefty, Sean sean.hefty at intel.com
Wed Feb 10 11:27:57 PST 2021


> I am adding the receive endpoint address, and when I dump the contents of the AV on all
> nodes, each node has a matching address list.
> 
> One thing: if I bind the AV to the endpoint and then add some addresses later, does
> this matter, or does the binding take a copy? It worked OK when I had a common
> endpoint for send+recv, so I don't see why having two endpoints should make a
> difference.

Addresses can be added before or after the AV is bound to the endpoint.  It shouldn't make any difference.

> Also, I use the same address vector in both send and recv endpoints, I presume this is
> fine too.

Yes

> I'm baffled because I've spent a couple of days stepping through my code and can't see
> anything wrong.

Which provider are you using?  You may need to call fi_cq_read(), even on the send side, to ensure progress is being driven.  If you're using rxm, I believe there's an environment variable you can set to force auto-progress ("fi_info -g rxm" might help discover the name).
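
A rough sketch of what driving progress on both sides could look like, assuming the send and receive endpoints each have their own CQ and the provider uses manual progress.  The helper names drain_cq and drive_progress are made up for illustration; only the fi_* calls and FI_* codes come from libfabric.  It needs the libfabric headers and a fabric that is already set up, so treat it as an outline rather than a drop-in fix.

```c
#include <stdio.h>
#include <sys/types.h>
#include <rdma/fabric.h>
#include <rdma/fi_eq.h>
#include <rdma/fi_errno.h>

#define CQ_BATCH 8

/* Hypothetical helper: read completions from one CQ until it is empty.
 * Returns the number of completions read, or a negative libfabric error. */
static int drain_cq(struct fid_cq *cq)
{
    /* Sized for the largest standard CQ entry format so the sketch does not
     * depend on how the CQ was opened; entries are not interpreted here. */
    struct fi_cq_tagged_entry entries[CQ_BATCH];
    struct fi_cq_err_entry err = {0};
    int total = 0;

    for (;;) {
        ssize_t n = fi_cq_read(cq, entries, CQ_BATCH);
        if (n > 0) {
            total += (int)n;      /* application would process entries here */
            continue;
        }
        if (n == -FI_EAGAIN)
            return total;         /* nothing more to read right now */
        if (n == -FI_EAVAIL) {
            /* A failed operation is queued; fetch its details. */
            ssize_t r = fi_cq_readerr(cq, &err, 0);
            if (r < 0)
                return (int)r;
            fprintf(stderr, "CQ error: %s\n", fi_strerror(err.err));
            return -err.err;
        }
        return (int)n;            /* any other negative error code */
    }
}

/* Hypothetical helper: poll the send-side CQ as well as the receive-side
 * CQ, since with manual progress the send side may not advance unless
 * the application reads its CQ too. */
static int drive_progress(struct fid_cq *tx_cq, struct fid_cq *rx_cq)
{
    int ret = drain_cq(tx_cq);
    if (ret < 0)
        return ret;
    return drain_cq(rx_cq);
}
```

Calling something like drive_progress() from the application's main loop while waiting for messages means both CQs get read regularly, whichever endpoint the outstanding operation belongs to.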

- Sean

