[libfabric-users] Not receiving messages from other ranks
Hefty, Sean
sean.hefty at intel.com
Wed Feb 10 11:27:57 PST 2021
> I am adding the receive endpoint address and when I dump the contents of the AV on all
> nodes, each have matching address lists.
>
> One thing - if I have to bind the AV to the endpoint and then add some addresses later
> - does this matter - or does the binding take a copy? It worked ok when I had a common
> endpoint for send+recv so I don't see why it should make a difference having two
> endpoints.
Addresses can be added before/after bind. It shouldn't make any difference.
> Also, I use the same address vector in both send and recv endpoints, I presume this is
> fine too.
Yes
> I'm baffled because I've spent a couple of days stepping through my code and can't see
> anything wrong.
Which provider are you using? You may need to call cq read, even on the send side, to ensure progress is being driven. If you're using rxm, I believe there's an environment variable you can set to force auto-progress ("fi_info -g rxm" might help discover the name).
- Sean