[libfabric-users] gni assertion
Biddiscombe, John A.
biddisco at cscs.ch
Fri Apr 14 05:54:17 PDT 2017
> This looks like heap corruption somehow. Could you try rebuilding libfabric
> with --enable-debug and set FI_LOG_LEVEL to warn and see if that gives
After making some changes to our code, the problem seems to have gone away. We now cap the number of messages that a node can send at a time at a smaller value, and things seem to behave better.
A more general question though:
Does libfabric have any flow control mechanism built in? If I send a large number of messages from many nodes to a single node, what behaviour can I expect from libfabric once the preposted receives are exhausted? Will messages be resent, or will the network layer transition into an error state from which it is difficult to recover?
Experiments indicate that libfabric is handling large numbers of messages without returning errors, but I’m curious to know what knobs/controls exist to allow us to adjust behaviour and manage flow control.
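
A sender-side limit of the kind described above could look roughly like the following. This is only a minimal sketch in plain C, not the actual application code: the struct and function names (send_throttle, throttle_try_acquire, throttle_release) are hypothetical and are not part of libfabric, and the cap value is whatever the application chooses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sender-side throttle; none of these names come from libfabric. */
struct send_throttle {
    size_t max_in_flight;  /* cap on sends posted but not yet completed */
    size_t in_flight;      /* sends currently outstanding */
};

static void throttle_init(struct send_throttle *t, size_t max_in_flight)
{
    t->max_in_flight = max_in_flight;
    t->in_flight = 0;
}

/* Takes a slot and returns true if another send may be posted now. */
static bool throttle_try_acquire(struct send_throttle *t)
{
    if (t->in_flight >= t->max_in_flight)
        return false;  /* caller should poll for completions and retry */
    t->in_flight++;
    return true;
}

/* Call once for every send completion to free a slot. */
static void throttle_release(struct send_throttle *t)
{
    assert(t->in_flight > 0);
    t->in_flight--;
}
```

In a real sender, throttle_try_acquire would presumably guard the call that posts the send (fi_send, for example), and throttle_release would be called for each send completion read from the completion queue (for example via fi_cq_read). If posting fails after a slot was taken, say with -FI_EAGAIN, the slot should be released again before retrying.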