<html dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="text-align:left; direction:ltr;">
<div>On Mon, 2019-09-16 at 00:01 +0000, Kevan Rehm wrote:</div>
<div>> Rob,</div>
<div>> </div>
<div>> Hmmm, if your config.h file contains "#define HAVE_KDREG 0", then</div>
<div>> the only other way that I can find that would take your program</div>
<div>> through the code that returns urc=2 is if your application is</div>
<div>> deliberately setting the GNI_MR_CACHE_LAZY_DEREG gni domain variable</div>
<div>> to 1 at runtime sometime just after opening the domain. Could you</div>
<div>> scan your code, see if you get a hit on this symbol?</div>
<div><br>
</div>
<div>Hah, well look at that. We have indeed been setting this for a couple of years:</div>
<div><br>
</div>
<div><a href="https://github.com/mercury-hpc/mercury/blob/master/src/na/na_ofi.c#L1808">https://github.com/mercury-hpc/mercury/blob/master/src/na/na_ofi.c#L1808</a></div>
<div><br>
</div>
<div>I was kind of surprised to see a GNI setting had bubbled up through two abstraction layers, but thanks for the suggestion.</div>
<div><br>
</div>
<div>I must have a defective mental model for what's going on here. I did some experimenting and it turns out I only had to cut the value of udreg_reg_limit in half (1024) in order to get 64 processes per node (up to three nodes so far) working.</div>
<div><br>
</div>
<div>Hopefully this will be the last time you hear from me as I try to scale things up further. I appreciate the quick responses so far.</div>
<div><br>
</div>
<div>==rob</div>
<div><br>
</div>
</body>
</html>