[ofiwg] inserting duplicate addresses into an AV
michael.blocksome at intel.com
Tue Mar 20 11:28:46 PDT 2018
Which application, or which MPI, is inserting duplicate addresses? I don't see how MPI could be doing this. The MPI implementations I'm familiar with use PMI1, PMI2, or PMIx at job startup to exchange addresses through a distributed key-value store. Then, after a barrier, each MPI rank initializes its AV with all of these unique addresses. For a duplicate address to occur, multiple MPI ranks would have to get the *same* local address from the OFI provider. How would that happen?
Some providers, like bgq, can pack all of the fabric address information into the 64 bits of an fi_addr_t. That effectively makes fi_av_insert() a no-op in FI_AV_MAP mode. So if this duplicate address problem happened on bgq, it would still "just work" from the provider's perspective. MPI (or whatever is using the provider) might get confused by the duplicate, but the fabric communication operations themselves would still work.
From: ofiwg [mailto:ofiwg-bounces at lists.openfabrics.org] On Behalf Of Hefty, Sean
Sent: Tuesday, March 20, 2018 11:54 AM
To: ofiwg at lists.openfabrics.org
Subject: [ofiwg] inserting duplicate addresses into an AV
MPI is hitting an issue caused by inserting the same address into an AV more than once. There is no defined behavior for what a provider should do in this case. At least one provider allows the duplicate insertion, and at least one fails the call... and MPI works with neither when this occurs. :/
There are a couple of problems with defining this. In the case of the provider that fails the call, the failure is detected when it attempts to insert the same address into a hash table. However, not all providers can easily detect duplicates. Forcing them to do so _may_ require the provider to perform a linear search over the AV for every address that is inserted, which is O(n) per insert and O(n^2) to populate an AV with n addresses. At scale, this is a significant overhead.
Even if the decision is made to require duplicate detection (maybe even making this an AV option), there's the question of how a provider should respond. Should it insert the address twice (creating a new fi_addr for it), discard the duplicate and return the existing fi_addr, or generate an error? And does it matter whether FI_AV_TABLE or FI_AV_MAP is used?
We need to know what applications need here, and how difficult it will be for providers to detect duplicates. It is apparently non-trivial for the apps to avoid duplicate insertions.
ofiwg mailing list
ofiwg at lists.openfabrics.org