[ofiwg] inserting duplicate addresses into an AV

James Swaro jswaro at cray.com
Tue Mar 20 12:15:54 PDT 2018


Isn't it possible for an MPI spawn on the same node, using the sockets provider (or any provider with a reasonably simple addressing scheme), to produce a duplicate address?

-- Jim
 
On 3/20/18, 2:07 PM, "ofiwg on behalf of Blocksome, Michael" <ofiwg-bounces at lists.openfabrics.org on behalf of michael.blocksome at intel.com> wrote:

    ew .. MPI spawn.
    
    Again, how can a rank - even if it has yet to be attached to an MPI job - get the same fabric endpoint address from its OFI provider as some other rank in the system? Is this spawn test doing something crazy like attach-detach-attach-detach-etc., where a previous address is not removed properly before the next (same) address is inserted again?
    
    I guess I don't understand the intricacies of this MPI spawn problem, and it's difficult for me to believe the statement "It is apparently non-trivial for the apps to avoid duplicate insertions" without this understanding. But, to me, this seems like applications/middleware just shouldn't be inserting a fabric endpoint address twice ... at least for HPC/MPI anyway. But maybe this duplicate insert scenario can still happen in a data center environment?
    
    -----Original Message-----
    From: Hefty, Sean 
    Sent: Tuesday, March 20, 2018 1:38 PM
    To: Blocksome, Michael <michael.blocksome at intel.com>; ofiwg at lists.openfabrics.org
    Subject: RE: inserting duplicate addresses into an AV
    
    The failures are related to MPI spawn tests.  This happens with Intel MPI, but I suspect MPICH or other MPIs may have similar problems with this test.
    
    
    > -----Original Message-----
    > From: Blocksome, Michael
    > Sent: Tuesday, March 20, 2018 11:29 AM
    > To: Hefty, Sean <sean.hefty at intel.com>; ofiwg at lists.openfabrics.org
    > Subject: RE: inserting duplicate addresses into an AV
    > 
    > Which application, or which MPI, is inserting duplicate addresses? I
    > don't see how MPI could be doing this. At least the MPI
    > implementations I'm familiar with use PMI1, PMI2, or PMIx to exchange
    > addresses at job startup into a distributed key-value store, and then
    > after a barrier each MPI rank initializes its AV with all these unique
    > addresses. For a duplicate address to happen, multiple MPI ranks would
    > have to get the *same* local address from the OFI provider - how would
    > that happen?
    > 
    > Some providers, like bgq, can stuff all the fabric address information
    > within the 64 bits of fi_addr_t, which basically makes the
    > fi_av_insert() call a no-op in FI_AV_MAP mode. So if this duplicate
    > address problem happened on bgq it would still "just work" from the
    > provider's perspective. Now MPI (or whatever is using the provider)
    > might get messed up because of it, but the fabric communication
    > operations would still work.
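
To make the "insert is a no-op" point concrete, here is a minimal sketch of a stateless map-style insert. The bit layout (upper 32 bits for a node id, lower 32 bits for an endpoint id) and the names pack_addr, unpack_addr and av_insert_map are assumptions made up for illustration; they are not the bgq provider's actual encoding or any libfabric API.

```python
# Hypothetical model: the whole fabric address fits in a 64-bit handle,
# so "inserting" it needs no table and no duplicate bookkeeping.

NODE_BITS = 32  # assumed layout, not taken from any real provider
ENDPOINT_MASK = (1 << 32) - 1


def pack_addr(node_id: int, endpoint_id: int) -> int:
    """Encode (node, endpoint) into one 64-bit integer handle."""
    if not (0 <= node_id < 1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not (0 <= endpoint_id <= ENDPOINT_MASK):
        raise ValueError("endpoint_id out of range")
    return (node_id << 32) | endpoint_id


def unpack_addr(handle: int) -> tuple:
    """Recover (node, endpoint) from the handle at send time."""
    return handle >> 32, handle & ENDPOINT_MASK


def av_insert_map(node_id: int, endpoint_id: int) -> int:
    """Stateless insert: the returned handle *is* the encoded address."""
    return pack_addr(node_id, endpoint_id)


first = av_insert_map(7, 3)
second = av_insert_map(7, 3)  # the "duplicate" insertion
print(first == second)        # True: same address, same handle
print(unpack_addr(first))     # (7, 3)
```

Under this model a second insertion of the same address cannot fail or diverge, because there is no state to conflict with; any confusion would have to arise in the layer above that counts or indexes the returned handles.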
    > 
    > Mike
    > 
    > -----Original Message-----
    > From: ofiwg [mailto:ofiwg-bounces at lists.openfabrics.org] On Behalf Of
    > Hefty, Sean
    > Sent: Tuesday, March 20, 2018 11:54 AM
    > To: ofiwg at lists.openfabrics.org
    > Subject: [ofiwg] inserting duplicate addresses into an AV
    > 
    > MPI is hitting an issue that is the result of inserting the same
    > address into an AV more than once.  There is no defined behavior for
    > what a provider should do in this case.  At least one provider allows
    > the duplicate insertion, and at least one fails the call... and
    > neither works with MPI when this occurs.  :/
    > 
    > There are a couple of problems trying to define this.  In the case of
    > the provider that fails the call, the failure is detected when
    > attempting to insert the same address into a hash table.  However, not
    > all providers are easily able to detect duplicates.  Forcing them to
    > do so _may_ require the provider to perform a linear search over the
    > AV looking for a duplicate for every address that is inserted.  At
    > scale, this is a significant overhead.
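
The cost difference described above can be sketched with two toy AVs. This is a model only: LinearAV, HashedAV and fill are hypothetical names, addresses are plain byte strings, and nothing here reflects how a particular provider stores its AV.

```python
# Toy model of an address vector (AV); all names here are hypothetical.


class LinearAV:
    """AV kept as a plain list, so a duplicate check must scan it."""

    def __init__(self):
        self.table = []
        self.comparisons = 0  # total entries examined across all inserts

    def insert(self, addr: bytes) -> int:
        for existing in self.table:
            self.comparisons += 1
            if existing == addr:
                raise ValueError("duplicate address")
        self.table.append(addr)
        return len(self.table) - 1


class HashedAV:
    """AV with a side hash index, so a duplicate check is one lookup."""

    def __init__(self):
        self.table = []
        self.index = {}  # address bytes -> table index

    def insert(self, addr: bytes) -> int:
        if addr in self.index:
            raise ValueError("duplicate address")
        self.index[addr] = len(self.table)
        self.table.append(addr)
        return self.index[addr]


def fill(av, count: int) -> None:
    """Insert `count` distinct 4-byte addresses."""
    for i in range(count):
        av.insert(i.to_bytes(4, "big"))


linear, hashed = LinearAV(), HashedAV()
fill(linear, 1000)
fill(hashed, 1000)
print(linear.comparisons)  # 499500, i.e. 1000 * 999 / 2
```

Filling the list-backed AV with n unique addresses costs n(n-1)/2 comparisons, which is the overhead at scale mentioned above. The hashed variant avoids the scan but pays for an extra index entry per address, which a provider that does not otherwise need a hash table may not want to carry.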
    > 
    > Even if the decision is made to force detecting duplicates (maybe even
    > making this an AV option), there's the question of how a provider
    > should respond.  Should it insert the address twice (creating a new
    > fi_addr for it), discard the duplicate (returning the existing
    > fi_addr), or generate an error?  And does it matter whether
    > FI_AV_TABLE or FI_AV_MAP is used?
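
The three candidate responses can be compared side by side in a small model. DupPolicy and PolicyAV are hypothetical names invented for this sketch, and the returned integer only stands in for an fi_addr in a table-style AV; none of this is an existing libfabric option.

```python
# Hypothetical model of the three duplicate-handling choices.
from enum import Enum


class DupPolicy(Enum):
    ALLOW = "insert again and hand back a new index"
    REUSE = "discard the duplicate and hand back the existing index"
    REJECT = "fail the insertion"


class PolicyAV:
    def __init__(self, policy: DupPolicy):
        self.policy = policy
        self.table = []        # position in this list plays the role of fi_addr
        self.first_index = {}  # address bytes -> index of its first insertion

    def insert(self, addr: bytes) -> int:
        existing = self.first_index.get(addr)
        if existing is not None:
            if self.policy is DupPolicy.REUSE:
                return existing
            if self.policy is DupPolicy.REJECT:
                raise ValueError("duplicate address")
        # New address, or a duplicate under ALLOW: append a fresh entry.
        self.table.append(addr)
        new_index = len(self.table) - 1
        self.first_index.setdefault(addr, new_index)
        return new_index


peer = b"\x0a\x00\x00\x01"

allow = PolicyAV(DupPolicy.ALLOW)
print(allow.insert(peer), allow.insert(peer))  # 0 1

reuse = PolicyAV(DupPolicy.REUSE)
print(reuse.insert(peer), reuse.insert(peer))  # 0 0
```

The sketch shows why the choice matters to the caller: under ALLOW one peer ends up with two handles, under REUSE two insertions share one handle (so a later removal affects both users), and under REJECT the caller has to handle an error. Note that REUSE and REJECT both depend on the lookup index, so they inherit whatever detection cost the provider has.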
    > 
    > We need to know what applications need here, and how difficult it will
    > be for providers to detect duplicates.  It is apparently non-trivial
    > for the apps to avoid duplicate insertions.
    > 
    > - Sean
    > 
    > _______________________________________________
    > ofiwg mailing list
    > ofiwg at lists.openfabrics.org
    > http://lists.openfabrics.org/mailman/listinfo/ofiwg
    


