[ewg] Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened
Or Gerlitz
ogerlitz at voltaire.com
Thu May 10 05:23:14 PDT 2007
Jeff Squyres wrote:
> Galen Shipman and I talked about this a bit and suggest the following:
>
> - During the connection dance (probably for both the udapl and openib
> BTLs), whichever peer ends up being the connection initiator (don't
> forget about the race condition where 2 peers may simultaneously decide
> to initiate -- this case is handled properly in the OMPI code; but just
> make sure you modify the side that ends up being actual initiator), they
> can send their pending fragment immediately (and Steve is right that
> there will always be a pending fragment, because OMPI doesn't make a
> connection until the first send).
>
> - The other peer (the receiver of the connection) must wait to send its
> pending fragment(s) until it receives the first frag from the connection
> initiator. This can be accomplished either with another flag on the
> OMPI module struct or perhaps making it part of the connection protocol
> (i.e., don't transition the endpoint to be CONNECTED until the first
> fragment is received). Either of which can be used to queue up
> fragments on the receiver until the first fragment is received from the
> initiator. I'd have to look in the code deeper, but I'm *guessing* that
> it might be best to use the already-existing state flag (i.e., checking
> for CONNECTED) because then you won't be introducing any more
> conditionals in the critical path.
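Here is a minimal sketch of the state-flag option Jeff describes. It is written in C to match the OMPI code base but is not taken from it. Every name in it (`struct endpoint`, `ep_send`, `ep_connection_established`, `ep_recv_fragment`, the `EP_*` states) is hypothetical, and `is_initiator` is assumed to be already settled by the existing race handling. The send path checks only for CONNECTED. The side that accepted the connection stays in CONNECTING until the initiator's first fragment arrives, and only then flushes its queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical endpoint states; not the real OMPI enum. */
enum ep_state { EP_CLOSED, EP_CONNECTING, EP_CONNECTED };

#define MAX_FRAGS 16

struct endpoint {
    enum ep_state state;
    bool is_initiator;        /* assumed: set by the existing race resolution */
    int pending[MAX_FRAGS];   /* fragments queued until CONNECTED */
    int npending;
    int posted[MAX_FRAGS];    /* stand-in record of fragments put on the wire */
    int nposted;
};

/* Stand-in for posting a fragment to the transport. */
static void post_to_wire(struct endpoint *ep, int frag)
{
    assert(ep->nposted < MAX_FRAGS);
    ep->posted[ep->nposted++] = frag;
}

/* Post queued fragments in the order they were queued. */
static void flush_pending(struct endpoint *ep)
{
    for (int i = 0; i < ep->npending; i++)
        post_to_wire(ep, ep->pending[i]);
    ep->npending = 0;
}

/* Fast path is a single state check; anything not CONNECTED is queued. */
static void ep_send(struct endpoint *ep, int frag)
{
    if (ep->state == EP_CONNECTED) {
        post_to_wire(ep, frag);
        return;
    }
    assert(ep->npending < MAX_FRAGS);
    ep->pending[ep->npending++] = frag;
    if (ep->state == EP_CLOSED)
        ep->state = EP_CONNECTING; /* first send starts the connection */
}

/* Called once the transport-level connection is up. */
static void ep_connection_established(struct endpoint *ep)
{
    if (ep->is_initiator) {
        /* The initiator may send its pending fragment immediately. */
        ep->state = EP_CONNECTED;
        flush_pending(ep);
    } else {
        /* The receiver stays short of CONNECTED, so ep_send keeps queueing. */
        ep->state = EP_CONNECTING;
    }
}

/* Called for every incoming fragment. */
static void ep_recv_fragment(struct endpoint *ep, int frag)
{
    (void)frag; /* delivery to the upper layer is omitted */
    if (ep->state != EP_CONNECTED) {
        /* First fragment from the initiator: now safe to send. */
        ep->state = EP_CONNECTED;
        flush_pending(ep);
    }
}
```

Because the queueing decision rides on the existing CONNECTED check, the common case pays for no extra conditional, which is the critical-path point Jeff raises.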
A different approach you might want to consider is to have --two--
connections per <src,dst> rank pair at the BTL level: if A wants to send
to B, it does so through the A --> B connection, and if B wants to send
to A, it does so through the B --> A connection. To some extent, this is
the approach taken by IPoIB-CM (I am not familiar enough with the RFC to
explain the reasoning, but I am quite sure this was the approach in the
initial implementation). At first it might seem inelegant, but once you
work through the details as they apply to the OMPI environment, you might
even find it nice.
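Here is a rough sketch of the two-connection idea. Again, the names are hypothetical (`struct peer`, `send_conn`, `recv_conn`, `peer_send`, `peer_accept`, `peer_send_conn_established`) and this is not real BTL code. Each side sends only on the connection it initiated and receives only on the one the peer initiated. As I read it, this means there is no initiator race to break, and the accepting side never has to wait for a first fragment before sending.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAGS 16

/* Hypothetical one-directional connection handle. */
struct conn {
    bool established;
};

/* Per-peer state: two one-way connections instead of one shared one. */
struct peer {
    struct conn send_conn;   /* we initiated it; we only send on it */
    struct conn recv_conn;   /* the peer initiated it; we only receive on it */
    bool connect_started;
    int pending[MAX_FRAGS];  /* fragments queued until send_conn is up */
    int npending;
    int posted[MAX_FRAGS];   /* stand-in record of fragments put on the wire */
    int nposted;
};

/* Stand-in for posting a fragment on send_conn. */
static void post(struct peer *p, int frag)
{
    assert(p->nposted < MAX_FRAGS);
    p->posted[p->nposted++] = frag;
}

static void peer_send(struct peer *p, int frag)
{
    if (p->send_conn.established) {
        post(p, frag);
        return;
    }
    assert(p->npending < MAX_FRAGS);
    p->pending[p->npending++] = frag;
    p->connect_started = true; /* we are always the initiator of send_conn */
}

/* Our outgoing connection completed: as its initiator we may send at once. */
static void peer_send_conn_established(struct peer *p)
{
    p->send_conn.established = true;
    for (int i = 0; i < p->npending; i++)
        post(p, p->pending[i]);
    p->npending = 0;
}

/* The peer connected to us: accept it for receiving only. */
static void peer_accept(struct peer *p)
{
    p->recv_conn.established = true; /* pending sends are unaffected */
}
```

The obvious cost is twice as many connections per peer pair.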
Or.