[ewg] RE: Agenda for the OFED meeting today (Jan 5, 09)
jsquyres at cisco.com
Mon Jan 5 12:26:36 PST 2009
I chatted with John and Todd from QL on the phone today -- we
basically came to the same conclusion:
- need to beef up opensm so that it can scalably handle lots of
incoming path record lookups
- need to beef up the CM clients on the host (maybe; this work might
already be done?)
- need to check the current status of the SA caching stuff / re-open
that discussion to see if the work can be completed, etc.
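
To make the SA caching item a bit more concrete, here is a rough sketch
of the idea: keep path records in a host-side cache so that repeated
lookups for the same peer do not each turn into a query against the SA.
This is only an illustration. PathRecord, PathRecordCache, query_sa and
the field names are hypothetical and are not the opensm or OFED APIs;
the TTL-based expiry is likewise just an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class PathRecord:
    # Hypothetical subset of path attributes, not the real SA wire format.
    dlid: int
    pkey: int
    sl: int
    mtu: int


# Cache key: (source GID, destination GID, service ID), all illustrative.
Key = Tuple[str, str, int]


class PathRecordCache:
    """Caches path records so repeated lookups skip the SA round trip."""

    def __init__(
        self,
        query_sa: Callable[[str, str, int], PathRecord],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._query_sa = query_sa      # stand-in for a real SA query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[float, PathRecord]] = {}
        self.sa_queries = 0            # how many lookups reached the SA

    def lookup(self, sgid: str, dgid: str, service_id: int) -> PathRecord:
        key = (sgid, dgid, service_id)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]              # fresh entry: no SA traffic
        # Missing or expired entry: ask the SA once and remember the answer.
        self.sa_queries += 1
        record = self._query_sa(sgid, dgid, service_id)
        self._entries[key] = (now + self._ttl, record)
        return record
```

With something like this on each host, a job that opens many connections
to the same peers would only hit the SA once per peer per TTL window.
How entries get invalidated when the fabric changes is the hard part and
is not addressed here.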
It might also be worthwhile to start a whole new discussion about
making a better CM (at least from the ULP perspective). One that
offers simple mechanisms for those who don't need/care about the
details, but also offers complex/detailed mechanisms (perhaps
remarkably like today's mechanisms).
On Jan 5, 2009, at 2:01 PM, Davis, Arlin R wrote:
> There are scaling issues with SA path-record queries. We attempted
> to be good citizens with Intel MPI using the rdma_cm agent (via
> uDAPL) but were forced to build hard-coded RC QP support in OFED 1.4
> (uDAPL scm) to avoid the many scaling and configuration problems
> that came with IPoIB requirements, ARP storms, rdma_cm timers, and
> SA path record query/caching.
> If someone wants to sign up to design and implement a scalable SA
> query caching agent we would be happy to look at path record queries.
> Another suggestion for 1.5:
> Implementation of SA queries for Path Records (using IBTA 1.2.1
> ServiceId field) in all OFED ULPs, especially for MPI
> The IBTA standard defines that the proper way to
> establish a connection is to get a PathRecord from the SM/SA and use
> it to define all the attributes of the communication path.
> Ideally the IBTA CM should then be used to establish the connection
> and QPs as well.
> At present, openmpi, mvapich1 and mvapich2 do not use PathRecords,
> but instead hard code attributes like the PKey, SL, etc.
> In some cases these hardcoded values can be overridden by
> configurable values such as PKey and SL, but such values must be
> uniform across all connections and must be provided per job (which
> can be error prone/tedious).
> At present opensm supports PKeys and SLs, however MPI
> cannot easily use these features.
> Other features, such as lash routing, in opensm do not work properly
> with MPI because the SL must be uniform across all connections, but
> for lash it will vary per route.
> Additionally, applications which do not use PathRecords will have
> difficulties with advanced features like IB routing, partitioning,
> etc., all of which are available or being worked on in opensm.
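
As a small illustration of the uniform-SL problem described in the quoted
text: if every connection in a job is forced to use one PKey/SL pair, any
peer whose path calls for a different SL (as can happen with lash) ends
up misconfigured. The sketch below is hypothetical; the Path type, the
node names and the values are made up for the example and do not come
from any MPI or opensm code.

```python
from typing import Dict, List, NamedTuple


class Path(NamedTuple):
    # Hypothetical per-destination attributes an SA path record would carry.
    pkey: int
    sl: int


def mismatched_peers(
    paths: Dict[str, Path], job_pkey: int, job_sl: int
) -> List[str]:
    """Return peers whose path attributes differ from the job-wide values."""
    return sorted(
        peer
        for peer, path in paths.items()
        if path.pkey != job_pkey or path.sl != job_sl
    )


# Made-up example: the routing engine assigned SL 1 to the path to node2.
paths = {
    "node1": Path(pkey=0x8001, sl=0),
    "node2": Path(pkey=0x8001, sl=1),
    "node3": Path(pkey=0x8001, sl=0),
}

# A job configured with a single PKey/SL pair gets node2 wrong.
wrong = mismatched_peers(paths, job_pkey=0x8001, job_sl=0)
```

Taking the SL and PKey from each peer's path record instead of a per-job
setting would make the mismatch list empty by construction.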