[ewg] RE: Agenda for the OFED meeting today (Jan 5, 09)

Jeff Squyres jsquyres at cisco.com
Mon Jan 5 12:26:36 PST 2009


I chatted with John and Todd from QL on the phone today -- we  
basically came to the same conclusion:

- need to beef up opensm so that it can scalably handle lots of
incoming path record lookups
- need to beef up the CM clients on the host (maybe; this work might
already be done?)
- need to check the current status of the SA caching work and re-open
that discussion to see whether the work can be completed, etc.
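
To make the SA caching bullet a little more concrete, here is a minimal
sketch of the general idea: keep path records in a host-side cache with a
time-to-live, so repeated lookups for the same source/destination pair do
not each turn into an SA query. This is only an illustration, not the
design of the existing SA caching work. Every name in it (PathRecord,
PathRecordCache, query_sa, the field names, the 300-second default) is
hypothetical, and query_sa stands in for whatever actually issues the
query to the SA.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class PathRecord:
    """Hypothetical subset of the attributes a path record carries."""
    dlid: int
    pkey: int
    sl: int
    mtu: int


Key = Tuple[str, str]  # (source GID, destination GID), as strings


class PathRecordCache:
    """TTL cache placed in front of a caller-supplied SA query function."""

    def __init__(
        self,
        query_sa: Callable[[str, str], PathRecord],
        ttl_seconds: float = 300.0,  # assumed value, not a recommendation
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._query_sa = query_sa
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[float, PathRecord]] = {}
        self.sa_queries = 0  # number of lookups that actually reached the SA

    def lookup(self, sgid: str, dgid: str) -> PathRecord:
        """Return a cached record if it is still fresh, else query the SA."""
        key = (sgid, dgid)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        self.sa_queries += 1
        record = self._query_sa(sgid, dgid)
        self._entries[key] = (now, record)
        return record

    def invalidate(self) -> None:
        """Drop every entry, e.g. after the fabric has been reconfigured."""
        self._entries.clear()
```

With a cache like this, each connection can still take its PKey and SL
from its own path record, while the SA only sees one query per
source/destination pair per TTL window. How and when entries should be
invalidated is the hard part and is not addressed here.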

It might also be worthwhile to start a whole new discussion about
making a better CM (at least from the ULP perspective): one that
offers simple mechanisms for those who don't need or care about the
details, but also offers complex/detailed mechanisms (perhaps
remarkably like today's mechanisms).



On Jan 5, 2009, at 2:01 PM, Davis, Arlin R wrote:

>
> There are scaling issues with SA path-record queries. We attempted
> to be good citizens with Intel MPI by using the rdma_cm agent (via
> uDAPL), but were forced to build hard-coded RC QP support in OFED 1.4
> (uDAPL scm) to avoid the many scaling and configuration problems
> that came with IPoIB requirements, ARP storms, rdma_cm timers, and
> SA path record query/caching.
> If someone wants to sign up to design and implement a scalable SA  
> query caching agent we would be happy to look at path record queries  
> again.
>
> -arlin
>
> Another suggestion for 1.5
>
> Implementation of SA queries for Path Records (using IBTA 1.2.1  
> ServiceId field) in all OFED ULPs, especially for MPI
> The IBTA standard defines that the proper way to
> establish a connection is to get a PathRecord from the SM/SA and use
> it to define all the attributes of the communication path.
> Ideally the IBTA CM should then be used to establish the connection  
> and QPs as well.
>
> At present, openmpi, mvapich1 and mvapich2 do not use PathRecords,
> but instead hard-code attributes like the PKey, SL, etc.
> In some cases these hard-coded values can be overridden through
> configuration (e.g. for PKey and SL), but the configured values must
> be uniform across all connections and must be provided per job (which
> can be error-prone and tedious).
>
> At present opensm supports PKeys and SLs; however, MPI
> cannot easily use these features.
> Other opensm features, such as lash routing, do not work properly
> with MPI because MPI requires the SL to be uniform across all
> connections, whereas with lash the SL varies per route.
>
> Additionally, applications which do not use PathRecords will have
> difficulties with advanced features like IB routing, partitioning,
> etc., all of which are available or being worked on in opensm.
>
>


-- 
Jeff Squyres
Cisco Systems



