[ofw] [RFC] Locally generated path records
Fab Tillier
ftillier at windows.microsoft.com
Tue Jul 15 12:49:03 PDT 2008
> From: Sean Hefty [mailto:sean.hefty at intel.com]
> Sent: Tuesday, July 15, 2008 12:39 PM
>
>> To create a path record, IPoIB needs the following values (in addition
>> to the ones it has access to for the AV creation):
>
> You don't need to create a PR, versus just creating the AV.
I do if I want to give it to the CM to establish an RC connection. Phase 2 (creating PRs) is specifically to avoid the PR lookup for users of IBAT. Basically, rather than returning a GID pair, IBAT would return a path record. IPoIB would create that path record. This effectively eliminates path queries, which should help things scale.
>> Reversible: Hard code to 1
>> NumbPath: Hard code to 1
> You shouldn't need these, at least for UD. Reversible is needed if you
> end up with IPoIB connected mode, but NumbPath is only used in a PR
> query. If you want to support any arbitrary topology with connected
> mode IPoIB, then you would need to know if a path is truly reversible,
> and potentially want to use different forward and reverse paths.
Come to think of it, this affects AV creation too: creating a local AV without getting the path record assumes that the path is reversible. Otherwise you can't use the SLID of a received packet as the DLID for a send packet, can you?
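To make that assumption concrete, here is a minimal sketch of deriving send AV attributes from a received packet. The types and the function name are hypothetical, simplified stand-ins and not the actual IBAL/IPoIB definitions. The result is only valid if the path really is reversible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types for illustration only. */
typedef struct {
    uint16_t slid;          /* source LID taken from the received packet */
} rx_info_t;

typedef struct {
    uint16_t dlid;          /* destination LID to use when sending */
    uint8_t  static_rate;   /* encoded rate, e.g. copied from the broadcast group */
} av_attr_t;

/*
 * Build send AV attributes without a path record query.
 * ASSUMPTION: the path is reversible, so the sender's SLID can be
 * used directly as our DLID. On a non-reversible path this is wrong.
 */
static av_attr_t av_from_rx(const rx_info_t *rx, uint8_t group_rate)
{
    av_attr_t av;

    assert(rx != NULL);
    av.dlid = rx->slid;
    av.static_rate = group_rate;
    return av;
}
```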
>> PKey: Same as IPoIB port object
>> MTU: broadcast group
>> Rate: broadcast group
> The rate seems to be the only real limitation to me. In the worst
> case, you slow down traffic between a given pair of nodes, but at least
> things keep working. Avoiding the PR queries seems like a good idea to
> me, but it should probably be user configurable.
Looking at OpenSM, it always sets the rate to 12 (OSM_DEFAULT_SUBNET_TIMEOUT), for both MC groups and path records.
>> Preference: 0
>
> Only used for PR queries.
Actually, the spec says this is valid in a response. 0 means highest preference. It doesn't really matter since nothing will check this field.
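Putting the field choices discussed so far together, a locally generated path record could be filled in roughly as follows. This is a sketch under assumptions: the struct layouts and every name prefixed with sk_ are made up for illustration and do not match the real path record wire format or the IBAL types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified types; NOT the on-the-wire path record layout. */
typedef struct { uint8_t raw[16]; } sk_gid_t;

typedef struct {
    sk_gid_t dgid, sgid;
    uint16_t dlid, slid;
    uint16_t pkey;
    uint8_t  mtu;         /* encoded MTU */
    uint8_t  rate;        /* encoded rate */
    uint8_t  reversible;
    uint8_t  num_path;    /* NumbPath in the spec */
    uint8_t  preference;
} sk_path_rec_t;

typedef struct { sk_gid_t gid; uint16_t lid; uint16_t pkey; } sk_port_t;
typedef struct { uint8_t mtu; uint8_t rate; } sk_bcast_group_t;

/* Fill a path record from values IPoIB already has, with no SA query. */
static sk_path_rec_t sk_make_local_path(const sk_port_t *port,
                                        const sk_bcast_group_t *bcast,
                                        const sk_gid_t *remote_gid,
                                        uint16_t remote_lid)
{
    sk_path_rec_t pr;

    assert(port != NULL && bcast != NULL && remote_gid != NULL);
    memset(&pr, 0, sizeof pr);

    pr.sgid = port->gid;
    pr.slid = port->lid;
    pr.dgid = *remote_gid;
    pr.dlid = remote_lid;

    pr.pkey = port->pkey;      /* same as the IPoIB port object */
    pr.mtu  = bcast->mtu;      /* taken from the broadcast group */
    pr.rate = bcast->rate;     /* taken from the broadcast group */

    pr.reversible = 1;         /* hard coded: assumes a reversible path */
    pr.num_path   = 1;         /* hard coded */
    pr.preference = 0;         /* 0 means highest preference */
    return pr;
}
```

Taking the rate from the broadcast group can be slower than what a given pair of nodes could actually use, which is the trade-off raised above.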
> I'm not sure that most MPI apps will run through an IB router, so always
> querying for off subnet paths will probably be needed. (The current PR
> format only works for UD traffic between IB subnets anyway.)
I can trap that easily enough - if the subnet prefix is different, I can return a path that only has the GID/LID pairs filled in. The CM code can then detect if everything else is zero, and issue a real path query. While a bit convoluted, it avoids having to return PENDING from the IBAT library.
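A rough sketch of that fallback, with hypothetical x_ prefixed names and a simplified record layout (none of this is existing IBAT or CM code). It relies on the subnet prefix being the upper 64 bits of the GID. The CM-side check here tests a handful of fields explicitly, where a real one would have to cover every field other than the GID/LID pairs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified types for illustration only. */
typedef struct { uint8_t raw[16]; } x_gid_t;

typedef struct {
    x_gid_t  dgid, sgid;
    uint16_t dlid, slid;
    uint16_t pkey;
    uint8_t  mtu, rate, reversible, num_path;
} x_path_t;

/* The subnet prefix is the upper 64 bits (first 8 bytes) of a GID. */
static bool x_same_subnet(const x_gid_t *a, const x_gid_t *b)
{
    return memcmp(a->raw, b->raw, 8) == 0;
}

/* Off-subnet case: return a record with only the GID/LID pairs set. */
static x_path_t x_stub_path(const x_gid_t *sgid, uint16_t slid,
                            const x_gid_t *dgid, uint16_t dlid)
{
    x_path_t p;

    memset(&p, 0, sizeof p);
    p.sgid = *sgid;
    p.slid = slid;
    p.dgid = *dgid;
    p.dlid = dlid;
    return p;
}

/* CM side: if everything besides the GID/LID pairs is zero, do a real query. */
static bool x_needs_sa_query(const x_path_t *p)
{
    assert(p != NULL);
    return p->pkey == 0 && p->mtu == 0 && p->rate == 0 &&
           p->reversible == 0 && p->num_path == 0;
}
```

A fully populated local record has Reversible and NumbPath set to 1, so it can never be mistaken for a stub.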
-Fab