[ofw] [RFC] Locally generated path records

Fab Tillier ftillier at windows.microsoft.com
Tue Jul 15 12:49:03 PDT 2008


> From: Sean Hefty [mailto:sean.hefty at intel.com]
> Sent: Tuesday, July 15, 2008 12:39 PM
>
>> To create a path record, IPoIB needs the following values (in addition
>> to the ones it has access to for the AV creation):
>
> You don't need to create a PR, versus just creating the AV.

I do if I want to give it to the CM to establish an RC connection.  Phase 2 (creating PRs) exists specifically so that users of IBAT can skip the PR lookup.  Rather than returning a GID pair, IBAT would return a path record that IPoIB creates locally.  That effectively eliminates path queries, which should help things scale.

>> Reversible: Hard code to 1
>> NumbPath: Hard code to 1
>  You shouldn't need these, at least for UD.  Reversible is needed if you
> end up with IPoIB connected mode, but NumbPath is only used in a PR
> query.  If you want to support any arbitrary topology with connected
> mode IPoIB, then you would need to know if a path is truly reversible,
> and potentially want to use different forward and reverse paths.

Come to think of it, this affects AV creation too: creating a local AV without getting the path record means you assume the path is reversible.  Otherwise you can't use the SLID of a received packet as the DLID for a send packet, can you?
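To make the reversibility assumption concrete, here is a rough sketch of building AV attributes straight from a received packet.  The struct and function names are made up for illustration (they are not the real WinOF types), and carrying over the SL is my assumption as well:

```c
#include <stdint.h>

/* Hypothetical, simplified types; not the actual WinOF structures. */
struct rx_info {
    uint16_t slid;  /* source LID taken from the received packet */
    uint8_t  sl;    /* service level the packet arrived on */
};

struct av_attr {
    uint16_t dlid;
    uint8_t  sl;
    uint8_t  port_num;
};

/* Build AV attributes for a reply without asking the SA for a path.
 * This is only correct if the path is reversible: the sender's SLID
 * is reused as our DLID, and (as an assumption) the same SL is used. */
static struct av_attr av_from_rx(const struct rx_info *rx, uint8_t port_num)
{
    struct av_attr av;

    av.dlid = rx->slid;
    av.sl = rx->sl;
    av.port_num = port_num;
    return av;
}
```

If the path is not reversible, nothing in the received packet tells you so, which is why the locally created AV bakes in the assumption.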

>> PKey: Same as IPoIB port object
>> MTU: broadcast group
>> Rate: broadcast group
>  The rate seems to be the only real limitation to me.  In the worst
> case, you slow down traffic between a given pair of nodes, but at least
> things keep working.  Avoiding the PR queries seems like a good idea to
> me, but it should probably be user configurable.

Looking at OpenSM, it always sets the rate to 12 (OSM_DEFAULT_SUBNET_TIMEOUT), for both MC groups and path records.

>> Preference: 0
>
> Only used for PR queries.

Actually, the spec says this is valid in a response.  0 means highest preference.  It doesn't really matter since nothing will check this field.
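Putting the field list from this thread together, a locally generated path record could be filled in roughly like this.  This is only a sketch: the types and the build_local_path() name are hypothetical, the layout is not the wire format, and byte ordering is ignored:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified types for illustration only. */
struct gid { uint8_t raw[16]; };

struct bcast_group {
    uint8_t mtu;   /* encoded MTU of the IPoIB broadcast group */
    uint8_t rate;  /* encoded rate of the IPoIB broadcast group */
};

struct local_path_rec {
    struct gid sgid, dgid;
    uint16_t   slid, dlid;
    uint16_t   pkey;
    uint8_t    mtu, rate;
    uint8_t    reversible;
    uint8_t    numb_path;
    uint8_t    preference;
};

/* Fill a path record from values IPoIB already has, with no SA query. */
static void build_local_path(struct local_path_rec *pr,
                             const struct gid *sgid, const struct gid *dgid,
                             uint16_t slid, uint16_t dlid,
                             uint16_t port_pkey,
                             const struct bcast_group *bc)
{
    memset(pr, 0, sizeof(*pr));
    pr->sgid = *sgid;
    pr->dgid = *dgid;
    pr->slid = slid;
    pr->dlid = dlid;
    pr->pkey = port_pkey;   /* same as the IPoIB port object */
    pr->mtu = bc->mtu;      /* taken from the broadcast group */
    pr->rate = bc->rate;    /* taken from the broadcast group */
    pr->reversible = 1;     /* hard coded */
    pr->numb_path = 1;      /* hard coded */
    pr->preference = 0;     /* 0 = highest preference */
}
```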

> I'm not sure that most MPI apps will run through an IB router, so always
> querying for off subnet paths will probably be needed.  (The current PR
> format only works for UD traffic between IB subnets anyway.)

I can trap that easily enough - if the subnet prefix is different, I can return a path that only has the GID/LID pairs filled in.  The CM code can then detect if everything else is zero, and issue a real path query.  While a bit convoluted, it avoids having to return PENDING from the IBAT library.
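A rough sketch of that check, again with made-up type and function names rather than the actual IBAT or CM code.  Which fields count as "everything else" is an assumption here; I only look at PKey, MTU and rate:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified types for illustration only. */
struct gid { uint8_t raw[16]; };  /* first 8 bytes are the subnet prefix */

struct path_stub {
    struct gid sgid, dgid;
    uint16_t   slid, dlid;
    uint16_t   pkey;
    uint8_t    mtu, rate;
};

static bool same_subnet(const struct gid *a, const struct gid *b)
{
    return memcmp(a->raw, b->raw, 8) == 0;
}

/* IBAT side: always fill the GID/LID pairs; fill the rest only when
 * both ends are on the same subnet. */
static void ibat_fill_path(struct path_stub *p,
                           const struct gid *sgid, const struct gid *dgid,
                           uint16_t slid, uint16_t dlid,
                           uint16_t pkey, uint8_t mtu, uint8_t rate)
{
    memset(p, 0, sizeof(*p));
    p->sgid = *sgid;
    p->dgid = *dgid;
    p->slid = slid;
    p->dlid = dlid;
    if (!same_subnet(sgid, dgid))
        return;  /* off subnet: leave everything else zero */
    p->pkey = pkey;
    p->mtu = mtu;
    p->rate = rate;
}

/* CM side: an all-zero remainder means "go issue a real path query". */
static bool needs_real_query(const struct path_stub *p)
{
    return p->pkey == 0 && p->mtu == 0 && p->rate == 0;
}
```

The CM would call needs_real_query() before using the record and fall back to an SA query when it returns true, so the IBAT call itself never has to return PENDING.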

-Fab


