[ofw] [RFC] Locally generated path records

Hal Rosenstock hal.rosenstock at gmail.com
Mon Jul 21 11:45:41 PDT 2008


On Tue, Jul 15, 2008 at 3:49 PM, Fab Tillier
<ftillier at windows.microsoft.com> wrote:
>> From: Sean Hefty [mailto:sean.hefty at intel.com]
>> Sent: Tuesday, July 15, 2008 12:39 PM
>>
>>> To create a path record, IPoIB needs the following values (in addition
>>> to the ones it has access to for the AV creation):
>>
>> You don't need to create a PR, versus just creating the AV.
>
> I do if I want to give it to the CM to establish an RC connection.  Phase 2 (creating PRs) is specifically to avoid the PR lookup for users of IBAT.  Basically, rather than returning a GID pair, IBAT would return a path record.  IPoIB would create that path record.  This effectively eliminates path queries, which should help things scale.
>
>>> Reversible: Hard code to 1
>>> NumbPath: Hard code to 1
>>
>> You shouldn't need these, at least for UD.  Reversible is needed if you
>> end up with IPoIB connected mode, but NumbPath is only used in a PR
>> query.  If you want to support any arbitrary topology with connected
>> mode IPoIB, then you would need to know if a path is truly reversible,
>> and potentially want to use different forward and reverse paths.
>
> Come to think of it, this affects AV creation too - to create a local AV without getting the path record means that you assume that the path is reversible (otherwise you can't use the SLID of a received packet as DLID for a send packet, can you?)
>
>>> PKey: Same as IPoIB port object
>>> MTU: broadcast group
>>> Rate: broadcast group
>>
>> The rate seems to be the only real limitation to me.  In the worst
>> case, you slow down traffic between a given pair of nodes, but at least
>> things keep working.  Avoiding the PR queries seems like a good idea to
>> me, but it should probably be user configurable.
>
> Looking at OpenSM, it always sets the rate to 12 (OSM_DEFAULT_SUBNET_TIMEOUT), both for MC groups as well as for path records.

That default value can be set via configuration. Also, that may be how
OpenSM behaves today, but it is not the only behavior the spec allows.
Other SMs may fill in this component in a more sophisticated way (or
OpenSM may change this in the future).

-- Hal

>
>>> Preference: 0
>>
>> Only used for PR queries.
>
> Actually, spec says this is valid in a response.  0 means highest preference.  It doesn't really matter since nothing will check this field.
>
>> I'm not sure that most MPI apps will run through an IB router, so always
>> querying for off subnet paths will probably be needed.  (The current PR
>> format only works for UD traffic between IB subnets anyway.)
>
> I can trap that easily enough - if the subnet prefix is different, I can return a path that only has the GID/LID pairs filled in.  The CM code can then detect if everything else is zero, and issue a real path query.  While a bit convoluted, it avoids having to return PENDING from the IBAT library.
>
> -Fab
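
As a rough illustration of the scheme Fab describes above (this is not
code from IBAT, IPoIB, or the CM), here is a minimal sketch in C. Every
type and function name below is hypothetical, and the struct is a
simplified stand-in rather than the real path record layout. On-subnet
paths get the values listed in the thread; off-subnet paths get only the
GID/LID pairs, so a consumer can tell that a real path query is still
needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All names here are hypothetical; this is not the real IBAL/IBAT API. */
typedef struct {
    uint64_t subnet_prefix;
    uint64_t interface_id;
} sketch_gid_t;

typedef struct {
    sketch_gid_t gid;
    uint16_t     lid;
} sketch_endpoint_t;

/* Values the thread says IPoIB already has: the port's PKey, and the
 * MTU and rate of the broadcast group. */
typedef struct {
    uint16_t pkey;
    uint8_t  mtu;
    uint8_t  rate;
} sketch_port_info_t;

/* Simplified stand-in for a path record (not the wire format). */
typedef struct {
    sketch_gid_t sgid, dgid;
    uint16_t     slid, dlid;
    uint16_t     pkey;
    uint8_t      mtu;
    uint8_t      rate;
    uint8_t      reversible;
    uint8_t      num_path;
    uint16_t     preference;
} sketch_path_rec_t;

/* Build a path record locally instead of querying the SA. */
void sketch_build_path(const sketch_endpoint_t *src,
                       const sketch_endpoint_t *dst,
                       const sketch_port_info_t *port,
                       sketch_path_rec_t *out)
{
    assert(src && dst && port && out);
    memset(out, 0, sizeof(*out));

    /* The GID/LID pairs are always filled in, exactly as supplied. */
    out->sgid = src->gid;
    out->dgid = dst->gid;
    out->slid = src->lid;
    out->dlid = dst->lid;

    /* Different subnet prefix: leave everything else zero so the
     * consumer knows it must issue a real path query. */
    if (src->gid.subnet_prefix != dst->gid.subnet_prefix)
        return;

    out->pkey       = port->pkey;  /* same as the IPoIB port object */
    out->mtu        = port->mtu;   /* from the broadcast group */
    out->rate       = port->rate;  /* from the broadcast group */
    out->reversible = 1;           /* hard coded: assumes a reversible path */
    out->num_path   = 1;           /* hard coded */
    out->preference = 0;           /* 0 means highest preference */
}

/* Consumer-side check: true if only the GID/LID pairs were filled in. */
bool sketch_needs_real_query(const sketch_path_rec_t *rec)
{
    assert(rec);
    return rec->pkey == 0 && rec->mtu == 0 && rec->rate == 0 &&
           rec->reversible == 0 && rec->num_path == 0;
}
```

The sketch inherits the caveats raised in this thread: it assumes the
path is reversible, and it assumes the broadcast group's rate is an
acceptable stand-in for the rate a real path query would return.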


