[ofa-general] opensm routing
Jeff Becker
Jeffrey.C.Becker at nasa.gov
Thu Jun 12 10:11:50 PDT 2008
Hi Al
Al Chu wrote:
> Hey Jeff,
>
> On Wed, 2008-06-11 at 09:43 -0700, Jeff Becker wrote:
>
>> Basically, we have an Altix ICE cluster connected by a pair of hypercube
>> InfiniBand fabrics. External to that, we have some Lustre nodes
>> connected into the cluster with InfiniBand. Our goal is to keep Lustre
>> traffic separate from compute (MPI) traffic. Ideally, we'd have 2
>> subnets and an IB router between the Lustre fabric and the compute
>> fabric to accomplish this.
>>
>
> I see. In your environment, the Lustre storage servers are on the same
> fabric as your compute nodes?
>
Right.
>
>> Barring that, I thought we could use partitions as follows: compute
>> HCAs and switch ports are in both partitions, with full membership in
>> the compute partition and limited membership in the I/O partition. The
>> Lustre nodes and switches would be only in the I/O partition (full
>> membership). That way, inter-compute-node (MPI) traffic would be
>> disallowed from using routes through the I/O fabric (by partition
>> membership), and I/O traffic could not interfere with compute traffic
>> (via separate partitions). Is this scheme feasible?
>>
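The scheme relies on the InfiniBand P_Key matching rule: the low 15 bits of a P_Key identify the partition, bit 15 marks full (1) versus limited (0) membership, and two ports can exchange packets only if they share a partition and at least one of them is a full member. The sketch below checks that rule for the proposed layout; the partition bases 0x0001 and 0x0002 are hypothetical. It only models which endpoints may talk to each other. Per the roadmap quoted later in this thread, Phase 1 partition management has no routing effects, so membership alone does not decide which links a path uses.

```python
# InfiniBand P_Key layout: bit 15 = membership type, bits 0-14 = partition base.
FULL_MEMBER_BIT = 0x8000
PKEY_BASE_MASK = 0x7FFF

# Hypothetical partition bases, chosen only for this illustration.
COMPUTE = 0x0001
IO = 0x0002


def full(base: int) -> int:
    """P_Key value for full membership in a partition."""
    return (base & PKEY_BASE_MASK) | FULL_MEMBER_BIT


def limited(base: int) -> int:
    """P_Key value for limited membership in a partition."""
    return base & PKEY_BASE_MASK


def pkeys_match(a: int, b: int) -> bool:
    """Same partition base, and not both limited members."""
    same_partition = (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK)
    at_least_one_full = bool((a | b) & FULL_MEMBER_BIT)
    return same_partition and at_least_one_full


def can_communicate(table_a: list[int], table_b: list[int]) -> bool:
    """Two ports can talk if any pair of their P_Keys matches."""
    return any(pkeys_match(a, b) for a in table_a for b in table_b)


# Proposed layout: compute ports are full COMPUTE + limited IO, Lustre ports full IO.
compute_port = [full(COMPUTE), limited(IO)]
lustre_port = [full(IO)]

print(can_communicate(compute_port, compute_port))  # True, via COMPUTE
print(can_communicate(compute_port, lustre_port))   # True, limited IO <-> full IO
print(pkeys_match(limited(IO), limited(IO)))        # False, both limited
```

Because limited members of the same partition cannot reach each other, compute nodes talk to each other only through the compute partition, and they reach Lustre nodes only through the I/O partition.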
>> If that's not possible, the next idea is to modify OpenSM to assign
>> large weights to the links between the compute and I/O fabrics, so that
>> the MinHop algorithm would never consider using these links for
>> inter-compute-node traffic.
>>
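To see why large weights would keep a min-hop style search off the compute/I/O links, here is a plain Dijkstra shortest-path sketch over a hypothetical topology (switches C1 to C4 and IO, and a Lustre server OSS1, are made-up names). With every link weighted 1 it reduces to a minimum-hop search; raising the weight of the links into the I/O switch pushes compute-to-compute routes onto the longer all-compute path. This is not OpenSM's MinHop code, and it does not show how OpenSM itself would accept link weights.

```python
import heapq


def shortest_path(links: dict[tuple[str, str], int], src: str, dst: str):
    """Dijkstra over undirected weighted links; returns the node list or None."""
    graph: dict[str, list[tuple[str, int]]] = {}
    for (u, v), w in links.items():
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))

    dist = {src: 0}
    prev: dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break  # first pop of dst carries its final distance
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def topology(io_weight: int) -> dict[tuple[str, str], int]:
    """Hypothetical fabric: a 3-hop all-compute path and a 2-hop path via IO."""
    return {
        ("C1", "C3"): 1, ("C3", "C4"): 1, ("C4", "C2"): 1,
        ("C1", "IO"): io_weight, ("IO", "C2"): io_weight,
        ("IO", "OSS1"): 1,
    }


print(shortest_path(topology(1), "C1", "C2"))      # ['C1', 'IO', 'C2']
print(shortest_path(topology(100), "C1", "C2"))    # ['C1', 'C3', 'C4', 'C2']
print(shortest_path(topology(100), "C1", "OSS1"))  # ['C1', 'IO', 'OSS1']
```

Traffic to the Lustre server still crosses the heavily weighted links because no alternative exists, so weights only steer traffic that has a choice of path.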
>
> So dedicating (for example) X out of Y uplinks for MPI only and the
> remaining uplinks for Lustre only?
>
That works. The compute nodes need to talk to other compute nodes for
MPI over one set of links, and they need to talk to the Lustre nodes for
I/O, but over a different (disjoint) set of links. Thanks.
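The requirement amounts to routing each traffic class over its own link set. A minimal sketch, assuming a hypothetical fabric in which two leaf switches split their uplinks between an MPI spine (spine1) and an I/O spine (spine2) that also connects a Lustre server (oss1): a breadth-first search restricted to the links of one class, plus a check that the two sets are disjoint. The quoted roadmap lists partition-aware routing as Phase 2 with the algorithm TBD, so this only illustrates the goal, not an OpenSM feature.

```python
from collections import deque

# Hypothetical link sets; every name here is made up for illustration.
links_by_class = {
    "mpi": {("leaf1", "spine1"), ("leaf2", "spine1")},
    "io": {("leaf1", "spine2"), ("leaf2", "spine2"), ("spine2", "oss1")},
}


def route(links_by_class, traffic_class, src, dst):
    """Breadth-first (min-hop) path using only the links of one traffic class."""
    graph: dict[str, list[str]] = {}
    for u, v in links_by_class[traffic_class]:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)

    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None  # dst unreachable over this class's links


# The two classes must not share any link.
print(links_by_class["mpi"].isdisjoint(links_by_class["io"]))  # True
print(route(links_by_class, "mpi", "leaf1", "leaf2"))  # ['leaf1', 'spine1', 'leaf2']
print(route(links_by_class, "io", "leaf1", "oss1"))    # ['leaf1', 'spine2', 'oss1']
print(route(links_by_class, "mpi", "leaf1", "oss1"))   # None
```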
-jeff
> Al
>
>
>> Thoughts? Thanks.
>>
>> -jeff
>>
>> Al Chu wrote:
>>
>>> Hey Jeff,
>>>
>>> Out of curiosity, are you just trying to change the routing to
>>> improve job performance? i.e. Lustre nodes get special routing vs.
>>> compute nodes?
>>>
>>> Al
>>>
>>> On Tue, 2008-06-10 at 15:08 -0700, Jeff Becker wrote:
>>>
>>>
>>>> Hi all. I was looking into doing some subnet partitioning to separate
>>>> compute nodes from Lustre nodes, and I saw the following in
>>>> ~sashak/management.git on the OFA server, in opensm/doc/OpenSM_PKey_Mgr.txt
>>>>
>>>> OpenSM Partition Management
>>>> ---------------------------
>>>>
>>>> Roadmap:
>>>> Phase 1 - provide partition management at the EndPort (HCA, Router and Switch
>>>> Port 0) level with no routing effects.
>>>> Phase 2 - routing engine should take partitions into account.
>>>> ...
>>>> Phase 2 functionality:
>>>>
>>>> The partition policy should be considered during the routing such that
>>>> links are associated with a particular partition or a set of
>>>> partitions. Policy should be enhanced to provide hints for how to do
>>>> that (correlating to QoS too). The exact algorithm is TBD.
>>>>
>>>>
>>>> What is the status of Pkey-aware routing? Thanks.
>>>>
>>>> -jeff
>>>>