[ofa-general] Re: openSM: Different IB MTUs
Sasha Khapyorsky
sashak at voltaire.com
Thu Jul 26 15:47:10 PDT 2007
On 09:00 Thu 26 Jul , Eitan Zahavi wrote:
> I propose that when there is no MTU in the partition policy file, OpenSM
> use a configurable default from: /etc/cache/opensm/opensm.opt.
> Something like:
> # The default MTU to be used for IPoIB and other MCGs when the
> # partition-policy does not provide exact value. The default is the
> # lowest possible MTU
> mcg_default_mtu 1
Looks like a good solution to me.
Would somebody care to send a patch?
Sasha
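
For illustration only, the proposed fallback could be sketched as below. This is not OpenSM code: `resolve_mcg_mtu` is a hypothetical helper, and `mcg_default_mtu` is the option name proposed in this thread rather than an existing one. The MTU codes follow the InfiniBand encoding (1 = 256 bytes up to 5 = 4096 bytes), so the proposed default of 1 is the lowest possible MTU.

```python
# InfiniBand MTU codes and the payload size in bytes that each one encodes.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def resolve_mcg_mtu(partition_mtu, mcg_default_mtu=1):
    """Pick the MTU code for a partition's multicast groups.

    partition_mtu   -- MTU code from the partition policy, or None if the
                       policy gives no value (hypothetical representation).
    mcg_default_mtu -- fallback option proposed in the thread (assumed name).
    """
    code = partition_mtu if partition_mtu is not None else mcg_default_mtu
    if code not in IB_MTU_BYTES:
        raise ValueError(f"invalid IB MTU code: {code}")
    return code


if __name__ == "__main__":
    for policy_value in (None, 4):
        code = resolve_mcg_mtu(policy_value)
        print(policy_value, "->", code, f"({IB_MTU_BYTES[code]} bytes)")
```

An explicit value in the partition policy always wins; the default only applies when the policy is silent.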
>
> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
>
>
>
> ________________________________
>
> From: Shirley Ma [mailto:xma at us.ibm.com]
> Sent: Wednesday, July 25, 2007 10:45 PM
> To: Eitan Zahavi
> Cc: general at lists.openfabrics.org; Hal Rosenstock
> Subject: RE: [ofa-general] Re: openSM: Different IB MTUs
>
>
>
> Hello Eitan, Hal,
>
> Thanks. It's good that openSM has a configuration option to set up
> these attributes in the MC group. Would it be a good idea to add the
> following to openSM: when there is no MTU defined in the configuration
> file, the SM picks the smallest link MTU in the fabric by default? MTU
> is unlike rate: a slower rate might indicate a cabling problem. So using
> the smallest link MTU in the fabric might not be a bad choice for MC by
> default. The reason for my request is that when an IP multicast group is
> created, MTU is not an attribute of the group. When mapping IP multicast
> to IB multicast, the IB multicast join might fail because of different
> IB link MTU sizes in the group, but the IP multicast group creation will
> succeed without knowing about the failure. If the admin sets the MTU in
> the configuration file, the admin would know about this failure.
> Otherwise, admins/users could spend too much time debugging their
> broken multicast applications.
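
A minimal sketch of the "smallest link MTU in the fabric" idea, assuming the SM already has the MTU code of every active port available as a plain list. The function name and input shape are hypothetical and not taken from OpenSM. Because IB MTU codes increase with the MTU size, taking the minimum code yields the smallest MTU.

```python
def smallest_link_mtu(port_mtu_codes):
    """Return the smallest IB MTU code among the given ports.

    port_mtu_codes -- iterable of MTU codes (1 = 256 bytes ... 5 = 4096 bytes),
                      one per active port; a hypothetical input format.
    """
    codes = list(port_mtu_codes)
    if not codes:
        raise ValueError("no ports to inspect")
    return min(codes)


if __name__ == "__main__":
    # Two 2K ports (code 4) and one 1K port (code 3).
    print(smallest_link_mtu([4, 4, 3]))
```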
>
> Thanks
> Shirley Ma
>
> "Eitan Zahavi" <eitan at mellanox.co.il>
> 07/25/07 12:25 PM
>
> To: "Hal Rosenstock" <hal.rosenstock at gmail.com>, Shirley Ma/Beaverton/IBM at IBMUS
> cc: <general at lists.openfabrics.org>
> Subject: RE: [ofa-general] Re: openSM: Different IB MTUs
>
>
> Hi Shirley,
>
> I think I understand where your question comes from...
> Many have issues with heterogeneous fabrics where not all nodes
> have the same MTU or speed.
> Especially when IPoIB relies on all nodes joining the broadcast
> group.
>
> The term "join" for multicast groups is a little overloaded.
> If a node joins an existing MC group, it has to have a rate
> (speed * width) >= MCG.rate and support MTU >= MCG.MTU; otherwise it is
> denied.
> If the join is actually a "create", the node has to provide the
> rate and MTU, which define the MCG values.
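
The join check described here can be sketched as follows, reading the comparison as "at least" (which matches Hal's remark elsewhere in the thread that a joiner with insufficient MTU is denied). The `McGroup` class and `can_join` function are hypothetical names for illustration, not OpenSM structures. Rate is compared in Gb/s rather than as the raw IB rate code, since those codes are not in ascending order of speed.

```python
from dataclasses import dataclass


@dataclass
class McGroup:
    mtu_code: int      # IB MTU code of the group (1 = 256 bytes ... 5 = 4096 bytes)
    rate_gbps: float   # group rate (speed * width) in Gb/s


def can_join(group, port_mtu_code, port_rate_gbps):
    """A port may join only if it meets both the group MTU and the group rate."""
    return port_mtu_code >= group.mtu_code and port_rate_gbps >= group.rate_gbps


if __name__ == "__main__":
    group = McGroup(mtu_code=4, rate_gbps=10.0)   # 2K MTU, 4x SDR
    print(can_join(group, 4, 20.0))   # faster port with the same MTU
    print(can_join(group, 3, 10.0))   # 1K MTU port
```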
>
> To allow the administrator to control the IPoIB MCGs' MTU and
> rate, OpenSM provides the means to control these
> values per partition. See the doc/partition-config.doc
> Still, the administrator should know the lowest MTU
> and rate of the nodes expected to join the IPoIB subnet.
> The tradeoff is in the hands of the administrator, who can set a
> value that will prevent slow nodes from joining the group,
> or assign a low value that will fit all nodes but slow down
> communication ...
>
> EZ
>
>
>
>
>
> ________________________________
>
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
> Sent: Wednesday, July 25, 2007 10:01 PM
> To: Shirley Ma
> Cc: general at lists.openfabrics.org
> Subject: [ofa-general] Re: openSM: Different IB MTUs
>
> Shirley,
>
> On 7/25/07, Shirley Ma <xma at us.ibm.com> wrote:
>
> Hal,
>
> Thanks for your prompt reply. I am asking how openSM
> handles different link MTUs in the SA MCMemberRecord MTU. For example, if
> some links have a 2K MTU and some have a 1K MTU, then when enabling
> IPoIB, how does the SM decide the IPoIB broadcast group MCMemberRecord MTU
> size? When an IB multicast group is created from a 2K MTU node first,
> which PMTU value is attached to this IB multicast group's MCMemberRecord
> MTU?
>
>
>
> The MCMemberRecord MTU gets the group MTU (set when the group is created).
> This comes either from the first joiner with sufficient components or from
> preconfiguration (the MTU can be set in the config). If a joiner has
> insufficient MTU for the group, it is denied.
>
> -- Hal
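
A rough model of the create-versus-join behaviour Hal describes, as a sketch under simplifying assumptions: the group MTU is fixed either by preconfiguration or by the first joiner, and later joiners with a smaller MTU are denied. `McGroupTable` and its `join` method are hypothetical, and the sketch ignores rate and the other MCMemberRecord components.

```python
class McGroupTable:
    """Toy model of multicast group MTU handling (not OpenSM code)."""

    def __init__(self, preconfigured=None):
        # Maps a group identifier to its MTU code; preconfigured groups
        # stand in for values set in the partition config.
        self._group_mtu = dict(preconfigured or {})

    def group_mtu(self, mgid):
        return self._group_mtu.get(mgid)

    def join(self, mgid, port_mtu_code):
        """Return True if the join (or implicit create) is accepted."""
        current = self._group_mtu.get(mgid)
        if current is None:
            # First joiner creates the group and defines its MTU.
            self._group_mtu[mgid] = port_mtu_code
            return True
        # Existing group: the joiner must support at least the group MTU.
        return port_mtu_code >= current


if __name__ == "__main__":
    table = McGroupTable()
    print(table.join("ipoib-bcast", 4))   # 2K node creates the group
    print(table.join("ipoib-bcast", 3))   # 1K node tries to join
```

In Shirley's scenario this means that a 2K MTU node creating the group first locks out 1K MTU nodes, unless the group MTU was preconfigured to a lower value.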
>
>
>
> Thanks
> Shirley Ma
>
> "Hal Rosenstock" <hal.rosenstock at gmail.com>
> 07/25/07 10:57 AM
>
> To: Shirley Ma/Beaverton/IBM at IBMUS
> cc: general at lists.openfabrics.org
> Subject: Re: openSM: Different IB MTUs
>
>
> Shirley,
>
> On 7/25/07, Shirley Ma <xma at us.ibm.com> wrote:
>
> Hello Hal,
>
> How does openSM handle CAs with
> different MTUs in the same subnet? For example, IPoIB broadcast group
> MTU, IB multicast group PMTU? Does openSM pick up the smallest MTU in
> the subnet?
>
>
>
> Are you asking about link MTU, SA
> PathRecord/MultiPathRecord MTU, SA MCMemberRecord MTU, or all of these?
>
> -- Hal
>
> Thanks
> Shirley Ma
>
>
>