[ofa-general] Re: openSM: Different IB MTUs

Sasha Khapyorsky sashak at voltaire.com
Thu Jul 26 15:50:05 PDT 2007


On 08:58 Thu 26 Jul     , Hal Rosenstock wrote:
> On 7/26/07, Eitan Zahavi <eitan at mellanox.co.il> wrote:
> >
> > I propose that when there is no MTU in the partition policy file,
> > OpenSM use a configurable default from
> > /var/cache/opensm/opensm.opts.
> >
> 
> That would make this the default rather than 2K. IMO it should be when some
> "special" unused MTU is set in the partition config.

"No value" should be suitable too, no?

Sasha
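
A minimal sketch of the fallback being discussed, assuming the proposed
mcg_default_mtu option existed. The function name and the use of None for
"no value in the partition policy" are hypothetical, chosen only for
illustration; the MTU encodings (1 = 256 bytes ... 5 = 4096 bytes) are the
standard IB ones.

```python
# IB MTU encodings (enum value -> bytes).
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def resolve_group_mtu(partition_mtu, mcg_default_mtu=1):
    """Pick the MTU enum for a preconfigured multicast group.

    partition_mtu   -- value from the partition policy, or None when the
                       policy gives no MTU (hypothetical representation).
    mcg_default_mtu -- the proposed fallback option from this thread;
                       1 is the lowest possible MTU (256 bytes).
    """
    mtu = mcg_default_mtu if partition_mtu is None else partition_mtu
    if mtu not in IB_MTU_BYTES:
        raise ValueError("invalid IB MTU encoding: %r" % (mtu,))
    return mtu


if __name__ == "__main__":
    # No MTU in the policy: fall back to the configured default.
    print(IB_MTU_BYTES[resolve_group_mtu(None)])
    # Explicit MTU in the policy wins over the default.
    print(IB_MTU_BYTES[resolve_group_mtu(4)])
```

With "no value" as the trigger, no special unused MTU encoding has to be
reserved in the partition config.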

> 
> -- Hal
> 
> > Something like:
> > # The default MTU to be used for IPoIB and other MCGs when the
> > # partition policy does not provide an exact value. The default is the
> > # lowest possible MTU.
> > mcg_default_mtu 1
> >
> > Eitan Zahavi
> > Senior Engineering Director, Software Architect
> > Mellanox Technologies LTD
> > Tel:+972-4-9097208
> > Fax:+972-4-9593245
> > P.O. Box 586 Yokneam 20692 ISRAEL
> >
> >
> >  ------------------------------
> > From: Shirley Ma [mailto:xma at us.ibm.com]
> > Sent: Wednesday, July 25, 2007 10:45 PM
> > To: Eitan Zahavi
> > Cc: general at lists.openfabrics.org; Hal Rosenstock
> > Subject: RE: [ofa-general] Re: openSM: Different IB MTUs
> >
> >
> >
> > Hello Eitan, Hal,
> >
> > Thanks. It's good that openSM has a configuration option to set up these
> > attributes in the MC group. Would it be a good idea to add the following
> > to openSM: when there is no MTU defined in the configuration file, the SM
> > picks the smallest link MTU in the fabric by default? MTU is unlike rate:
> > a slower rate might indicate a cabling problem. So using the smallest link
> > MTU in the fabric might not be a bad default for MC. The reason I ask is
> > that when creating an IP multicast group, MTU is not an attribute of the
> > group. When mapping IP multicast to IB multicast, the IB multicast join
> > might fail because of different IB link MTU sizes in the group, but the IP
> > multicast group join will succeed without knowing about the failure. If
> > the admin sets the MTU in the configuration file, the admin would know
> > about this failure. Otherwise, admins and users could spend too much time
> > debugging their broken multicast applications.
> >
> > Thanks
> > Shirley Ma
> >
> > "Eitan Zahavi" <eitan at mellanox.co.il>
> > 07/25/07 12:25 PM
> >
> > To: "Hal Rosenstock" <hal.rosenstock at gmail.com>, Shirley Ma/Beaverton/IBM at IBMUS
> > cc: <general at lists.openfabrics.org>
> > Subject: RE: [ofa-general] Re: openSM: Different IB MTUs
> >
> > Hi Shirley,
> >
> > I think I understand where your question comes from...
> > Many have issues with heterogeneous fabrics where not all nodes have the
> > same MTU or speed.
> > Especially when IPoIB relies on all nodes joining the broadcast group.
> >
> > The term "join" for multicast groups is a little overloaded.
> > If a node joins an existing MC group it has to have a rate (speed *
> > width) >= MCG.rate and support MTU >= MCG.MTU, otherwise it is denied.
> > If the join is actually a "create", the node has to provide the rate and
> > MTU, which define the MCG values.
> >
> > To allow the administrator to control the IPoIB MCGs' MTU and rate,
> > OpenSM provides the means to control these values per partition. See
> > doc/partition-config.txt.
> > Still, the administrator should know the lowest MTU and rate among the
> > nodes expected to join the IPoIB subnet.
> > The tradeoff is in the hands of the administrator, who can set a value
> > that will prevent slow nodes from joining the group,
> > or assign a low value that will fit all nodes but slow down communication
> > ...
> >
> > EZ
> >
> >
> >
> > ------------------------------
> > From: general-bounces at lists.openfabrics.org
> > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
> > Sent: Wednesday, July 25, 2007 10:01 PM
> > To: Shirley Ma
> > Cc: general at lists.openfabrics.org
> > Subject: [ofa-general] Re: openSM: Different IB MTUs
> >
> > Shirley,
> >
> > On 7/25/07, Shirley Ma <xma at us.ibm.com> wrote:
> >
> >    Hal,
> >
> >    Thanks for your prompt reply. I am asking how openSM handles
> >    different link MTUs in the SA MCMemberRecord MTU. For example, suppose
> >    some links have a 2K MTU and some have a 1K MTU. When IPoIB is enabled,
> >    how does the SM decide the MCMemberRecord MTU of the IPoIB broadcast
> >    group? When an IB multicast group is first created from a 2K MTU node,
> >    which PMTU value is attached to this group's MCMemberRecord MTU?
> >
> >
> >
> > MCMemberRecord MTU gets the group MTU (when created). This comes either
> > from the first joiner with sufficient components or from preconfiguration
> > (the MTU can be set in the config). If a joiner has insufficient MTU for
> > the group, it is denied.
> >
> > -- Hal
> >
> >
> >    Thanks
> >    Shirley Ma
> >
> >    "Hal Rosenstock" <hal.rosenstock at gmail.com>
> >    07/25/07 10:57 AM
> >
> >    To: Shirley Ma/Beaverton/IBM at IBMUS
> >    cc: general at lists.openfabrics.org
> >    Subject: Re: openSM: Different IB MTUs
> >
> >    Shirley,
> >
> >    On 7/25/07, Shirley Ma <xma at us.ibm.com> wrote:
> >       Hello Hal,
> >
> >          How does openSM handle CAs with different MTUs in the
> >          same subnet? For example, IPoIB broadcast group MTU, IB multicast group
> >          PMTU? Does openSM pick up the smallest MTU in the subnet?
> >
> >
> >    Are you asking about link MTU, SA PathRecord/MultiPathRecord MTU, SA
> >    MCMemberRecord MTU, or all of these ?
> >
> >    -- Hal
> >       Thanks
> >          Shirley Ma
> >
> >
> >
> >
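
The create/join rule described in the quoted thread can be sketched as
follows. This is an illustration only, not OpenSM code: the Group class and
the join function are hypothetical, MTU is in bytes and rate in Gb/s for
readability (not the IB enum encodings), and it assumes a joiner whose MTU
and rate are at least the group's values is admitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Group:
    mtu: int   # group MTU in bytes (illustrative unit)
    rate: int  # group rate in Gb/s (illustrative unit)


def join(group: Optional[Group], port_mtu: int, port_rate: int) -> Tuple[Optional[Group], bool]:
    """Return (group, admitted) for a join request.

    If no group exists yet, the join is really a "create" and the
    requester's MTU and rate define the group. Otherwise the joiner must
    support at least the group's MTU and rate, or it is denied.
    """
    if group is None:
        return Group(port_mtu, port_rate), True
    admitted = port_mtu >= group.mtu and port_rate >= group.rate
    return group, admitted


if __name__ == "__main__":
    # A 2K MTU node creates the group first...
    group, ok = join(None, 2048, 20)
    # ...so a 1K MTU node is later denied, the failure case raised above.
    group, ok = join(group, 1024, 20)
    print(group, ok)
```

Preconfiguring the group with the lowest expected MTU avoids the denial, at
the cost of a smaller MTU for every member.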





