[ofa-general] Re: Re: openSM: Different IB MTUs

Shirley Ma xma at us.ibm.com
Thu Jul 26 07:58:21 PDT 2007






Setting the default to 4 (2K) would be more appropriate than 1 (512?). All HCAs
support at least 2K now.

Thanks
Shirley Ma

From: "Michael S. Tsirkin" <mst at dev.mellanox.co.il>
Sent: 07/26/07 12:22 AM
To: Shirley Ma/Beaverton/IBM at IBMUS
Cc: Eitan Zahavi <eitan at mellanox.co.il>, general at lists.openfabrics.org
Reply-To: "Michael S. Tsirkin" <mst at dev.mellanox.co.il>
Subject: Re: Re: openSM: Different IB MTUs




What does "1" mean? Surely not 1 byte MTU :)
IMO a good format would be the MTU value in bytes.
E.g. 512, 1024, 2048, 4096.
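
For reference, the number under discussion is an InfiniBand MTU enumeration code, not a byte count. The sketch below assumes the standard IBTA encoding (1 = 256 bytes up to 5 = 4096 bytes), under which code 4 is 2048 bytes; the function names are illustrative and are not part of OpenSM.

```python
# InfiniBand MTU enumeration: code -> payload size in bytes.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def mtu_code_to_bytes(code):
    """Translate an IB MTU enumeration code into bytes."""
    try:
        return IB_MTU_BYTES[code]
    except KeyError:
        raise ValueError(f"invalid IB MTU code: {code}")


def mtu_bytes_to_code(nbytes):
    """Translate a byte count into the matching IB MTU enumeration code."""
    for code, size in IB_MTU_BYTES.items():
        if size == nbytes:
            return code
    raise ValueError(f"not a valid IB MTU size: {nbytes}")
```

An option that accepted bytes, as suggested above, could call the hypothetical `mtu_bytes_to_code` once at parse time and keep using the enumeration internally.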

Quoting Shirley Ma <xma at us.ibm.com>:
Subject: RE: Re: openSM: Different IB MTUs

Eitan,

That's a good approach to address the issue.

thanks
Shirley Ma

From: "Eitan Zahavi" <eitan at mellanox.co.il>
Sent: 07/25/07 11:00 PM
To: Shirley Ma/Beaverton/IBM at IBMUS
Cc: <general at lists.openfabrics.org>, "Hal Rosenstock" <hal.rosenstock at gmail.com>
Subject: RE: [ofa-general] Re: openSM: Different IB MTUs

I propose that when there is no MTU in the partition policy file, OpenSM use a
configurable default from /etc/cache/opensm/opensm.opt.
Something like:
# The default MTU to be used for IPoIB and other MCGs when the partition-policy
# does not provide an exact value. The default is the lowest possible MTU
mcg_default_mtu 1
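
A rough sketch of how such a fallback could behave, assuming a simple "key value" options file with "#" comments and the `mcg_default_mtu` name proposed above. The parser and helper names are hypothetical and do not reflect OpenSM's actual option handling.

```python
PROPOSED_DEFAULT_MTU = 1  # the "lowest possible MTU" code from the proposal


def parse_options(text):
    """Parse 'key value' lines, ignoring blank lines and '#' comments."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            opts[parts[0]] = parts[1].strip()
    return opts


def effective_mcg_mtu(partition_mtu, opts):
    """Prefer the partition policy MTU; otherwise use the configured default."""
    if partition_mtu is not None:
        return partition_mtu
    return int(opts.get("mcg_default_mtu", PROPOSED_DEFAULT_MTU))
```

With this ordering, an explicit partition-policy MTU always wins, and the option only matters for groups the policy file leaves unspecified.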

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

From: Shirley Ma [mailto:xma at us.ibm.com]
Sent: Wednesday, July 25, 2007 10:45 PM
To: Eitan Zahavi
Cc: general at lists.openfabrics.org; Hal Rosenstock
Subject: RE: [ofa-general] Re: openSM: Different IB MTUs

Hello Eitan, Hal,

Thanks. It's good that openSM has a configuration option to set these
attributes for multicast groups. Would it be a good idea to add the following
to openSM: when no MTU is defined in the configuration file, the SM picks the
smallest link MTU in the fabric by default? MTU is unlike rate: a slower rate
might indicate a cabling problem, so using the smallest link MTU in the fabric
might not be a bad default for MC. The reason for my request is that when an IP
multicast group is created, MTU is not an attribute of the group. When IP
multicast is mapped to IB multicast, the IB multicast join might fail because
of different IB link MTU sizes in the group, while the IP multicast group
creation succeeds without any indication of the failure. If the admin sets the
MTU in the configuration file, the admin would know about this failure.
Otherwise, admins and users could spend too much time debugging their broken
multicast applications.
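
The suggested default can be sketched as follows, assuming the SM already knows the MTU of every link in the fabric. The function name and its inputs are hypothetical; because both byte counts and IB MTU codes grow with MTU size, the minimum works for either representation.

```python
def default_mcg_mtu(link_mtus, configured_mtu=None):
    """Return the configured MTU if set, else the smallest link MTU seen."""
    if configured_mtu is not None:
        return configured_mtu
    if not link_mtus:
        raise ValueError("no link MTUs known for the fabric")
    return min(link_mtus)
```

For example, a fabric with 2K and 1K links and no configured value would yield a 1K group MTU, so every port could join.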

Thanks
Shirley Ma

From: "Eitan Zahavi" <eitan at mellanox.co.il>
Sent: 07/25/07 12:25 PM
To: "Hal Rosenstock" <hal.rosenstock at gmail.com>, Shirley Ma/Beaverton/IBM at IBMUS
Cc: <general at lists.openfabrics.org>
Subject: RE: [ofa-general] Re: openSM: Different IB MTUs

Hi Shirley,

I think I understand where your question comes from...
Many have issues with heterogeneous fabrics where not all nodes have the same
MTU or speed, especially since IPoIB relies on all nodes joining the broadcast
group.

The term "join" for multicast groups is a little overloaded.
If a node joins an existing MC group, it must have a rate (speed * width) >=
MCG.rate and support an MTU >= MCG.MTU; otherwise it is denied.
If the join is actually a "create", the node has to provide the rate and MTU,
which define the MCG values.
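
The two meanings of "join" can be sketched roughly as below. This is an illustration only: the `Group` class and `join_or_create` function are hypothetical, MTU is taken in bytes and rate in Gb/s, and it assumes a port whose MTU and rate equal the group's values is accepted.

```python
from dataclasses import dataclass


@dataclass
class Group:
    mtu: int   # group MTU in bytes
    rate: int  # group rate in Gb/s (speed * width)


def join_or_create(groups, mgid, port_mtu, port_rate):
    """Create the group from the first joiner, or admit/deny later joiners."""
    group = groups.get(mgid)
    if group is None:
        # A "create": the joiner's values define the group.
        groups[mgid] = Group(mtu=port_mtu, rate=port_rate)
        return True
    # A join to an existing group: the port must meet the group's values.
    return port_mtu >= group.mtu and port_rate >= group.rate
```

Under these assumptions, a 2K node that creates the group first locks out any later 1K node, which is the situation raised in this thread.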

To allow the administrator to control the MTU and rate of the IPoIB MCGs, OpenSM
provides the means to control these values per partition. See
doc/partition-config.doc.
Still, the administrator should know the lowest MTU and rate among the nodes
expected to join the IPoIB subnet.
The tradeoff is in the hands of the administrator, who can set a value that
will prevent slow nodes from joining the group, or assign a low value that will
fit all nodes but slow down communication...

EZ

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL



━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
Sent: Wednesday, July 25, 2007 10:01 PM
To: Shirley Ma
Cc: general at lists.openfabrics.org
Subject: [ofa-general] Re: openSM: Different IB MTUs

Shirley,

On 7/25/07, Shirley Ma <xma at us.ibm.com> wrote:

        Hal,

        Thanks for your prompt reply. I am asking how openSM handles
        different link MTUs in the SA MCMemberRecord MTU. For example, suppose
        some links have a 2K MTU and some have a 1K MTU. When IPoIB is enabled,
        how does the SM decide the MCMemberRecord MTU size of the IPoIB
        broadcast group? When an IB multicast group is first created from a 2K
        MTU node, which PMTU value is attached to the group's MCMemberRecord
        MTU?



MCMemberRecord MTU gets the group MTU (when created). This comes either from the
first joiner with sufficient components or from preconfiguration (the MTU can be
set in the config). If a joiner has insufficient MTU for the group, it is denied.

-- Hal

        Thanks
        Shirley Ma

        From: "Hal Rosenstock" <hal.rosenstock at gmail.com>
        Sent: 07/25/07 10:57 AM
        To: Shirley Ma/Beaverton/IBM at IBMUS
        Cc: general at lists.openfabrics.org
        Subject: Re: openSM: Different IB MTUs

        Shirley,

        On 7/25/07, Shirley Ma < xma at us.ibm.com> wrote:
                        Hello Hal,

                        How does openSM handle CAs with different MTUs in the
                        same subnet? For example, for the IPoIB broadcast group
                        MTU or the IB multicast group PMTU, does openSM pick the
                        smallest MTU in the subnet?


        Are you asking about link MTU, SA PathRecord/MultiPathRecord MTU, SA
        MCMemberRecord MTU, or all of these?

        -- Hal
                        Thanks
                        Shirley Ma









_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general

--
MST