[ofa-general] Re: openSM: Different IB MTUs

Sasha Khapyorsky sashak at voltaire.com
Thu Jul 26 15:47:10 PDT 2007


On 09:00 Thu 26 Jul     , Eitan Zahavi wrote:
> I propose that when there is no MTU in the partition policy file, OpenSM
> use a configurable default from: /etc/cache/opensm/opensm.opt.
> Something like:
> # The default MTU to be used for IPoIB and other MCGs when the
> # partition-policy does not provide exact value. The default is the
> # lowest possible MTU
> mcg_default_mtu 1
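
To make the proposed fallback concrete, here is a minimal Python sketch (not
OpenSM code). The option name mcg_default_mtu comes from the proposal above;
the parser, the function names, and the assumed option-file syntax (one
"name value" pair per line, with # comments) are only for illustration. The
values are IB MTU codes (1 = 256 bytes up to 5 = 4096 bytes), so the proposed
default of 1 means 256 bytes.

```python
# IB MTU codes as carried in SA records, mapped to bytes.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}

# Proposed default: the lowest possible MTU.
DEFAULT_MCG_MTU = 1


def parse_default_mtu(opts_text):
    """Return the mcg_default_mtu code found in the option text, else the default."""
    for raw in opts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        if len(parts) == 2 and parts[0] == "mcg_default_mtu":
            value = int(parts[1])
            if value not in IB_MTU_BYTES:
                raise ValueError("invalid IB MTU code: %d" % value)
            return value
    return DEFAULT_MCG_MTU


def effective_group_mtu(partition_mtu, opts_text):
    """The partition policy wins; otherwise fall back to the configured default."""
    if partition_mtu is not None:
        return partition_mtu
    return parse_default_mtu(opts_text)
```

With no MTU in the partition policy and no option set, the group would get
code 1 (256 bytes), which every port can support.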

Looks like a good solution to me.

Does anybody care to send a patch?

Sasha

>  
> Eitan Zahavi 
> Senior Engineering Director, Software Architect 
> Mellanox Technologies LTD 
> Tel:+972-4-9097208
> Fax:+972-4-9593245 
> P.O. Box 586 Yokneam 20692 ISRAEL 
>  
> 
> 
> ________________________________
> 
> 	From: Shirley Ma [mailto:xma at us.ibm.com] 
> 	Sent: Wednesday, July 25, 2007 10:45 PM
> 	To: Eitan Zahavi
> 	Cc: general at lists.openfabrics.org; Hal Rosenstock
> 	Subject: RE: [ofa-general] Re: openSM: Different IB MTUs
> 	
> 	
> 
> 	Hello Eitan, Hal,
> 	
> 	Thanks. It's good that OpenSM has a configuration option to set up
> these attributes in the MC group. Would it be a good idea to add the
> following to OpenSM: when there is no MTU defined in the configuration
> file, the SM picks the smallest link MTU in the fabric by default? MTU is
> unlike rate: a slower rate might indicate a cabling problem. So using the
> smallest link MTU in the fabric might not be a bad choice for MC by
> default. The reason I ask is that when creating an IP multicast group,
> MTU is not an attribute of the group. When mapping IP multicast to IB
> multicast, the IB multicast join might fail because of different IB link
> MTU sizes in the group, but the IP multicast group will be created
> successfully without knowing about the failure. If the admin sets the MTU
> in the configuration file, the admin would know about this failure.
> Otherwise, admins/users could spend too much time debugging their broken
> multicast applications.
> 	
> 	Thanks
> 	Shirley Ma
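
The "smallest link MTU in the fabric" default suggested above could be
computed roughly as follows. This is only a sketch under assumptions: the
fabric is modelled as a plain list of links, each given as the pair of MTU
capability codes of its two ports, and the function names are made up for
illustration rather than taken from OpenSM.

```python
def link_mtu(cap_a, cap_b):
    """A link can carry at most the smaller of its two ports' MTU capabilities."""
    return min(cap_a, cap_b)


def smallest_fabric_mtu(links):
    """Return the smallest link MTU code over all links in the fabric.

    links is an iterable of (cap_a, cap_b) pairs of IB MTU codes.
    """
    links = list(links)
    if not links:
        raise ValueError("fabric has no links")
    return min(link_mtu(a, b) for a, b in links)
```

For a fabric with links (4, 4), (4, 3) and (5, 4), the result is code 3
(1024 bytes), so a group created with that MTU could be joined from any port.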
> 	
> 	From: "Eitan Zahavi" <eitan at mellanox.co.il>
> 	Sent: 07/25/07 12:25 PM
> 	To: "Hal Rosenstock" <hal.rosenstock at gmail.com>, Shirley Ma/Beaverton/IBM at IBMUS
> 	Cc: <general at lists.openfabrics.org>
> 	Subject: RE: [ofa-general] Re: openSM: Different IB MTUs
> 
> 	Hi Shirley,
> 	
> 	I think I understand where your question comes from...
> 	Many have issues with heterogeneous fabrics where not all nodes
> have the same MTU or speed,
> 	especially when IPoIB relies on all nodes joining the broadcast
> group.
> 	
> 	The term "join" for multicast groups is a little overloaded.
> 	If a node joins an existing MC group it has to have a rate
> (speed * width) >= MCG.rate and support MTU >= MCG.MTU; otherwise it is
> denied.
> 	If the join is actually a "create", the node has to provide the
> rate and MTU, which define the MCG values.
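
The join/create rule described above can be sketched as below. This is a
simplified model, not the SA implementation: it ignores the MTU and rate
selector fields of MCMemberRecord, it expresses rate in Gb/s instead of the
IB rate encoding so that values compare naturally, and the dictionary layout
and names are assumptions for illustration.

```python
class JoinDenied(Exception):
    """Raised when a port cannot meet the group's MTU or rate."""


def join(groups, mgid, port_mtu, port_rate_gbps):
    """Join the group mgid, creating it if it does not exist yet.

    On create, the joiner's MTU and rate define the group values.
    On join, the port must support at least the group's MTU and rate.
    """
    group = groups.get(mgid)
    if group is None:
        # First joiner: this is really a "create".
        group = {"mtu": port_mtu, "rate": port_rate_gbps, "members": 1}
        groups[mgid] = group
        return group
    if port_mtu < group["mtu"] or port_rate_gbps < group["rate"]:
        raise JoinDenied("port cannot meet group MTU/rate")
    group["members"] += 1
    return group
```

If a port with MTU code 4 (2048 bytes) creates the group first, a later port
limited to code 3 (1024 bytes) is denied, which is why the creation order or
a preconfigured value matters.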
> 	
> 	To allow the administrator to control the IPoIB MCGs' MTU and
> rate, OpenSM provides the means to control these
> 	values per partition. See doc/partition-config.doc.
> 	Still, the administrator should know the lowest MTU
> and rate of the nodes expected to join the IPoIB subnet.
> 	The tradeoff is in the hands of the administrator, who can set a
> value that will prevent slow nodes from joining the group,
> 	or assign a low value that will fit all nodes but slow down
> communication ...
> 	
> 	EZ 
> 
> 
> 	
> 	
> 	
> ________________________________
> 
> 	From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
> 	Sent: Wednesday, July 25, 2007 10:01 PM
> 	To: Shirley Ma
> 	Cc: general at lists.openfabrics.org
> 	Subject: [ofa-general] Re: openSM: Different IB MTUs
> 	
> 	Shirley,
> 	
> 	On 7/25/07, Shirley Ma <xma at us.ibm.com> wrote:
> 
> 		Hal,
> 		
> 		Thanks for your prompt reply. I am asking how OpenSM
> handles different link MTUs in the SA MCMemberRecord MTU. For example,
> suppose some links have a 2K MTU and some have a 1K MTU. Then when
> enabling IPoIB, how does the SM decide the IPoIB broadcast group
> MCMemberRecord MTU size? When an IB multicast group is created from a 2K
> MTU node first, which PMTU value is attached to this IB multicast group's
> MCMemberRecord MTU?
> 
> 
> 	
> 	MCMemberRecord MTU gets the group MTU (when created). This is
> either taken from the first joiner with sufficient components or
> preconfigured (and MTU can be set in the config). If a joiner has
> insufficient MTU for the group, it is denied.
> 	
> 	-- Hal
> 	
> 	
> 
> 		Thanks
> 		Shirley Ma
> 		
> 		From: "Hal Rosenstock" <hal.rosenstock at gmail.com>
> 		Sent: 07/25/07 10:57 AM
> 		To: Shirley Ma/Beaverton/IBM at IBMUS
> 		Cc: general at lists.openfabrics.org
> 		Subject: Re: openSM: Different IB MTUs
> 		
> 		
> 		Shirley,
> 		
> 		On 7/25/07, Shirley Ma <xma at us.ibm.com> wrote:
> 
> 				Hello Hal,
> 				
> 				How does OpenSM handle CAs with
> different MTUs in the same subnet? For example, for the IPoIB broadcast
> group MTU or an IB multicast group PMTU? Does OpenSM pick the smallest
> MTU in the subnet?
> 
> 		
> 		
> 		Are you asking about link MTU, SA
> PathRecord/MultiPathRecord MTU, SA MCMemberRecord MTU, or all of these?
> 		
> 		-- Hal 
> 
> 				Thanks
> 				Shirley Ma
> 
> 
> 





