[ofw][IPOIB]bypass_check_bcast_rate usage extension

Alex Estrin alex.estrin at qlogic.com
Mon Apr 23 15:47:48 PDT 2007


Please see below.

Thanks,
Alex

> -----Original Message-----
> From: Fab Tillier [mailto:ftillier at windows.microsoft.com]
> Sent: Monday, April 23, 2007 5:57 PM
> To: Alex Estrin; Yossi Leybovich; ofw at lists.openfabrics.org
> Subject: RE: [ofw][IPOIB]bypass_check_bcast_rate usage extension
> 
> The problem is managing the settings.  Unless there's a quick and easy
> way of setting these entries via a script this is going to be a
> management nightmare.
Agree, although
could be disabled by setting default bypass_bcast_check_rate flag.

> The existing MC join checks are based on a MC group that someone
*else*
> created.
> How would the fabric get shutdown if the a port creates a group for a
> very low rate?

Fabric with mixed platforms( Windows & Linux ).
QLogic(Silverstorm) stack ipoib has a minimum rate it can accept to be
able to join the group.
  Connectivity should work just fine.  The code only
> aborts joining the broadcast group if the local port rate is lower
than
> the broadcast group's rate.
> Maybe I'm missing something.
Even if connectivity looks Ok, it just masks or creates a chocking point
on the fabric that is used by apps generating intensive multicast data
traffic.

> -Fab
> 
> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Alex Estrin
> Sent: Monday, April 23, 2007 1:56 PM
> To: Fab Tillier; Yossi Leybovich; ofw at lists.openfabrics.org
> Subject: RE: [ofw][IPOIB]bypass_check_bcast_rate usage extension
> 
> Hi Fab,
> 
> You are correct if active SM does have control over group rate limits.
> If it doesn't then we have real possibility to shutdown whole fabric
> because of one crippled port.
> Proposed patch just completes already enabled rate check for join MC
> group,
> by adding similar check for creating group.
> 
> Thanks,
> Alex
> 
> > -----Original Message-----
> > From: Fab Tillier [mailto:ftillier at windows.microsoft.com]
> > Sent: Monday, April 23, 2007 4:17 PM
> > To: Alex Estrin; Yossi Leybovich; ofw at lists.openfabrics.org
> > Subject: RE: [ofw][IPOIB]bypass_check_bcast_rate usage extension
> >
> > Shouldn't minimum MC group rates be controlled at the SM, rather
than
> at
> > every system on the fabric?  It seems to me that having a per-port
> > variable is a recipe for configuration errors.
> >
> > The SM is a single-point where policy like minimum rate and default
> rate
> > should be managed.
> >
> > Just my $.02
> > -Fab
> >
> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org
> > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Alex Estrin
> > Sent: Monday, April 23, 2007 1:08 PM
> > To: Yossi Leybovich; ofw at lists.openfabrics.org
> > Subject: [ofw][IPOIB]bypass_check_bcast_rate usage extension
> >
> > Hi Yossi,
> >
> > Proposed patch is intended to complete 'check bcast rate behavior'
of
> > joining existing multicast group (fail to join if offered rate is
too
> > low), by failing to creating such group with rate too low for other
> > nodes to be able to join.
> > Please review.
> >
> > Thanks,
> > Alex
> >
> >
> > Index: kernel/ipoib_driver.c
> > ===================================================================
> > --- kernel/ipoib_driver.c	(revision 630)
> > +++ kernel/ipoib_driver.c	(working copy)
> > @@ -316,7 +316,7 @@
> >  {
> >  	NTSTATUS						status;
> >  	/* Remember the terminating entry in the table below. */
> > -	RTL_QUERY_REGISTRY_TABLE		table[4];
> > +	RTL_QUERY_REGISTRY_TABLE		table[5];
> >  	UNICODE_STRING					param_path;
> >
> >  	IPOIB_ENTER( IPOIB_DBG_INIT );
> > @@ -362,6 +362,13 @@
> >  	table[2].DefaultData = &g_ipoib.bypass_check_bcast_rate;
> >  	table[2].DefaultLength = sizeof(ULONG);
> >
> > +	table[3].Flags = RTL_QUERY_REGISTRY_DIRECT;
> > +	table[3].Name = L"min_group_rate";
> > +	table[3].EntryContext = &g_ipoib.min_group_rate;
> > +	table[3].DefaultType = REG_DWORD;
> > +	table[3].DefaultData = &g_ipoib.min_group_rate;
> > +	table[3].DefaultLength = sizeof(ULONG);
> > +
> >  	/* Have at it! */
> >  	status = RtlQueryRegistryValues( RTL_REGISTRY_ABSOLUTE,
> >  		param_path.Buffer, table, NULL, NULL );
> > Index: kernel/ipoib_driver.h
> > ===================================================================
> > --- kernel/ipoib_driver.h	(revision 630)
> > +++ kernel/ipoib_driver.h	(working copy)
> > @@ -77,7 +77,7 @@
> >  	NDIS_HANDLE		h_ibat_dev;
> >  	volatile LONG	ibat_ref;
> >  	uint32_t		bypass_check_bcast_rate;
> > -
> > +	uint32_t		min_group_rate;
> >  }	ipoib_globals_t;
> >  /*
> >  * FIELDS
> > @@ -95,6 +95,9 @@
> >  *
> >  *	h_ibat_dev
> >  *		Device handle returned by NdisMRegisterDevice.
> > +*
> > +*	min_group_rate
> > +*		minimum port rate allowed to create bcast/mcast group.
> > Gbps.
> >  *********/
> >
> >  extern ipoib_globals_t	g_ipoib;
> > Index: kernel/ipoib_log.mc
> > ===================================================================
> > --- kernel/ipoib_log.mc	(revision 630)
> > +++ kernel/ipoib_log.mc	(working copy)
> > @@ -281,5 +281,5 @@
> >  Severity=Error
> >  SymbolicName=EVENT_IPOIB_BCAST_RATE
> >  Language=English
> > -%2: The local port rate is too slow for the existing broadcast MC
> > group.
> > +%2: The local port rate is too low to join or create broadcast MC
> > group.
> >  .
> > Index: kernel/ipoib_port.c
> > ===================================================================
> > --- kernel/ipoib_port.c	(revision 630)
> > +++ kernel/ipoib_port.c	(working copy)
> > @@ -94,7 +94,14 @@
> >  __port_free(
> >  	IN				cl_obj_t* const
> > p_obj );
> >
> > +static inline uint8_t
> > +__port_rate_to_Gbps(
> > +	IN				uint8_t
> > rate );
> >
> > +static inline uint8_t
> > +__port_rate_from_Gbps(
> > +	IN				uint8_t
> > rate );
> > +
> >
> >
>
/***********************************************************************
> > *******
> >  *
> >  * IB resource manager operations
> > @@ -1775,7 +1782,6 @@
> >  			status = __endpt_mgr_insert( p_port, mac,
> > *pp_src );
> >  			if( status != IB_SUCCESS )
> >  			{
> > -				cl_obj_unlock( &p_port->obj );
> >  				IPOIB_PRINT_EXIT( TRACE_LEVEL_ERROR,
> > IPOIB_DBG_ERROR,
> >  					("__endpt_mgr_insert returned
> > %s\n",
> >
> > p_port->p_adapter->p_ifc->get_err_str( status )) ); @@ -5069,6
+5075,8
> > @@  {
> >  	ib_api_status_t		status;
> >  	ib_mcast_req_t		mcast_req;
> > +	uint32_t			mcast_rate_gbps;
> > +	uint32_t			port_rate_gbps;
> >
> >  	IPOIB_ENTER( IPOIB_DBG_INIT );
> >
> > @@ -5106,6 +5114,34 @@
> >  	mcast_req.port_guid = p_port->p_adapter->guids.port_guid;
> >  	mcast_req.pkey_index = 0;
> >
> > +	/* prevent mcast group creating if local port rate
> > +	 is too low for other nodes to be able to join */
> > +	if( !g_ipoib.bypass_check_bcast_rate )
> > +	{
> > +		port_rate_gbps = __port_rate_to_Gbps(
> > p_port->ib_mgr.rate );
> > +
> > +		/*if service parameter min_group_rate is not specified
> > +		  or 0, or invalid then MIN_DEFAULT_RATE will be
> > selected */
> > +		mcast_rate_gbps = __port_rate_to_Gbps(
> > +			__port_rate_from_Gbps(
> > (uint8_t)g_ipoib.min_group_rate ) );
> > +
> > +		if(	mcast_rate_gbps > port_rate_gbps )
> > +		{
> > +			IPOIB_PRINT( TRACE_LEVEL_WARNING,
> > IPOIB_DBG_INIT,
> > +				("Port rate is too low to create Bcast
> > group.\n") );
> > +
> > +			NdisWriteErrorLogEntry(
> > p_port->p_adapter->h_adapter,
> > +				EVENT_IPOIB_BCAST_RATE, 2,
> > +				(uint32_t)p_port->ib_mgr.rate,
> > +				(uint32_t)__port_rate_from_Gbps(
> > (uint8_t)g_ipoib.min_group_rate ) );
> > +
> > +			return IB_ERROR;
> > +		}
> > +		mcast_req.member_rec.rate =
> > +			( __port_rate_from_Gbps(
> > (uint8_t)g_ipoib.min_group_rate ) );
> > +		mcast_req.member_rec.rate |= ( IB_PATH_SELECTOR_EXACTLY
> > << 6 );
> > +	}
> > +
> >  	/* reference the object for the multicast join request. */
> >  	ipoib_port_ref( p_port, ref_join_bcast );
> >
> > @@ -5656,4 +5692,61 @@
> >  	IPOIB_EXIT( IPOIB_DBG_MCAST );
> >  }
> >
> > +static inline uint8_t
> > +__port_rate_to_Gbps(
> > +	IN			uint8_t			rate )
> > +{
> > +	switch ( (int)rate )
> > +	{
> > +	case IB_PATH_RECORD_RATE_2_5_GBS:
> > +		return 2;
> > +	case IB_PATH_RECORD_RATE_5_GBS:
> > +		return 5;
> > +	case IB_PATH_RECORD_RATE_10_GBS:
> > +		return 10;
> > +	case IB_PATH_RECORD_RATE_20_GBS:
> > +		return 20;
> > +	case IB_PATH_RECORD_RATE_30_GBS:
> > +		return 30;
> > +	case IB_PATH_RECORD_RATE_40_GBS:
> > +		return 40;
> > +	case IB_PATH_RECORD_RATE_60_GBS:
> > +		return 60;
> > +	case IB_PATH_RECORD_RATE_80_GBS:
> > +		return 80;
> > +	case IB_PATH_RECORD_RATE_120_GBS:
> > +		return 120;
> > +	default:
> > +		return 0;
> > +	}
> > +}
> >
> > +static inline uint8_t
> > +__port_rate_from_Gbps(
> > +	IN			uint8_t			rate_gbps )
> > +{
> > +	switch ( (int)rate_gbps )
> > +	{
> > +	case 2:
> > +		return (uint8_t)IB_PATH_RECORD_RATE_2_5_GBS;
> > +	case 5:
> > +		return (uint8_t)IB_PATH_RECORD_RATE_5_GBS;
> > +	case 10:
> > +		return (uint8_t)IB_PATH_RECORD_RATE_10_GBS;
> > +	case 20:
> > +		return (uint8_t)IB_PATH_RECORD_RATE_20_GBS;
> > +	case 30:
> > +		return (uint8_t)IB_PATH_RECORD_RATE_30_GBS;
> > +	case 40:
> > +		return (uint8_t)IB_PATH_RECORD_RATE_40_GBS;
> > +	case 60:
> > +		return (uint8_t)IB_PATH_RECORD_RATE_60_GBS;
> > +	case 80:
> > +		return (uint8_t)IB_PATH_RECORD_RATE_80_GBS;
> > +	case 120:
> > +		return (uint8_t)IB_PATH_RECORD_RATE_120_GBS;
> > +	default :
> > +		return (uint8_t)MIN_DEFAULT_GROUP_RATE;
> > +	}
> > +}
> > +
> > Index: kernel/ipoib_port.h
> > ===================================================================
> > --- kernel/ipoib_port.h	(revision 630)
> > +++ kernel/ipoib_port.h	(working copy)
> > @@ -54,6 +54,8 @@
> >  /* Max send data segment list size. */
> >  #define MAX_SEND_SGE	8
> >
> > +/* Min port rate allowed to create mcast group */ #define
> > +MIN_DEFAULT_GROUP_RATE (IB_PATH_RECORD_RATE_10_GBS)
> >
> >  /*
> >   * Define to control how transfers are done.  When defined as 1,
> causes
> > Index: kernel/netipoib.inf
> > ===================================================================
> > --- kernel/netipoib.inf	(revision 630)
> > +++ kernel/netipoib.inf	(working copy)
> > @@ -138,7 +138,7 @@
> >  HKR,"Parameters","DebugLevel",%REG_DWORD_NO_CLOBBER%,0x00000002
> >  HKR,"Parameters","DebugFlags",%REG_DWORD_NO_CLOBBER%,0x00000fff
> >
> >
>
HKR,"Parameters","bypass_check_bcast_rate",%REG_DWORD_NO_CLOBBER%,0x0000
> > 0000
> > -
> > +HKR,"Parameters","min_group_rate",%REG_DWORD_NO_CLOBBER%,
> %RATE_10_GBS%
> >  [IpoibEventLog]
> >  AddReg = IpoibAddEventLogReg
> >
> > @@ -194,3 +194,10 @@
> >  DIRID_DRIVERS        = 12
> >  DIRID_SYSTEM_X86     = 16425
> >  REG_DWORD_NO_CLOBBER = 0x00010003
> > +RATE_10_GBS         = 10
> > +RATE_20_GBS         = 20
> > +RATE_30_GBS         = 30
> > +RATE_40_GBS         = 40
> > +RATE_60_GBS         = 60
> > +RATE_80_GBS         = 80
> > +RATE_120_GBS        = 120
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw



More information about the ofw mailing list