[openib-general] [PATCH] osm: partition manager force policy

Thu Jun 15 11:15:24 PDT 2006

Hi Eitan,

On 15:19 Thu 15 Jun     , Eitan Zahavi wrote:
> >>+/*
> >>+* PARAMETERS
> >>+*  p_physp
> >>+*     [in] Pointer to an osm_physp_t object.
> >>+*
> >>+* RETURN VALUES
> >>+*  The pointer to the P_Key table object.
> >>+*
> >>+* NOTES
> >>+*
> >>+* SEE ALSO
> >>+*  Port, Physical Port
> >>+*********/
> >>+
> >
> >
> >Is not this simpler to remove 'const' from existing
> >osm_physp_get_pkey_tbl() function instead of using new one?
> There are plenty of const functions using this function internally
> so I would have need to fix them too.

You are right. Maybe separate patch for this?

> >>@@ -118,14 +121,29 @@ void osm_pkey_tbl_sync_new_blocks(
> >>    p_block = cl_ptr_vector_get(&p_pkey_tbl->blocks, b);
> >>    if ( b < new_blocks )
> >>      p_new_block = cl_ptr_vector_get(&p_pkey_tbl->new_blocks, b);
> >>-    else {
> >>+		else 
> >>+      {
> >>      p_new_block = (ib_pkey_table_t *)malloc(sizeof(*p_new_block));
> >>      if (!p_new_block)
> >>        break;
> >>+			cl_ptr_vector_set(&((osm_pkey_tbl_t 
> >>*)p_pkey_tbl)->new_blocks, +						 b, 
> >>p_new_block);
> >>+		}
> >>+
> >>      memset(p_new_block, 0, sizeof(*p_new_block));
> >>-      cl_ptr_vector_set(&((osm_pkey_tbl_t *)p_pkey_tbl)->new_blocks, b, 
> >>p_new_block);
> >>    }
> >>-    memcpy(p_new_block, p_block, sizeof(*p_new_block));
> >>+}
> >
> >
> >You changed this function so it does not do any sync anymore. Should
> >function name be changed too?
> Yes correct I will change it. Is a better name:
> osm_pkey_tbl_init_new_blocks ?

Great name.

> >>+  to show that on the "old" blocks
> >>+*/
> >>+int
> >>+osm_pkey_tbl_set_new_entry( 
> >>+	IN osm_pkey_tbl_t *p_pkey_tbl,
> >>+	IN uint16_t        block_idx,
> >>+	IN uint8_t         pkey_idx,
> >>+	IN uint16_t        pkey)
> >>+{  
> >>+	ib_pkey_table_t *p_old_block;
> >>+	ib_pkey_table_t *p_new_block;
> >>+	
> >>+	if (osm_pkey_tbl_make_block_pair(
> >>+			 p_pkey_tbl,  block_idx, &p_old_block, &p_new_block))
> >>+		return 1;
> >>+		
> >>+	cl_map_insert( &p_pkey_tbl->keys,
> >>+						ib_pkey_get_base(pkey),
> >>+					 
> >>&(p_old_block->pkey_entry[pkey_idx]));
> >
> >
> >Here you map potentially empty pkey entry. Why? "old block" will be
> >remapped anyway on pkey receiving.
> The reason I did this was that if the GetResp will fail I still want to 
> represent
> the settings in the map.But actually it might be better not to do that so 
> next
> time we run we will not find it without a GetResp.

Agree.

> >>+	IN	 uint16_t		 *p_pkey,
> >>+	OUT uint32_t		 *p_block_idx,
> >>+	OUT uint8_t			 *p_pkey_index)
> >>+{
> >>+	uint32_t			  num_of_blocks;
> >>+	uint32_t			  block_index;
> >>+	ib_pkey_table_t *block;
> >>+
> >>+	CL_ASSERT( p_pkey_tbl );
> >>+	CL_ASSERT( p_block_idx != NULL );
> >>+	CL_ASSERT( p_pkey_idx != NULL );
> >
> >
> >Why last two CL_ASSERTs? What should be problem with uninitialized
> >pointers here?
> >
> These are the outputs of the function. It does not make sense to call the 
> functions with
> null output pointers (calling by ref) . Anyway instead of putting the check 
> in the free build
> I used an assert

I see. Actually I've overlooked that addresses and not values are
checked. Please ignore this comment.

> >>+
> >>+	p_pkey_tbl = osm_physp_get_mod_pkey_tbl( p_physp );
> >>+	if (! p_pkey_tbl)
> >
> >           ^^^^^^^^^^^^^
> >Is it possible?
> Yes it is ! I run into it during testing. The port did not have any pkey 
> table.

static inline osm_pkey_tbl_t *
osm_physp_get_mod_pkey_tbl( IN osm_physp_t* const p_physp )
{
...
  return( &p_physp->pkeys );
};

This returns the address of physp's pkeys field. Right?
Then if ( &p_physp->pkeys == NULL ) p_physp pointer should be equal to
unsigned equivalent of -(offset of pkey field in physp struct).

> >>+					"Fail to allocate new pending pkey 
> >>entry for node "
> >>+					"0x%016" PRIx64 " port %u\n",
> >>+					cl_ntoh64( osm_node_get_node_guid( 
> >>p_node ) ),
> >>+					osm_physp_get_port_num( p_physp ) );
> >>+		return;
> >>+	}
> >>+	p_pending->pkey = pkey;
> >>+	p_orig_pkey = cl_map_get( &p_pkey_tbl->keys, ib_pkey_get_base( pkey 
> >>) );
> >>+	if ( !p_orig_pkey  || 
> >>+		  (ib_pkey_get_base(*p_orig_pkey) != ib_pkey_get_base(pkey) 
> >>))
> >
> >
> >There the cases of new pkey and updated pkey membership is mixed. Why?
> I am not following your question.
> The specific case I am trying to catch is the one that for some reason the 
> map points to
> a pkey entry that was modified somehow and is different then the one you 
> would expect by
> the map.

Didn't understand it at first pass, now it is clearer.

If pkey entry was modified somehow (how? bugs?), the assumption is that
mapping still be valid? Then it is not new entry (or we will change
pkey's index in the real table).

> >>+	{
> >>+		p_pending->is_new = TRUE;
> >>+		cl_qlist_insert_tail(&p_pkey_tbl->pending, 
> >>(cl_list_item_t*)p_pending);
> >>+		stat = "inserted";
> >>+	}
> >>+	else
> >>+	{
> >>+		p_pending->is_new = FALSE;
> >>+		if (osm_pkey_tbl_get_block_and_idx(p_pkey_tbl, p_orig_pkey,
> >>+									 
> >>&p_pending->block, &p_pending->index))
> >
> >                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >AFAIK in this function there were CL_ASSERTs which check for uinitialized
> >pointers.
> True. So the asserts are not required in this case.

Up to you. Actually this my comment may be ignored, as stated above I
didn't read this correctly.

> >
> >
> >>+		{
> >>+			osm_log( p_log, OSM_LOG_ERROR,
> >>+					 "pkey_mgr_process_physical_port: 
> >>ERR 0503: "
> >>+						"Fail to obtain P_Key 0x%04x 
> >>block and index for node "
> >>+						"0x%016" PRIx64 " port %u\n",
> >>+						cl_ntoh64( 
> >>osm_node_get_node_guid( p_node ) ),
> >>+						osm_physp_get_port_num( 
> >>p_physp ) );
> >>+			return;
> >>+		}
> >>+		cl_qlist_insert_head(&p_pkey_tbl->pending, 
> >>(cl_list_item_t*)p_pending);
> >>+		stat = "updated";
> >
> >
> >Is it will be updated? It is likely "already there" case. No?
> >
> >Also in this case you can already put the pkey in new_block instead of
> >holding it in pending list. Then later you will only need to add new
> >pkeys. This may simplify the flow and even save some mem.
> True but in my mind it does not simplify - on the contrary it makes the 
> partition between
> populating each port pending list and actually setting the pkey tables 
> mixed.

I meant new_block filling, not actual setting. You will be able to
remove whole if { } else { } flow, as well as is_new, block and index
fields from 'pending' structure (actually only pkey value itself will
matter) - is it not nice simplification?

> I do not think the memory impact deserves this mix of staging
> 
> >
> >

> >>+	max_num_of_blocks = pkey_mgr_get_physp_max_blocks( p_req->p_subn, 
> >>p_physp );
> >>+	if (	p_pkey_tbl->max_blocks > max_num_of_blocks )
> >>      {
> >>-         block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index );
> >>-         for ( i = 0; i < IB_NUM_PKEY_ELEMENTS_IN_BLOCK; i++ )
> >>+		osm_log( p_log, OSM_LOG_INFO,
> >>+					"pkey_mgr_update_port: "
> >>+					"Max number of blocks reduced from 
> >>%u to %u " +					"for node 0x%016" PRIx64 " 
> >>port %u\n",
> >>+					p_pkey_tbl->max_blocks, 
> >>max_num_of_blocks,
> >>+					cl_ntoh64( osm_node_get_node_guid( 
> >>p_node ) ),
> >>+					osm_physp_get_port_num( p_physp ) ); 
> >>+	}
> >>+	p_pkey_tbl->max_blocks = max_num_of_blocks;
> >>+
> >>+	osm_pkey_tbl_sync_new_blocks( p_pkey_tbl );
> >>+	cl_map_remove_all( &p_pkey_tbl->keys );
> >
> >
> >What is the reason to drop map here? AFAIK it will be reinitialized later
> >anyway when pkey blocks will be received.
> What if it is not received?

Then we will have unreliable data there.

Maybe I know why you wanted this - this is part of "use pkey tables
before sending/receiving to/from ports" idea?

> >>@@ -255,24 +443,36 @@ pkey_mgr_update_peer_port(
> >>   if (enforce == FALSE)
> >>      return FALSE;
> >>
> >>-   p_pkey_tbl = osm_physp_get_pkey_tbl( p );
> >>-   p_peer_pkey_tbl = osm_physp_get_pkey_tbl( peer );
> >>+	p_pkey_tbl = osm_physp_get_pkey_tbl( p_physp );
> >>+	p_peer_pkey_tbl = osm_physp_get_mod_pkey_tbl( peer );
> >>   num_of_blocks = osm_pkey_tbl_get_num_blocks( p_pkey_tbl );
> >>-   if ( num_of_blocks > osm_pkey_tbl_get_num_blocks( p_peer_pkey_tbl ) )
> >>-      num_of_blocks = osm_pkey_tbl_get_num_blocks( p_peer_pkey_tbl );
> >>+	peer_max_blocks = pkey_mgr_get_physp_max_blocks( p_subn, peer );
> >>+	if (peer_max_blocks < p_pkey_tbl->used_blocks)
> >>+	{
> >>+		osm_log( p_log, OSM_LOG_ERROR,
> >>+					"pkey_mgr_update_peer_port: ERR 
> >>0508: "
> >>+					"not enough entries (%u < %u) on 
> >>switch 0x%016" PRIx64
> >>+					" port %u\n",
> >>+					peer_max_blocks, num_of_blocks,
> >>+					cl_ntoh64( osm_node_get_node_guid( 
> >>p_node ) ),
> >>+					osm_physp_get_port_num( peer ) );
> >>+		return FALSE;
> >
> >
> >Do you think it is the best way, just to skip update - partitions are
> >enforced already on the switch. May be better to truncate pkey tables
> >in order to meet peer's capabilities?
> You are right about that - Its a bug!
> I think the best approach here is to turn off the enforcement on the switch.
> If we truncate the table we actually impact connectivity of the fabric.
> I prefer a softer approach - an error in the log.

Yes this should be good way to handle this.

> >
> >
> >>+	}
> >>
> >>-   for ( block_index = 0; block_index < num_of_blocks; block_index++ )
> >>+	p_peer_pkey_tbl->used_blocks = p_pkey_tbl->used_blocks;
> >>+	for ( block_index = 0; block_index < p_pkey_tbl->used_blocks; 
> >>block_index++)
> >>   {
> >>      block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index );
> >>      peer_block = osm_pkey_tbl_block_get( p_peer_pkey_tbl, block_index );
> >>      if ( memcmp( peer_block, block, sizeof( *peer_block ) ) )
> >>      {
> >>+			osm_pkey_tbl_set(p_peer_pkey_tbl, block_index, 
> >>block);
> >
> >
> >Why this (osm_pkey_tbl_set())? This will be called by receiver.
> Same as the above note about updating the map
> I wanted to avoid to wait for the GetResp.
> I think it is a mistake and we can actually remove it.

Agree.

Sasha.