[openib-general] [PATCH] osm: partition manager force policy

Sat Jun 17 12:36:40 PDT 2006

Sasha Khapyorsky wrote:

I'm working on the changes below. I will send them all as one patch

EZ
> Hi Eitan,
> 
> On 15:19 Thu 15 Jun     , Eitan Zahavi wrote:
> 
>>>>+/*
>>>>+* PARAMETERS
>>>>+*  p_physp
>>>>+*     [in] Pointer to an osm_physp_t object.
>>>>+*
>>>>+* RETURN VALUES
>>>>+*  The pointer to the P_Key table object.
>>>>+*
>>>>+* NOTES
>>>>+*
>>>>+* SEE ALSO
>>>>+*  Port, Physical Port
>>>>+*********/
>>>>+
>>>
>>>
>>>Is not this simpler to remove 'const' from existing
>>>osm_physp_get_pkey_tbl() function instead of using new one?
>>
>>There are plenty of const functions using this function internally
>>so I would have need to fix them too.
> 
> 
> You are right. Maybe separate patch for this?
> 
I think it is preferable to keep the const function.

> 
>>>>@@ -118,14 +121,29 @@ void osm_pkey_tbl_sync_new_blocks(
>>>>   p_block = cl_ptr_vector_get(&p_pkey_tbl->blocks, b);
>>>>   if ( b < new_blocks )
>>>>     p_new_block = cl_ptr_vector_get(&p_pkey_tbl->new_blocks, b);
>>>>-    else {
>>>>+		else 
>>>>+      {
>>>>     p_new_block = (ib_pkey_table_t *)malloc(sizeof(*p_new_block));
>>>>     if (!p_new_block)
>>>>       break;
>>>>+			cl_ptr_vector_set(&((osm_pkey_tbl_t 
>>>>*)p_pkey_tbl)->new_blocks, +						 b, 
>>>>p_new_block);
>>>>+		}
>>>>+
>>>>     memset(p_new_block, 0, sizeof(*p_new_block));
>>>>-      cl_ptr_vector_set(&((osm_pkey_tbl_t *)p_pkey_tbl)->new_blocks, b, 
>>>>p_new_block);
>>>>   }
>>>>-    memcpy(p_new_block, p_block, sizeof(*p_new_block));
>>>>+}
>>>
>>>
>>>You changed this function so it does not do any sync anymore. Should
>>>function name be changed too?
>>
>>Yes correct I will change it. Is a better name:
>>osm_pkey_tbl_init_new_blocks ?
> 
> 
> Great name.
> 
> 
>>>>+  to show that on the "old" blocks
>>>>+*/
>>>>+int
>>>>+osm_pkey_tbl_set_new_entry( 
>>>>+	IN osm_pkey_tbl_t *p_pkey_tbl,
>>>>+	IN uint16_t        block_idx,
>>>>+	IN uint8_t         pkey_idx,
>>>>+	IN uint16_t        pkey)
>>>>+{  
>>>>+	ib_pkey_table_t *p_old_block;
>>>>+	ib_pkey_table_t *p_new_block;
>>>>+	
>>>>+	if (osm_pkey_tbl_make_block_pair(
>>>>+			 p_pkey_tbl,  block_idx, &p_old_block, &p_new_block))
>>>>+		return 1;
>>>>+		
>>>>+	cl_map_insert( &p_pkey_tbl->keys,
>>>>+						ib_pkey_get_base(pkey),
>>>>+					 
>>>>&(p_old_block->pkey_entry[pkey_idx]));
>>>
>>>
>>>Here you map potentially empty pkey entry. Why? "old block" will be
>>>remapped anyway on pkey receiving.
>>
>>The reason I did this was that if the GetResp will fail I still want to 
>>represent
>>the settings in the map.But actually it might be better not to do that so 
>>next
>>time we run we will not find it without a GetResp.
> 
> 
> Agree.
> 
> 
>>>>+	IN	 uint16_t		 *p_pkey,
>>>>+	OUT uint32_t		 *p_block_idx,
>>>>+	OUT uint8_t			 *p_pkey_index)
>>>>+{
>>>>+	uint32_t			  num_of_blocks;
>>>>+	uint32_t			  block_index;
>>>>+	ib_pkey_table_t *block;
>>>>+
>>>>+	CL_ASSERT( p_pkey_tbl );
>>>>+	CL_ASSERT( p_block_idx != NULL );
>>>>+	CL_ASSERT( p_pkey_idx != NULL );
>>>
>>>
>>>Why last two CL_ASSERTs? What should be problem with uninitialized
>>>pointers here?
>>>
>>
>>These are the outputs of the function. It does not make sense to call the 
>>functions with
>>null output pointers (calling by ref) . Anyway instead of putting the check 
>>in the free build
>>I used an assert
> 
> 
> I see. Actually I've overlooked that addresses and not values are
> checked. Please ignore this comment.
> 
> 
>>>>+
>>>>+	p_pkey_tbl = osm_physp_get_mod_pkey_tbl( p_physp );
>>>>+	if (! p_pkey_tbl)
>>>
>>>          ^^^^^^^^^^^^^
>>>Is it possible?
>>
>>Yes it is ! I run into it during testing. The port did not have any pkey 
>>table.
> 
> 
> static inline osm_pkey_tbl_t *
> osm_physp_get_mod_pkey_tbl( IN osm_physp_t* const p_physp )
> {
> ...
>   return( &p_physp->pkeys );
> };
> 
> This returns the address of physp's pkeys field. Right?
> Then if ( &p_physp->pkeys == NULL ) p_physp pointer should be equal to
> unsigned equivalent of -(offset of pkey field in physp struct).

Correct. I will remove the check.
> 
> 
>>>>+					"Fail to allocate new pending pkey 
>>>>entry for node "
>>>>+					"0x%016" PRIx64 " port %u\n",
>>>>+					cl_ntoh64( osm_node_get_node_guid( 
>>>>p_node ) ),
>>>>+					osm_physp_get_port_num( p_physp ) );
>>>>+		return;
>>>>+	}
>>>>+	p_pending->pkey = pkey;
>>>>+	p_orig_pkey = cl_map_get( &p_pkey_tbl->keys, ib_pkey_get_base( pkey 
>>>>) );
>>>>+	if ( !p_orig_pkey  || 
>>>>+		  (ib_pkey_get_base(*p_orig_pkey) != ib_pkey_get_base(pkey) 
>>>>))
>>>
>>>
>>>There the cases of new pkey and updated pkey membership is mixed. Why?
>>
>>I am not following your question.
>>The specific case I am trying to catch is the one that for some reason the 
>>map points to
>>a pkey entry that was modified somehow and is different then the one you 
>>would expect by
>>the map.
> 
> 
> Didn't understand it at first pass, now it is clearer.
> 
> If pkey entry was modified somehow (how? bugs?), the assumption is that
> mapping still be valid? Then it is not new entry (or we will change
> pkey's index in the real table).
> 
PKey table mismatch between the block and map should never happen.
I will remove the check and replace that with an ASSERT so I catch the bug if we hit it.

> 
>>>>+	{
>>>>+		p_pending->is_new = TRUE;
>>>>+		cl_qlist_insert_tail(&p_pkey_tbl->pending, 
>>>>(cl_list_item_t*)p_pending);
>>>>+		stat = "inserted";
>>>>+	}
>>>>+	else
>>>>+	{
>>>>+		p_pending->is_new = FALSE;
>>>>+		if (osm_pkey_tbl_get_block_and_idx(p_pkey_tbl, p_orig_pkey,
>>>>+									 
>>>>&p_pending->block, &p_pending->index))
>>>
>>>                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>AFAIK in this function there were CL_ASSERTs which check for uinitialized
>>>pointers.
>>
>>True. So the asserts are not required in this case.
> 
> 
> Up to you. Actually this my comment may be ignored, as stated above I
> didn't read this correctly.
> 
> 
>>>
>>>>+		{
>>>>+			osm_log( p_log, OSM_LOG_ERROR,
>>>>+					 "pkey_mgr_process_physical_port: 
>>>>ERR 0503: "
>>>>+						"Fail to obtain P_Key 0x%04x 
>>>>block and index for node "
>>>>+						"0x%016" PRIx64 " port %u\n",
>>>>+						cl_ntoh64( 
>>>>osm_node_get_node_guid( p_node ) ),
>>>>+						osm_physp_get_port_num( 
>>>>p_physp ) );
>>>>+			return;
>>>>+		}
>>>>+		cl_qlist_insert_head(&p_pkey_tbl->pending, 
>>>>(cl_list_item_t*)p_pending);
>>>>+		stat = "updated";
>>>
>>>
>>>Is it will be updated? It is likely "already there" case. No?
>>>
>>>Also in this case you can already put the pkey in new_block instead of
>>>holding it in pending list. Then later you will only need to add new
>>>pkeys. This may simplify the flow and even save some mem.
>>
>>True but in my mind it does not simplify - on the contrary it makes the 
>>partition between
>>populating each port pending list and actually setting the pkey tables 
>>mixed.
> 
> 
> I meant new_block filling, not actual setting. You will be able to
> remove whole if { } else { } flow, as well as is_new, block and index
> fields from 'pending' structure (actually only pkey value itself will
> matter) - is it not nice simplification?
I still prefer the clear staging: append to list when scanning the partitions and
filling in the tables when looping on all ports.
> 
> 
>>I do not think the memory impact deserves this mix of staging
>>
>>
>>>
> 
>>>>+	max_num_of_blocks = pkey_mgr_get_physp_max_blocks( p_req->p_subn, 
>>>>p_physp );
>>>>+	if (	p_pkey_tbl->max_blocks > max_num_of_blocks )
>>>>     {
>>>>-         block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index );
>>>>-         for ( i = 0; i < IB_NUM_PKEY_ELEMENTS_IN_BLOCK; i++ )
>>>>+		osm_log( p_log, OSM_LOG_INFO,
>>>>+					"pkey_mgr_update_port: "
>>>>+					"Max number of blocks reduced from 
>>>>%u to %u " +					"for node 0x%016" PRIx64 " 
>>>>port %u\n",
>>>>+					p_pkey_tbl->max_blocks, 
>>>>max_num_of_blocks,
>>>>+					cl_ntoh64( osm_node_get_node_guid( 
>>>>p_node ) ),
>>>>+					osm_physp_get_port_num( p_physp ) ); 
>>>>+	}
>>>>+	p_pkey_tbl->max_blocks = max_num_of_blocks;
>>>>+
>>>>+	osm_pkey_tbl_sync_new_blocks( p_pkey_tbl );
>>>>+	cl_map_remove_all( &p_pkey_tbl->keys );
>>>
>>>
>>>What is the reason to drop map here? AFAIK it will be reinitialized later
>>>anyway when pkey blocks will be received.
>>
>>What if it is not received?
> 
> 
> Then we will have unreliable data there.
> 
> Maybe I know why you wanted this - this is part of "use pkey tables
> before sending/receiving to/from ports" idea?
> 
> 
>>>>@@ -255,24 +443,36 @@ pkey_mgr_update_peer_port(
>>>>  if (enforce == FALSE)
>>>>     return FALSE;
>>>>
>>>>-   p_pkey_tbl = osm_physp_get_pkey_tbl( p );
>>>>-   p_peer_pkey_tbl = osm_physp_get_pkey_tbl( peer );
>>>>+	p_pkey_tbl = osm_physp_get_pkey_tbl( p_physp );
>>>>+	p_peer_pkey_tbl = osm_physp_get_mod_pkey_tbl( peer );
>>>>  num_of_blocks = osm_pkey_tbl_get_num_blocks( p_pkey_tbl );
>>>>-   if ( num_of_blocks > osm_pkey_tbl_get_num_blocks( p_peer_pkey_tbl ) )
>>>>-      num_of_blocks = osm_pkey_tbl_get_num_blocks( p_peer_pkey_tbl );
>>>>+	peer_max_blocks = pkey_mgr_get_physp_max_blocks( p_subn, peer );
>>>>+	if (peer_max_blocks < p_pkey_tbl->used_blocks)
>>>>+	{
>>>>+		osm_log( p_log, OSM_LOG_ERROR,
>>>>+					"pkey_mgr_update_peer_port: ERR 
>>>>0508: "
>>>>+					"not enough entries (%u < %u) on 
>>>>switch 0x%016" PRIx64
>>>>+					" port %u\n",
>>>>+					peer_max_blocks, num_of_blocks,
>>>>+					cl_ntoh64( osm_node_get_node_guid( 
>>>>p_node ) ),
>>>>+					osm_physp_get_port_num( peer ) );
>>>>+		return FALSE;
>>>
>>>
>>>Do you think it is the best way, just to skip update - partitions are
>>>enforced already on the switch. May be better to truncate pkey tables
>>>in order to meet peer's capabilities?
>>
>>You are right about that - Its a bug!
>>I think the best approach here is to turn off the enforcement on the switch.
>>If we truncate the table we actually impact connectivity of the fabric.
>>I prefer a softer approach - an error in the log.
> 
> 
> Yes this should be good way to handle this.
> 
> 
>>>
>>>>+	}
>>>>
>>>>-   for ( block_index = 0; block_index < num_of_blocks; block_index++ )
>>>>+	p_peer_pkey_tbl->used_blocks = p_pkey_tbl->used_blocks;
>>>>+	for ( block_index = 0; block_index < p_pkey_tbl->used_blocks; 
>>>>block_index++)
>>>>  {
>>>>     block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index );
>>>>     peer_block = osm_pkey_tbl_block_get( p_peer_pkey_tbl, block_index );
>>>>     if ( memcmp( peer_block, block, sizeof( *peer_block ) ) )
>>>>     {
>>>>+			osm_pkey_tbl_set(p_peer_pkey_tbl, block_index, 
>>>>block);
>>>
>>>
>>>Why this (osm_pkey_tbl_set())? This will be called by receiver.
>>
>>Same as the above note about updating the map
>>I wanted to avoid to wait for the GetResp.
>>I think it is a mistake and we can actually remove it.
> 
> 
> Agree.
> 
> Sasha.