[openib-general] [PATCH] osm: partition manager force policy
Eitan Zahavi
eitan at mellanox.co.il
Sat Jun 17 12:36:40 PDT 2006
Sasha Khapyorsky wrote:
I'm working on the changes below. I will send them all as one patch
EZ
> Hi Eitan,
>
> On 15:19 Thu 15 Jun , Eitan Zahavi wrote:
>
>>>>+/*
>>>>+* PARAMETERS
>>>>+* p_physp
>>>>+* [in] Pointer to an osm_physp_t object.
>>>>+*
>>>>+* RETURN VALUES
>>>>+* The pointer to the P_Key table object.
>>>>+*
>>>>+* NOTES
>>>>+*
>>>>+* SEE ALSO
>>>>+* Port, Physical Port
>>>>+*********/
>>>>+
>>>
>>>
>>>Is not this simpler to remove 'const' from existing
>>>osm_physp_get_pkey_tbl() function instead of using new one?
>>
>>There are plenty of const functions using this function internally
>>so I would have need to fix them too.
>
>
> You are right. Maybe separate patch for this?
>
I think it is preferable to keep the const function.
>
>>>>@@ -118,14 +121,29 @@ void osm_pkey_tbl_sync_new_blocks(
>>>> p_block = cl_ptr_vector_get(&p_pkey_tbl->blocks, b);
>>>> if ( b < new_blocks )
>>>> p_new_block = cl_ptr_vector_get(&p_pkey_tbl->new_blocks, b);
>>>>- else {
>>>>+ else
>>>>+ {
>>>> p_new_block = (ib_pkey_table_t *)malloc(sizeof(*p_new_block));
>>>> if (!p_new_block)
>>>> break;
>>>>+ cl_ptr_vector_set(&((osm_pkey_tbl_t
>>>>*)p_pkey_tbl)->new_blocks, + b,
>>>>p_new_block);
>>>>+ }
>>>>+
>>>> memset(p_new_block, 0, sizeof(*p_new_block));
>>>>- cl_ptr_vector_set(&((osm_pkey_tbl_t *)p_pkey_tbl)->new_blocks, b,
>>>>p_new_block);
>>>> }
>>>>- memcpy(p_new_block, p_block, sizeof(*p_new_block));
>>>>+}
>>>
>>>
>>>You changed this function so it does not do any sync anymore. Should
>>>function name be changed too?
>>
>>Yes correct I will change it. Is a better name:
>>osm_pkey_tbl_init_new_blocks ?
>
>
> Great name.
>
>
>>>>+ to show that on the "old" blocks
>>>>+*/
>>>>+int
>>>>+osm_pkey_tbl_set_new_entry(
>>>>+ IN osm_pkey_tbl_t *p_pkey_tbl,
>>>>+ IN uint16_t block_idx,
>>>>+ IN uint8_t pkey_idx,
>>>>+ IN uint16_t pkey)
>>>>+{
>>>>+ ib_pkey_table_t *p_old_block;
>>>>+ ib_pkey_table_t *p_new_block;
>>>>+
>>>>+ if (osm_pkey_tbl_make_block_pair(
>>>>+ p_pkey_tbl, block_idx, &p_old_block, &p_new_block))
>>>>+ return 1;
>>>>+
>>>>+ cl_map_insert( &p_pkey_tbl->keys,
>>>>+ ib_pkey_get_base(pkey),
>>>>+
>>>>&(p_old_block->pkey_entry[pkey_idx]));
>>>
>>>
>>>Here you map potentially empty pkey entry. Why? "old block" will be
>>>remapped anyway on pkey receiving.
>>
>>The reason I did this was that if the GetResp will fail I still want to
>>represent
>>the settings in the map.But actually it might be better not to do that so
>>next
>>time we run we will not find it without a GetResp.
>
>
> Agree.
>
>
>>>>+ IN uint16_t *p_pkey,
>>>>+ OUT uint32_t *p_block_idx,
>>>>+ OUT uint8_t *p_pkey_index)
>>>>+{
>>>>+ uint32_t num_of_blocks;
>>>>+ uint32_t block_index;
>>>>+ ib_pkey_table_t *block;
>>>>+
>>>>+ CL_ASSERT( p_pkey_tbl );
>>>>+ CL_ASSERT( p_block_idx != NULL );
>>>>+ CL_ASSERT( p_pkey_idx != NULL );
>>>
>>>
>>>Why last two CL_ASSERTs? What should be problem with uninitialized
>>>pointers here?
>>>
>>
>>These are the outputs of the function. It does not make sense to call the
>>functions with
>>null output pointers (calling by ref) . Anyway instead of putting the check
>>in the free build
>>I used an assert
>
>
> I see. Actually I've overlooked that addresses and not values are
> checked. Please ignore this comment.
>
>
>>>>+
>>>>+ p_pkey_tbl = osm_physp_get_mod_pkey_tbl( p_physp );
>>>>+ if (! p_pkey_tbl)
>>>
>>> ^^^^^^^^^^^^^
>>>Is it possible?
>>
>>Yes it is ! I run into it during testing. The port did not have any pkey
>>table.
>
>
> static inline osm_pkey_tbl_t *
> osm_physp_get_mod_pkey_tbl( IN osm_physp_t* const p_physp )
> {
> ...
> return( &p_physp->pkeys );
> };
>
> This returns the address of physp's pkeys field. Right?
> Then if ( &p_physp->pkeys == NULL ) p_physp pointer should be equal to
> unsigned equivalent of -(offset of pkey field in physp struct).
Correct. I will remove the check.
>
>
>>>>+ "Fail to allocate new pending pkey
>>>>entry for node "
>>>>+ "0x%016" PRIx64 " port %u\n",
>>>>+ cl_ntoh64( osm_node_get_node_guid(
>>>>p_node ) ),
>>>>+ osm_physp_get_port_num( p_physp ) );
>>>>+ return;
>>>>+ }
>>>>+ p_pending->pkey = pkey;
>>>>+ p_orig_pkey = cl_map_get( &p_pkey_tbl->keys, ib_pkey_get_base( pkey
>>>>) );
>>>>+ if ( !p_orig_pkey ||
>>>>+ (ib_pkey_get_base(*p_orig_pkey) != ib_pkey_get_base(pkey)
>>>>))
>>>
>>>
>>>There the cases of new pkey and updated pkey membership is mixed. Why?
>>
>>I am not following your question.
>>The specific case I am trying to catch is the one that for some reason the
>>map points to
>>a pkey entry that was modified somehow and is different then the one you
>>would expect by
>>the map.
>
>
> Didn't understand it at first pass, now it is clearer.
>
> If pkey entry was modified somehow (how? bugs?), the assumption is that
> mapping still be valid? Then it is not new entry (or we will change
> pkey's index in the real table).
>
PKey table mismatch between the block and map should never happen.
I will remove the check and replace that with an ASSERT so I catch the bug if we hit it.
>
>>>>+ {
>>>>+ p_pending->is_new = TRUE;
>>>>+ cl_qlist_insert_tail(&p_pkey_tbl->pending,
>>>>(cl_list_item_t*)p_pending);
>>>>+ stat = "inserted";
>>>>+ }
>>>>+ else
>>>>+ {
>>>>+ p_pending->is_new = FALSE;
>>>>+ if (osm_pkey_tbl_get_block_and_idx(p_pkey_tbl, p_orig_pkey,
>>>>+
>>>>&p_pending->block, &p_pending->index))
>>>
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>AFAIK in this function there were CL_ASSERTs which check for uinitialized
>>>pointers.
>>
>>True. So the asserts are not required in this case.
>
>
> Up to you. Actually this my comment may be ignored, as stated above I
> didn't read this correctly.
>
>
>>>
>>>>+ {
>>>>+ osm_log( p_log, OSM_LOG_ERROR,
>>>>+ "pkey_mgr_process_physical_port:
>>>>ERR 0503: "
>>>>+ "Fail to obtain P_Key 0x%04x
>>>>block and index for node "
>>>>+ "0x%016" PRIx64 " port %u\n",
>>>>+ cl_ntoh64(
>>>>osm_node_get_node_guid( p_node ) ),
>>>>+ osm_physp_get_port_num(
>>>>p_physp ) );
>>>>+ return;
>>>>+ }
>>>>+ cl_qlist_insert_head(&p_pkey_tbl->pending,
>>>>(cl_list_item_t*)p_pending);
>>>>+ stat = "updated";
>>>
>>>
>>>Is it will be updated? It is likely "already there" case. No?
>>>
>>>Also in this case you can already put the pkey in new_block instead of
>>>holding it in pending list. Then later you will only need to add new
>>>pkeys. This may simplify the flow and even save some mem.
>>
>>True but in my mind it does not simplify - on the contrary it makes the
>>partition between
>>populating each port pending list and actually setting the pkey tables
>>mixed.
>
>
> I meant new_block filling, not actual setting. You will be able to
> remove whole if { } else { } flow, as well as is_new, block and index
> fields from 'pending' structure (actually only pkey value itself will
> matter) - is it not nice simplification?
I still prefer the clear staging: append to list when scanning the partitions and
filling in the tables when looping on all ports.
>
>
>>I do not think the memory impact deserves this mix of staging
>>
>>
>>>
>
>>>>+ max_num_of_blocks = pkey_mgr_get_physp_max_blocks( p_req->p_subn,
>>>>p_physp );
>>>>+ if ( p_pkey_tbl->max_blocks > max_num_of_blocks )
>>>> {
>>>>- block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index );
>>>>- for ( i = 0; i < IB_NUM_PKEY_ELEMENTS_IN_BLOCK; i++ )
>>>>+ osm_log( p_log, OSM_LOG_INFO,
>>>>+ "pkey_mgr_update_port: "
>>>>+ "Max number of blocks reduced from
>>>>%u to %u " + "for node 0x%016" PRIx64 "
>>>>port %u\n",
>>>>+ p_pkey_tbl->max_blocks,
>>>>max_num_of_blocks,
>>>>+ cl_ntoh64( osm_node_get_node_guid(
>>>>p_node ) ),
>>>>+ osm_physp_get_port_num( p_physp ) );
>>>>+ }
>>>>+ p_pkey_tbl->max_blocks = max_num_of_blocks;
>>>>+
>>>>+ osm_pkey_tbl_sync_new_blocks( p_pkey_tbl );
>>>>+ cl_map_remove_all( &p_pkey_tbl->keys );
>>>
>>>
>>>What is the reason to drop map here? AFAIK it will be reinitialized later
>>>anyway when pkey blocks will be received.
>>
>>What if it is not received?
>
>
> Then we will have unreliable data there.
>
> Maybe I know why you wanted this - this is part of "use pkey tables
> before sending/receiving to/from ports" idea?
>
>
>>>>@@ -255,24 +443,36 @@ pkey_mgr_update_peer_port(
>>>> if (enforce == FALSE)
>>>> return FALSE;
>>>>
>>>>- p_pkey_tbl = osm_physp_get_pkey_tbl( p );
>>>>- p_peer_pkey_tbl = osm_physp_get_pkey_tbl( peer );
>>>>+ p_pkey_tbl = osm_physp_get_pkey_tbl( p_physp );
>>>>+ p_peer_pkey_tbl = osm_physp_get_mod_pkey_tbl( peer );
>>>> num_of_blocks = osm_pkey_tbl_get_num_blocks( p_pkey_tbl );
>>>>- if ( num_of_blocks > osm_pkey_tbl_get_num_blocks( p_peer_pkey_tbl ) )
>>>>- num_of_blocks = osm_pkey_tbl_get_num_blocks( p_peer_pkey_tbl );
>>>>+ peer_max_blocks = pkey_mgr_get_physp_max_blocks( p_subn, peer );
>>>>+ if (peer_max_blocks < p_pkey_tbl->used_blocks)
>>>>+ {
>>>>+ osm_log( p_log, OSM_LOG_ERROR,
>>>>+ "pkey_mgr_update_peer_port: ERR
>>>>0508: "
>>>>+ "not enough entries (%u < %u) on
>>>>switch 0x%016" PRIx64
>>>>+ " port %u\n",
>>>>+ peer_max_blocks, num_of_blocks,
>>>>+ cl_ntoh64( osm_node_get_node_guid(
>>>>p_node ) ),
>>>>+ osm_physp_get_port_num( peer ) );
>>>>+ return FALSE;
>>>
>>>
>>>Do you think it is the best way, just to skip update - partitions are
>>>enforced already on the switch. May be better to truncate pkey tables
>>>in order to meet peer's capabilities?
>>
>>You are right about that - Its a bug!
>>I think the best approach here is to turn off the enforcement on the switch.
>>If we truncate the table we actually impact connectivity of the fabric.
>>I prefer a softer approach - an error in the log.
>
>
> Yes this should be good way to handle this.
>
>
>>>
>>>>+ }
>>>>
>>>>- for ( block_index = 0; block_index < num_of_blocks; block_index++ )
>>>>+ p_peer_pkey_tbl->used_blocks = p_pkey_tbl->used_blocks;
>>>>+ for ( block_index = 0; block_index < p_pkey_tbl->used_blocks;
>>>>block_index++)
>>>> {
>>>> block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index );
>>>> peer_block = osm_pkey_tbl_block_get( p_peer_pkey_tbl, block_index );
>>>> if ( memcmp( peer_block, block, sizeof( *peer_block ) ) )
>>>> {
>>>>+ osm_pkey_tbl_set(p_peer_pkey_tbl, block_index,
>>>>block);
>>>
>>>
>>>Why this (osm_pkey_tbl_set())? This will be called by receiver.
>>
>>Same as the above note about updating the map
>>I wanted to avoid to wait for the GetResp.
>>I think it is a mistake and we can actually remove it.
>
>
> Agree.
>
> Sasha.
More information about the general
mailing list