[openib-general] [PATCH] osm: partition manager force policy

Hal Rosenstock halr at voltaire.com
Thu Jun 15 05:54:48 PDT 2006


On Thu, 2006-06-15 at 08:19, Eitan Zahavi wrote:
> >>+	p_pkey_tbl = osm_physp_get_mod_pkey_tbl( p_physp );
> >>+	if (! p_pkey_tbl)
> > 
> >            ^^^^^^^^^^^^^
> > Is it possible?
> Yes, it is! I ran into it during testing. The port did not have any pkey table.

PKey tables are optional: their size is predicated on NodeInfo:PartitionCap
for endports, which has a minimum of 1, and on
SwitchInfo:PartitionEnforcementCap for switch external (physical) ports,
which can be 0.
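
To make the distinction concrete, here is a rough sketch of how the table
size could be derived from the two capability fields. The struct and
function names are made up for illustration and are not OpenSM code; the
only fixed fact assumed is that a P_KeyTable block holds 32 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PKEYS_PER_BLOCK 32u   /* one P_KeyTable block carries 32 P_Keys */

/* Hypothetical, simplified view of a port (not the OpenSM types).
 * Both capability values are assumed to be in host byte order. */
typedef struct {
    bool     is_switch_external;  /* true for switch ports 1..N */
    uint16_t node_partition_cap;  /* NodeInfo:PartitionCap */
    uint16_t sw_enforcement_cap;  /* SwitchInfo:PartitionEnforcementCap */
} port_caps_t;

/* Number of P_Key entries the port can hold. */
uint16_t port_max_pkeys(const port_caps_t *caps)
{
    assert(caps != NULL);
    return caps->is_switch_external ? caps->sw_enforcement_cap
                                    : caps->node_partition_cap;
}

/* Number of 32-entry blocks needed to cover those entries, rounded up.
 * A result of 0 means the port has no pkey table at all. */
uint16_t port_max_pkey_blocks(const port_caps_t *caps)
{
    uint32_t n = port_max_pkeys(caps);   /* widen before adding */
    return (uint16_t)((n + PKEYS_PER_BLOCK - 1u) / PKEYS_PER_BLOCK);
}
```

With this model, a switch external port whose enforcement cap is 0 yields
0 blocks, which is the "no pkey table" case hit in testing.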

Is this routine used for an endport (CA, router, switch management
port), a switch external port, or both?

> >>@@ -217,21 +403,23 @@ pkey_mgr_update_peer_port(
> >>    const osm_port_t * const p_port,
> >>    boolean_t enforce )
> >> {
> >>-   osm_physp_t *p, *peer;
> >>+	osm_physp_t *p_physp, *peer;
> >>    osm_node_t *p_node;
> >>    ib_pkey_table_t *block, *peer_block;
> >>-   const osm_pkey_tbl_t *p_pkey_tbl, *p_peer_pkey_tbl;
> >>+	const osm_pkey_tbl_t *p_pkey_tbl;
> >>+	osm_pkey_tbl_t *p_peer_pkey_tbl;
> >>    osm_switch_t *p_sw;
> >>    ib_switch_info_t *p_si;
> >>    uint16_t block_index;
> >>    uint16_t num_of_blocks;
> >>+	uint16_t peer_max_blocks;
> >>    ib_api_status_t status = IB_SUCCESS;
> >>    boolean_t ret_val = FALSE;
> >> 
> >>-   p = osm_port_get_default_phys_ptr( p_port );
> >>-   if ( !osm_physp_is_valid( p ) )
> >>+	p_physp = osm_port_get_default_phys_ptr( p_port );
> >>+	if ( !osm_physp_is_valid( p_physp ) )
> >>       return FALSE;
> >>-   peer = osm_physp_get_remote( p );
> >>+	peer = osm_physp_get_remote( p_physp );
> >>    if ( !peer || !osm_physp_is_valid( peer ) )
> >>       return FALSE;
> >>    p_node = osm_physp_get_node_ptr( peer );
> >>@@ -245,7 +433,7 @@ pkey_mgr_update_peer_port(
> >>    if (pkey_mgr_enforce_partition( p_req, peer, enforce ) != IB_SUCCESS)
> >>    {
> >>       osm_log( p_log, OSM_LOG_ERROR,
> >>-               "pkey_mgr_update_peer_port: ERR 0502: "
> >>+					"pkey_mgr_update_peer_port: ERR 0507: "
> >>                "pkey_mgr_enforce_partition() failed to update "
> >>                "node 0x%016" PRIx64 " port %u\n",
> >>                cl_ntoh64( osm_node_get_node_guid( p_node ) ),
> >>@@ -255,24 +443,36 @@ pkey_mgr_update_peer_port(
> >>    if (enforce == FALSE)
> >>       return FALSE;
> >> 
> >>-   p_pkey_tbl = osm_physp_get_pkey_tbl( p );
> >>-   p_peer_pkey_tbl = osm_physp_get_pkey_tbl( peer );
> >>+	p_pkey_tbl = osm_physp_get_pkey_tbl( p_physp );
> >>+	p_peer_pkey_tbl = osm_physp_get_mod_pkey_tbl( peer );
> >>    num_of_blocks = osm_pkey_tbl_get_num_blocks( p_pkey_tbl );
> >>-   if ( num_of_blocks > osm_pkey_tbl_get_num_blocks( p_peer_pkey_tbl ) )
> >>-      num_of_blocks = osm_pkey_tbl_get_num_blocks( p_peer_pkey_tbl );
> >>+	peer_max_blocks = pkey_mgr_get_physp_max_blocks( p_subn, peer );
> >>+	if (peer_max_blocks < p_pkey_tbl->used_blocks)
> >>+	{
> >>+		osm_log( p_log, OSM_LOG_ERROR,
> >>+					"pkey_mgr_update_peer_port: ERR 0508: "
> >>+					"not enough entries (%u < %u) on switch 0x%016" PRIx64
> >>+					" port %u\n",
> >>+					peer_max_blocks, num_of_blocks,
> >>+					cl_ntoh64( osm_node_get_node_guid( p_node ) ),
> >>+					osm_physp_get_port_num( peer ) );
> >>+		return FALSE;
> > 
> > 
> > Do you think just skipping the update is the best way? Partitions are
> > already enforced on the switch. Maybe it is better to truncate the pkey
> > tables in order to meet the peer's capabilities?
> You are right about that - It's a bug!
> I think the best approach here is to turn off the enforcement on the switch.
> If we truncate the table we actually impact connectivity of the fabric.
> I prefer a softer approach - an error in the log.

Makes sense to me. It is better to give the administrator something as
close as possible to what he wants, and not punish him for something like
this, but warn him that his policy has been weakened.

In addition to an error in the log, a message should also go to OSM_LOG_SYS
so that it might be noticed without checking the log.
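
Roughly what I have in mind, as a standalone sketch. None of this is
OpenSM code: the types, function names, and log-level bit values are
placeholders, and the real change would go through osm_log and the
existing enforcement path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder log-level bits for this sketch only. */
enum { SKETCH_LOG_ERROR = 0x01, SKETCH_LOG_SYS = 0x80 };

typedef enum { ENFORCE_ON, ENFORCE_OFF } enforce_action_t;

typedef struct {
    enforce_action_t action;  /* what to program on the switch port */
    unsigned log_levels;      /* 0 when nothing needs to be reported */
} enforce_decision_t;

/* If the peer switch port cannot hold all the blocks the endport uses,
 * turn enforcement off instead of truncating the table, and ask for the
 * event to be reported at both the error and the syslog level. */
enforce_decision_t decide_peer_enforcement(uint16_t peer_max_blocks,
                                           uint16_t used_blocks)
{
    enforce_decision_t d = { ENFORCE_ON, 0 };

    if (peer_max_blocks < used_blocks) {
        d.action = ENFORCE_OFF;
        d.log_levels = SKETCH_LOG_ERROR | SKETCH_LOG_SYS;
    }
    return d;
}

/* Stand-in for the real logger: prints one line when a report is due. */
void report_decision(const enforce_decision_t *d,
                     uint16_t peer_max_blocks, uint16_t used_blocks)
{
    assert(d != NULL);
    if (d->log_levels == 0)
        return;
    fprintf(stderr,
            "partition enforcement disabled: peer holds %u blocks, %u needed\n",
            (unsigned)peer_max_blocks, (unsigned)used_blocks);
}
```

The point of returning a decision rather than logging inline is only to
keep the policy (weaken, do not truncate) separate from how it gets
reported.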

-- Hal





