[ofa-general] Re: [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using pointer.

Ira Weiny weiny2 at llnl.gov
Tue Nov 18 14:06:08 PST 2008


I am not sure this will fix my bug.

The stack trace in my bug ended with:

   #0  osm_vendor_get (h_bind=0x0, mad_size=256, p_vw=0x69bbe8) at

The h_bind was being extracted from the osm_physp_t object.  Would this fix
ensure that the h_bind pointer was valid in the osm_physp_t object returned?

I used the "osm_physp_is_valid" function because the port_guid in osm_physp_t
object was only set after port_info returned valid data which would also ensure
that h_bind was set up correctly.  That happens through the call path:

   osm_pi_rcv_process->osm_physp_init->osm_dr_path_init

Currently osm_node_t->node_info is set when __osm_ni_rcv_process_new calls
osm_node_new.

osm_node_new calls osm_node_init_physp->osm_physp_init->osm_dr_path_init; but
only on the portnum which in the NodeInfo SMP.  Perhaps osm_physp_init needs to
be called again as in this patch:


diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c
index 20b16d1..5749a66 100644
--- a/opensm/opensm/osm_node_info_rcv.c
+++ b/opensm/opensm/osm_node_info_rcv.c
@@ -785,6 +785,9 @@ __osm_ni_rcv_process_existing(IN osm_sm_t * sm,
                break;
        }
 
+       p_node->node_info = *p_ni;
+       osm_node_init_physp(p_node, p_madw);
+
        __osm_ni_rcv_set_links(sm, p_node, port_num, p_ni_context);
 
        OSM_LOG_EXIT(sm->p_log);


Thoughts?
Ira



On Tue, 18 Nov 2008 14:30:00 +0200
Sasha Khapyorsky <sashak at voltaire.com> wrote:

> Hi,
> 
> On 20:54 Wed 12 Nov     , Sasha Khapyorsky wrote:
> > > 
> > > I was wondering if it would return invalid ports ever.  It would be easy for it
> > > to return only valid ports but perhaps that should be another function to
> > > preserve functionality?
> 
> Looked at this. Another problematic place where this function is used is
> osm_sa_link_record.c - there when "any" port becomes invalid (which is
> possible case) it starts an endless recursion :(. So we will need to fix
> the function behavior.
> 
> One option is to scan all ports and to return valid one. Another solution
> would be to update locally stored in OpenSM NodeInfo on each receive
> (something like below). Then osm_node_get_any_physp_ptr() will return a
> port where this node was accessed last.
> 
> In this way it also could catch potential OtherLocalSetting changes (in
> NodeInfo, such as SystemImageGUID, etc.).
> 
> Could anybody see any downsides with such approach?
> 
> Sasha
> 
> 
> diff --git a/opossum/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c
> index 20b16d1..7d41cab 100644
> --- a/opensm/opensm/osm_node_info_rcv.c
> +++ b/opensm/opensm/osm_node_info_rcv.c
> @@ -785,6 +785,8 @@ __osm_ni_rcv_process_existing(IN osm_sm_t * sm,
>  		break;
>  	}
>  
> +	p_node->node_info = *p_ni;
> +
>  	__osm_ni_rcv_set_links(sm, p_node, port_num, p_ni_context);
>  
>  	OSM_LOG_EXIT(sm->p_log);



More information about the general mailing list