[ofa-general] Re: [PATCH] opensm/osm_node_info_rcv.c: create physp for the newly discovered port of the known node

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Wed Feb 18 13:31:25 PST 2009


Sasha Khapyorsky wrote:
> On 14:41 Tue 17 Feb     , Yevgeny Kliteynik wrote:
>> Hi Sasha,
>>
>> This patch fixes bugzilla issue #1515:
>>
>> Topology:
>>                  |---------------|
>>                  |      SW2      |
>>                  |---------------|
>>                    |x |y    |z |v
>>               |----|  |     |  |----|
>>               |       |     |       |
>>               |  |----|     |----|  |
>>               |  |               |  |
>>              a| b|              c| d|
>>       |---------------|     |---------------|
>>       |       SW1     |     |     SW3       |
>>       |---------------|     |---------------|
>>           |                             |
>>           |                             |
>>        HCA with SM                      HCA
>>
>> During the discovery:
>>
>> SM sends NodeInfo request to SW1
>> SM sends NodeInfo request to SW2 through link a->x
>> SM discovers new node SW2:
>>   - updates DR to SW2 to go through link a->x
>>   - creates physp x
>> SM sends NodeInfo request to SW2 through link b->y
>> SM discovers a known node SW2
>>   - DOES NOT create physp y
>>   - updates DR to SW2 to go through link b->y
>>
>> From now on, the DR to SW2 is going through port y, so OpenSM won't deal with
>> port y any more, leaving it uninitialized (no physp object for this port).
>>
>> The fix is to create physp for the newly discovered port of the known
>> switch node, same way as it is done for HCAs.
>> I also added one log message for the case that showed the problem - when
>> one of the link sides is uninitialized (no valid ports check). Perhaps
>> this log message should be an error message instead?
>>
>> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
>> ---
>>  opensm/opensm/osm_node_info_rcv.c |   24 +++++++++++++++++++++++-
>>  1 files changed, 23 insertions(+), 1 deletions(-)
>>
>> diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c
>> index c52c0d5..7da3103 100644
>> --- a/opensm/opensm/osm_node_info_rcv.c
>> +++ b/opensm/opensm/osm_node_info_rcv.c
>> @@ -164,8 +164,12 @@ __osm_ni_rcv_set_links(IN osm_sm_t * sm,
>>  	 */
>>  	if (!osm_node_link_has_valid_ports(p_node, port_num,
>>  					   p_neighbor_node,
>> -					   p_ni_context->port_num))
>> +					   p_ni_context->port_num)) {
> 
> Actually, if the port is initialized unconditionally when NodeInfo is
> received, this case becomes impossible, no?
> 
> If so, we should probably put a CL_ASSERT() there instead of a run-time
> check.

Good point.
I'll repost the patch when we finish discussing it.
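
To make the point concrete, here is a toy model of the discovery sequence
from the bug report. It is only a sketch, not OpenSM code: struct toy_node,
toy_ni_rcv() and toy_both_ports_initialized() are hypothetical names made up
for this example. It models SW2 alone and tracks nothing but which ports
have a physp. The 'fixed' flag selects the behavior with or without the
patch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PORTS 8		/* hypothetical limit for this sketch */

/* Toy stand-in for a node: records only which ports have a physp. */
struct toy_node {
	bool physp_valid[TOY_MAX_PORTS];
};

/*
 * Toy NodeInfo handler.
 * Without the fix, a physp is created only when the node is new.
 * With the fix, it is also created for a new port of a known node.
 */
static void toy_ni_rcv(struct toy_node *n, bool known, unsigned port,
		       bool fixed)
{
	assert(port < TOY_MAX_PORTS);
	if (!known || fixed)
		n->physp_valid[port] = true;
}

/* Replays the two NodeInfo responses that SW2 sends in the bug report. */
static bool toy_both_ports_initialized(bool fixed)
{
	enum { X = 1, Y = 2 };	/* arbitrary port numbers for x and y */
	struct toy_node sw2 = { { false } };

	toy_ni_rcv(&sw2, false, X, fixed);	/* new node, via a->x */
	toy_ni_rcv(&sw2, true, Y, fixed);	/* known node, via b->y */

	return sw2.physp_valid[X] && sw2.physp_valid[Y];
}
```

In this model toy_both_ports_initialized(false) is false because port y
never gets a physp, while toy_both_ports_initialized(true) is true. If the
real code behaves the same way once the patch is applied, the port that
received NodeInfo always has a physp, and that is the case where an
assertion fits better than a run-time check.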

-- Yevgeny

> Sasha
> 
>> +		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
>> +			"Link at node 0x%" PRIx64 ", port %u - no valid ports\n",
>> +			cl_ntoh64(osm_node_get_node_guid(p_node)), port_num);
>>  		goto _exit;
>> +	}
>>
>>  	if (osm_node_link_exists(p_node, port_num,
>>  				 p_neighbor_node, p_ni_context->port_num)) {
>> @@ -537,8 +541,26 @@ __osm_ni_rcv_process_existing_switch(IN osm_sm_t * sm,
>>  				     IN osm_node_t * const p_node,
>>  				     IN const osm_madw_t * const p_madw)
>>  {
>> +
>> +	ib_smp_t *p_smp;
>> +	ib_node_info_t *p_ni;
>> +	uint8_t port_num;
>> +
>>  	OSM_LOG_ENTER(sm->p_log);
>>
>> +	p_smp = osm_madw_get_smp_ptr(p_madw);
>> +	p_ni = (ib_node_info_t *) ib_smp_get_payload_ptr(p_smp);
>> +	port_num = ib_node_info_get_local_port_num(p_ni);
>> +
>> +	if (!osm_node_get_physp_ptr(p_node, port_num)) {
>> +		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
>> +			"Creating physp for node GUID:0x%"
>> +			PRIx64 ", port %u\n",
>> +			cl_ntoh64(osm_node_get_node_guid(p_node)),
>> +			port_num);
>> +		osm_node_init_physp(p_node, p_madw);
>> +	}
>> +
>>  	/*
>>  	   If this switch has already been probed during this sweep,
>>  	   then don't bother reprobing it.
>> -- 
>> 1.5.1.4
>>
> 