[ofa-general] Re: [PATCH] opensm/osm_node_info_rcv.c: create physp for the newly discovered port of the known node

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Wed Feb 18 01:31:07 PST 2009


Hi Sasha,

Sasha Khapyorsky wrote:
> Hi Yevgeny,
> 
> On 14:41 Tue 17 Feb     , Yevgeny Kliteynik wrote:
>> This patch fixes bugzilla issue #1515:
>>
>> Topology:
>>                  |---------------|
>>                  |      SW2      |
>>                  |---------------|
>>                    |x |y    |z |v
>>               |----|  |     |  |----|
>>               |       |     |       |
>>               |  |----|     |----|  |
>>               |  |               |  |
>>              a| b|              c| d|
>>       |---------------|     |---------------|
>>       |       SW1     |     |     SW3       |
>>       |---------------|     |---------------|
>>           |                             |
>>           |                             |
>>        HCA with SM                      HCA
>>
>> During the discovery:
>>
>> SM sends NodeInfo request to SW1
>> SM sends NodeInfo request to SW2 through link a->x
>> SM discovers new node SW2:
>>   - updates DR to SW2 to go through link a->x
>>   - creates physp x
> 
> And requests SwitchInfo from SW2, and on response sends PortInfo to all
> switch ports. PortInfo receiver will initialize all switch ports. Isn't
> it?

Links are created only by getting NodeInfo response. W/o the
fix, when SW1 gets NodeInfo from SW2 through link b->y, it
doesn't initialize physp for y, hence the link can't be created.
So the only chance for the link to be created is when
SW2 will send NodeInfo request to SW1 through link y->b.
But this isn't happening, because DR for SW2 is updated
to contain this link, so SM doesn't probe the remote side
of y to avoid loop.

BTW, thing happens with every other link that connects
same nodes. In the example above, link v<->d will be
missing as well.

-- Yevgeny

> Sasha
> 
>> SM sends NodeInfo request to SW2 through link b->y
>> SM discovers a known node SW2
>>   - DOES NOT create physp y
>>   - updates DR to SW2 to go through link b->y
>>
>> From now on, the DR to SW2 is going through port y, so OpenSM won't deal with
>> port y any more, leaving it uninitialized (no physp object for this port).
>>
>> The fix is to create physp for the newly discovered port of the known
>> switch node, same way as it is done for HCAs.
>> I also added one log message for the case that showed the problem - when
>> one of the link sides is uninitialized (no valid ports check). Perhaps
>> this log message should be an error message instead?
>>
>> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
>> ---
>>  opensm/opensm/osm_node_info_rcv.c |   24 +++++++++++++++++++++++-
>>  1 files changed, 23 insertions(+), 1 deletions(-)
>>
>> diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c
>> index c52c0d5..7da3103 100644
>> --- a/opensm/opensm/osm_node_info_rcv.c
>> +++ b/opensm/opensm/osm_node_info_rcv.c
>> @@ -164,8 +164,12 @@ __osm_ni_rcv_set_links(IN osm_sm_t * sm,
>>  	 */
>>  	if (!osm_node_link_has_valid_ports(p_node, port_num,
>>  					   p_neighbor_node,
>> -					   p_ni_context->port_num))
>> +					   p_ni_context->port_num)) {
>> +		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
>> +			"Link at node 0x%" PRIx64 ", port %u - no valid ports\n",
>> +			cl_ntoh64(osm_node_get_node_guid(p_node)), port_num);
>>  		goto _exit;
>> +	}
>>
>>  	if (osm_node_link_exists(p_node, port_num,
>>  				 p_neighbor_node, p_ni_context->port_num)) {
>> @@ -537,8 +541,26 @@ __osm_ni_rcv_process_existing_switch(IN osm_sm_t * sm,
>>  				     IN osm_node_t * const p_node,
>>  				     IN const osm_madw_t * const p_madw)
>>  {
>> +
>> +	ib_smp_t *p_smp;
>> +	ib_node_info_t *p_ni;
>> +	uint8_t port_num;
>> +
>>  	OSM_LOG_ENTER(sm->p_log);
>>
>> +	p_smp = osm_madw_get_smp_ptr(p_madw);
>> +	p_ni = (ib_node_info_t *) ib_smp_get_payload_ptr(p_smp);
>> +	port_num = ib_node_info_get_local_port_num(p_ni);
>> +
>> +	if (!osm_node_get_physp_ptr(p_node, port_num)) {
>> +		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
>> +			"Creating physp for node GUID:0x%"
>> +			PRIx64 ", port %u\n",
>> +			cl_ntoh64(osm_node_get_node_guid(p_node)),
>> +			port_num);
>> +		osm_node_init_physp(p_node, p_madw);
>> +	}
>> +
>>  	/*
>>  	   If this switch has already been probed during this sweep,
>>  	   then don't bother reprobing it.
>> -- 
>> 1.5.1.4
>>
> 




More information about the general mailing list