[ofa-general] [PATCH] infiniband-diags: Fix IB network discovery from switch node.

Eli Dorfman (Voltaire) dorfman.eli at gmail.com
Wed Sep 30 01:33:55 PDT 2009


Ira Weiny wrote:
> On Tue, 29 Sep 2009 18:16:21 +0200
> "Eli Dorfman (Voltaire)" <dorfman.eli at gmail.com> wrote:
> 
>> Ira Weiny wrote:
>>> Eli,
>>>
>>> On Wed, 26 Aug 2009 17:37:30 +0300
>>> "Eli Dorfman (Voltaire)" <dorfman.eli at gmail.com> wrote:
>>>
>>>> Subject: [PATCH] Fix IB network discovery from switch node.
>>> Sorry for the late inquiry on this but what exactly was the bug here?
>> Sorry for the late response.
>> The problem is incorrect discovery when running from a switch node.
>> Without the patch, ibnetdiscover finds only the local switch.
> 
> Ok I see.
> 
> [snip]
> 
>> I think the problem is related to NodeInfo:LocalPort, which is 0 in the case of a switch.
>> I see that get_remote_node() sends a direct-route MAD to the switch with path 0,0, and that fails (at least for Mellanox IS4 switch chips).
>> Another way to bypass this may be as follows:
>>
>> diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
>> index 1e93ff8..3dd0dc6 100644
>> --- a/infiniband-diags/libibnetdisc/src/ibnetdisc.c
>> +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
>> @@ -461,7 +461,7 @@ get_remote_node(struct ibnd_fabric *fabric, struct ibnd_node *node, struct ibnd_
>>  			!= IB_PORT_PHYS_STATE_LINKUP)
>>  		return -1;
>>  
>> -	if (extend_dpath(fabric, path, portnum) < 0)
>> +	if (portnum > 0 && extend_dpath(fabric, path, portnum) < 0)
>>  		return -1;
>>  
>>  	if (query_node(fabric, &node_buf, &port_buf, path)) {
>>
>>
>> Please check whether this is OK; if so, I can send a new patch.
>>
> 
> This seems to fix my issue.  Here is a patch against master which works for
> me.  If you could verify it, that would be great.

Verified this again and it works.
Sasha, please apply this patch.
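
For the archive, here is a minimal, self-contained sketch of why the
"portnum > 0" guard matters. It is a simplified model, not the libibnetdisc
code: struct dr_path, extend_path() and next_hop_path() are hypothetical
names, and DR_PATH_MAX is an assumed capacity. The point is only that a
switch reporting LocalPort 0 should be queried with the path as it stands
rather than with an extra hop 0 appended (the "0,0" path mentioned above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DR_PATH_MAX 64		/* assumed capacity, for illustration only */

/* Simplified direct-route path: p[0] is always 0, p[1..cnt] are exit ports. */
struct dr_path {
	int cnt;
	uint8_t p[DR_PATH_MAX];
};

/* Append one hop; returns the new hop count, or -1 if the path is full. */
static int extend_path(struct dr_path *path, int nextport)
{
	assert(path != NULL);
	if (path->cnt + 2 >= (int)sizeof(path->p))
		return -1;
	++path->cnt;
	path->p[path->cnt] = (uint8_t)nextport;
	return path->cnt;
}

/*
 * Prepare the path for the next query. Port 0 means "this switch itself",
 * so the path is left untouched in that case instead of gaining a hop 0.
 */
static int next_hop_path(struct dr_path *path, int portnum)
{
	if (portnum > 0 && extend_path(path, portnum) < 0)
		return -1;
	return path->cnt;
}
```

With an empty path, next_hop_path(&path, 0) leaves cnt at 0, while
next_hop_path(&path, 12) yields a one-hop path leaving through port 12.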

Thanks,
Eli

> 
> Thanks for helping me out,
> Ira
> 
> From: Ira Weiny <weiny2 at llnl.gov>
> Date: Tue, 22 Sep 2009 11:08:28 -0700
> Subject: [PATCH] infiniband-diags/libibnetdisc/src/ibnetdisc.c: fix bug in single node processing.
> 
> 	Eli fixed an issue with running ibnetdiscover from a switch, but the
> 	fix introduced a bug in processing a single switch:
> 
> 17:19:42 > ./iblinkinfo -S 0x000b8cffff00490c
> Switch 0x000b8cffff00490c MT47396 Infiniscale-III Mellanox Technologies:
> ...
>            8   11[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            8   12[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>             [  ] "" ( )
>            8   13[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
> ...
> 
> 	The port we "come in on" when discovering the switch is not reported properly.
> 
> 	This patch, suggested by Eli, reverses Eli's patch and fixes his original
> 	bug in a way which does not introduce the above issue.
> 
> Signed-off-by: Ira Weiny <weiny2 at llnl.gov>
> ---
>  infiniband-diags/libibnetdisc/src/ibnetdisc.c |   18 ++++++++----------
>  1 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
> index 97e369c..96f72c5 100644
> --- a/infiniband-diags/libibnetdisc/src/ibnetdisc.c
> +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
> @@ -506,7 +506,7 @@ static int get_remote_node(struct ibmad_port *ibmad_port,
>  	    != IB_PORT_PHYS_STATE_LINKUP)
>  		return 1;	/* positive == non-fatal error */
>  
> -	if (extend_dpath(ibmad_port, fabric, path, portnum) < 0)
> +	if (portnum > 0 && extend_dpath(ibmad_port, fabric, path, portnum) < 0)
>  		return -1;
>  
>  	if (query_node(ibmad_port, fabric, &node_buf, &port_buf, path)) {
> @@ -600,15 +600,13 @@ ibnd_fabric_t *ibnd_discover_fabric(struct ibmad_port * ibmad_port,
>  	if (!port)
>  		goto error;
>  
> -	if (node->type != IB_NODE_SWITCH) {
> -		rc = get_remote_node(ibmad_port, fabric, node, port, from,
> -				     mad_get_field(node->info, 0,
> -						   IB_NODE_LOCAL_PORT_F), 0);
> -		if (rc < 0)
> -			goto error;
> -		if (rc > 0)		/* non-fatal error, nothing more to be done */
> -			return ((ibnd_fabric_t *) fabric);
> -	}
> +	rc = get_remote_node(ibmad_port, fabric, node, port, from,
> +			     mad_get_field(node->info, 0,
> +					   IB_NODE_LOCAL_PORT_F), 0);
> +	if (rc < 0)
> +		goto error;
> +	if (rc > 0)		/* non-fatal error, nothing more to be done */
> +		return ((ibnd_fabric_t *) fabric);
>  
>  	for (dist = 0; dist <= max_hops; dist++) {
>  
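
The second hunk relies on the return convention noted in the first one
(negative is fatal, positive is a non-fatal error, zero is success). A
minimal sketch of that convention follows, assuming a stand-in
probe_remote() in place of get_remote_node(); the enum and function names
are hypothetical and the real function takes different arguments.

```c
/* Hypothetical outcomes for the caller; not names from libibnetdisc. */
enum outcome { OUTCOME_ERROR, OUTCOME_DONE_EARLY, OUTCOME_CONTINUE };

/* Stand-in probe: negative = fatal, positive = non-fatal, 0 = success. */
static int probe_remote(int link_up, int query_ok)
{
	if (!link_up)
		return 1;	/* positive == non-fatal error */
	if (!query_ok)
		return -1;	/* negative == fatal error */
	return 0;
}

/* Caller side, mirroring the rc checks in the hunk above. */
static enum outcome start_discovery(int link_up, int query_ok)
{
	int rc = probe_remote(link_up, query_ok);

	if (rc < 0)
		return OUTCOME_ERROR;		/* corresponds to "goto error" */
	if (rc > 0)
		return OUTCOME_DONE_EARLY;	/* return the fabric found so far */
	return OUTCOME_CONTINUE;		/* go on to the per-hop loop */
}
```

A link that is not up therefore ends discovery early with whatever has been
found, while a failed query is treated as an error.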



