[openib-general] Re: [PATCH] osm: osm_vendor_umad osm_vendor_get_all_port_attr bug

Eitan Zahavi eitan at mellanox.co.il
Tue Aug 30 11:52:06 PDT 2005


Hal Rosenstock wrote:
> Hi Eitan,
> 
> On Sun, 2005-08-28 at 10:43, Eitan Zahavi wrote: 
> 
>>I agree that the index 0 of the guid,lids and the new linkstates arrays
>>should be reserved for the default port. In the loop the index j is used
>>to loop over all ports 0 .. N of the HCA's. It is clear that for HCA's
>>port 0 will be skipped. However, since the current code does not advance
>>the lid and linkstate accordingly the place for the port 0 will not be
>>kept empty for the port 0.
>>
>>Current code:
>>for (j = 0; j <= ca.numports; j++) {
>>   if (ca.ports[j]) {
>>      *p_lid = ca.ports[j]->base_lid;
>>      *p_linkstates = ca.ports[j]->state;
>>      p_lid++;
>>      p_linkstates++;
>>   }
>>}
>>Should be:
>>for (j = 0; j <= ca.numports; j++) {
>>   if (ca.ports[j]) {
>>      *p_lid = ca.ports[j]->base_lid;
>>      *p_linkstates = ca.ports[j]->state;
>>   }
>>   /* as j advance even if the port is not valid, so should the
>>      lid and state pointer */
>>   p_lid++;
>>   p_linkstates++;
>>}
>>
>>As I could not convince you with the above explanations in my previous
>>mail I have written the following simple program to test the pre-and
>>post patch effect:
>>
>>/*
>>   test program for dumping osm_vendor_get_all_port_attr results
>>*/
>>
>>#include "stdio.h"
>>#include <stdlib.h>
>>#include <unistd.h>
>>#include <vendor/osm_vendor_api.h>
>>#include <opensm/osm_opensm.h>
>>
>>#include <common.h>
>>#define GUID_ARRAY_SIZE 64
>>int
>>main() {
>>   osm_vendor_t vendor;
>>   osm_log_t osm_log;
>>   ib_api_status_t status;
>>   uint32_t num_ports = GUID_ARRAY_SIZE;
>>   ib_port_attr_t attr_array[GUID_ARRAY_SIZE];
>>   int i;
>>
>>   osm_log_construct(&osm_log);
>>   osm_log_init(&osm_log, TRUE, 0xff, "/tmp/test_vendor.log");
>>
>>   osm_vendor_init(&vendor, &osm_log, 1000);
>>
>>   status = osm_vendor_get_all_port_attr(&vendor, attr_array, &num_ports );
>>   if ( status != IB_SUCCESS )
>>   {
>>     printf( "\nError from osm_vendor_get_all_port_attr (%x)\n", status);
>>     return;
>>   }
>>
>>   printf("\nListing GUIDs:\n");
>>   for (i = 0; i < num_ports; i++) {
>>     printf("Port %i:0x%"PRIx64" lid:0x%04x state:%x\n",
>>            i,
>>            cl_hton64(attr_array[i].port_guid),
>>            cl_ntoh16(attr_array[i].lid),
>>            attr_array[i].link_state
>>            );
>>   }
>>
>>   exit(0);
>>}
>>
>>Without the above change I get:
>>Listing GUIDs:
>>Port 0:0xd9dffffff3d55 lid:0x0300 state:4
>>Port 1:0xd9dffffff3d55 lid:0x0400 state:4
>>Port 2:0xd9dffffff3d56 lid:0x0000 state:0
>>
>>After the simple change I get:
>>Listing GUIDs:
>>Port 0:0xd9dffffff3d55 lid:0x0300 state:4
>>Port 1:0xd9dffffff3d55 lid:0x0300 state:4
>>Port 2:0xd9dffffff3d56 lid:0x0400 state:4
>>
>>So as you can see - without the fix the lid of port 2 is presented as 
>>the lid of port 1...
> 
> 
> I understand the difference in the code and think the difference
> perhaps relates to either a lack of clarity or confusion with the API as
> follows: I don't see where it is defined what the index into the port
> array means. I think we have 2 different interpretations and this
> relates to how opensm/main.c  handles the results of calling this
> routine. 
I do not follow you. Do you suggest it is OK the port at index 1 will have
the guid of port 1 but the lid and state of port 2?

I did not complain about what port is reported at what index:
Just about the mismatch of guids and lids. Please see above.

> 
> So the patch is incomplete although perhaps correct depending on the
> interpretation. I'm not adverse to changing this as you indicate. I
> would like to resolve this before embarking on the 1.8.0 merge.
> 
> Also, since we are in this area, I don't think switch port 0 would be
> handled correctly by this code either.
> 
> 
>>I guess you use ibstatus in your mail. Well ibstatus uses its own code
>>so it shows the correct info anyway. 
> 
> 
> That was just to show that the port states corresponded to the ones
> shown by osm_vendor_get_all_port_attr with the print statements. Nothing
> else.
> 
> -- Hal
> 
> 
>>In my case that is:
>>swlab223:/tmp/bld/libvendor>ibstatus
>>Infiniband device 'mthca0' port 1 status:
>>         default gid:     fe80:0000:0000:0000:000d:9dff:ffff:3d55
>>         base lid:        0x3
>>         sm lid:          0x1
>>         state:           4: ACTIVE
>>         phys state:      5: LinkUp
>>         rate:            10 Gb/sec (4X)
>>
>>Infiniband device 'mthca0' port 2 status:
>>         default gid:     fe80:0000:0000:0000:000d:9dff:ffff:3d56
>>         base lid:        0x4
>>         sm lid:          0x1
>>         state:           4: ACTIVE
>>         phys state:      5: LinkUp
>>         rate:            10 Gb/sec (4X)
>>
>>
>>
>>Hal Rosenstock wrote:
>>
>>>Hi Eitan,
>>>
>>>On Sun, 2005-08-21 at 03:32, Eitan Zahavi wrote:
>>>
>>>
>>>>osm_vendor_get_all_port_attr returns incorrect LID and state for 
>>>>device ports. This bug was caused by the fact that if a device port
>>>>was skipped due to that fact it does not exist (HCA port 0). The 
>>>>lid and state pointers used as indexes into their corresponding
>>>>return value arrays were not advancing to the next port index.
>>>>
>>>>So the return for a single HCA was mixing LID and state for the first
>>>>port and displayed non initialized memory for the second port.
>>>
>>>
>>>The array is not filled in as you claim. Port 0 does not take a slot on
>>>an HCA. This looks fine to me as is (I added some print statements in
>>>that loop as follows):
>>>
>>>osm_vendor_get_all_port_attr: port 0
>>>osm_vendor_get_all_port_attr: port 1
>>>osm_vendor_get_all_port_attr: port 1 lid 1 state 4
>>>osm_vendor_get_all_port_attr: port 2
>>>osm_vendor_get_all_port_attr: port 2 lid 0 state 1
>>>
>>>Port 0 is skipped; port 1 is LID 1 and active; port 2 is not plugged in
>>>and is down:
>>>
>>>        Port 1:
>>>                State: Active
>>>                Physical state: LinkUp
>>>                Rate: 2
>>>                Base lid: 1
>>>                LMC: 0
>>>                SM lid: 1
>>>                Capability mask: 0x00500a68
>>>                Port GUID: 0x0008f10403960559
>>>        Port 2:
>>>                State: Down
>>>                Physical state: Polling
>>>                Rate: 2
>>>                Base lid: 0
>>>                LMC: 0
>>>                SM lid: 0
>>>                Capability mask: 0x00500a68
>>>                Port GUID: 0x0008f1040396055a
>>>
>>>-- Hal
>>




More information about the general mailing list