[openib-general] Re: [PATCH] osm: osm_vendor_umad osm_vendor_get_all_port_attr bug

Eitan Zahavi eitan at mellanox.co.il
Sun Aug 28 07:43:00 PDT 2005


Hi Hal,

I agree that the index 0 of the guid,lids and the new linkstates arrays
should be reserved for the default port. In the loop the index j is used
to loop over all ports 0 .. N of the HCA's. It is clear that for HCA's
port 0 will be skipped. However, since the current code does not advance
the lid and linkstate accordingly the place for the port 0 will not be
kept empty for the port 0.

Current code:
for (j = 0; j <= ca.numports; j++) {
   if (ca.ports[j]) {
      *p_lid = ca.ports[j]->base_lid;
      *p_linkstates = ca.ports[j]->state;
      p_lid++;
      p_linkstates++;
   }
}
Should be:
for (j = 0; j <= ca.numports; j++) {
   if (ca.ports[j]) {
      *p_lid = ca.ports[j]->base_lid;
      *p_linkstates = ca.ports[j]->state;
   }
   /* as j advance even if the port is not valid, so should the
      lid and state pointer */
   p_lid++;
   p_linkstates++;
}

As I could not convince you with the above explanations in my previous
mail I have written the following simple program to test the pre-and
post patch effect:

/*
   test program for dumping osm_vendor_get_all_port_attr results
*/

#include "stdio.h"
#include <stdlib.h>
#include <unistd.h>
#include <vendor/osm_vendor_api.h>
#include <opensm/osm_opensm.h>

#include <common.h>
#define GUID_ARRAY_SIZE 64
int
main() {
   osm_vendor_t vendor;
   osm_log_t osm_log;
   ib_api_status_t status;
   uint32_t num_ports = GUID_ARRAY_SIZE;
   ib_port_attr_t attr_array[GUID_ARRAY_SIZE];
   int i;

   osm_log_construct(&osm_log);
   osm_log_init(&osm_log, TRUE, 0xff, "/tmp/test_vendor.log");

   osm_vendor_init(&vendor, &osm_log, 1000);

   status = osm_vendor_get_all_port_attr(&vendor, attr_array, &num_ports );
   if ( status != IB_SUCCESS )
   {
     printf( "\nError from osm_vendor_get_all_port_attr (%x)\n", status);
     return;
   }

   printf("\nListing GUIDs:\n");
   for (i = 0; i < num_ports; i++) {
     printf("Port %i:0x%"PRIx64" lid:0x%04x state:%x\n",
            i,
            cl_hton64(attr_array[i].port_guid),
            cl_ntoh16(attr_array[i].lid),
            attr_array[i].link_state
            );
   }

   exit(0);
}

Without the above change I get:
Listing GUIDs:
Port 0:0xd9dffffff3d55 lid:0x0300 state:4
Port 1:0xd9dffffff3d55 lid:0x0400 state:4
Port 2:0xd9dffffff3d56 lid:0x0000 state:0

After the simple change I get:
Listing GUIDs:
Port 0:0xd9dffffff3d55 lid:0x0300 state:4
Port 1:0xd9dffffff3d55 lid:0x0300 state:4
Port 2:0xd9dffffff3d56 lid:0x0400 state:4

So as you can see - without the fix the lid of port 2 is presented as 
the lid of port 1...

I guess you use ibstatus in your mail. Well ibstatus uses its own code
so it shows the correct info anyway. In my case that is:
swlab223:/tmp/bld/libvendor>ibstatus
Infiniband device 'mthca0' port 1 status:
         default gid:     fe80:0000:0000:0000:000d:9dff:ffff:3d55
         base lid:        0x3
         sm lid:          0x1
         state:           4: ACTIVE
         phys state:      5: LinkUp
         rate:            10 Gb/sec (4X)

Infiniband device 'mthca0' port 2 status:
         default gid:     fe80:0000:0000:0000:000d:9dff:ffff:3d56
         base lid:        0x4
         sm lid:          0x1
         state:           4: ACTIVE
         phys state:      5: LinkUp
         rate:            10 Gb/sec (4X)



Hal Rosenstock wrote:
> Hi Eitan,
> 
> On Sun, 2005-08-21 at 03:32, Eitan Zahavi wrote:
> 
>>osm_vendor_get_all_port_attr returns incorrect LID and state for 
>>device ports. This bug was caused by the fact that if a device port
>>was skipped due to that fact it does not exist (HCA port 0). The 
>>lid and state pointers used as indexes into their corresponding
>>return value arrays were not advancing to the next port index.
>>
>>So the return for a single HCA was mixing LID and state for the first
>>port and displayed non initialized memory for the second port.
> 
> 
> The array is not filled in as you claim. Port 0 does not take a slot on
> an HCA. This looks fine to me as is (I added some print statements in
> that loop as follows):
> 
> osm_vendor_get_all_port_attr: port 0
> osm_vendor_get_all_port_attr: port 1
> osm_vendor_get_all_port_attr: port 1 lid 1 state 4
> osm_vendor_get_all_port_attr: port 2
> osm_vendor_get_all_port_attr: port 2 lid 0 state 1
> 
> Port 0 is skipped; port 1 is LID 1 and active; port 2 is not plugged in
> and is down:
> 
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 2
>                 Base lid: 1
>                 LMC: 0
>                 SM lid: 1
>                 Capability mask: 0x00500a68
>                 Port GUID: 0x0008f10403960559
>         Port 2:
>                 State: Down
>                 Physical state: Polling
>                 Rate: 2
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x00500a68
>                 Port GUID: 0x0008f1040396055a
> 
> -- Hal




More information about the general mailing list