[openib-general] [PATCH] osm: partition manager force policy

Eitan Zahavi eitan at mellanox.co.il
Tue Jun 13 07:21:24 PDT 2006


Hi Hal,

Hal Rosenstock wrote:
> Hi Eitan,
> 
> On Tue, 2006-06-13 at 08:54, Eitan Zahavi wrote:
> 
>>Hi Hal
>>
>>This is a second take after debug and cleanup of the partition manager
>>patch I have previously provided.
> 
> 
> Thanks.
> 
> So this patch supersedes the previous version? If so, in the future,
> just indicate [PATCHv2] for this.
> 
> 
>> The functionality is the same but
>>this one is after 2 days of testing on the simulator.
> 
> 
> Are you still working on this (more testing) ?
> 
> 
>>I also did some code restructuring for clarity. 
> 
> 
>>Tests passed were both dedicated pkey enforcements (pkey.*) and
>>stress test (osmStress.*)
>>
>>As I started to test the partition manager code (using ibmgtsim pkey test),
>>I realized the implementation does not really enforce the partition policy
>>on the given fabric. This patch fixes that. It was verified using the 
>>simulation test. Several other corner cases were fixed too.
> 
> 
> Can you elaborate on these cases ?
If you ask about the corner cases:
1. A bug in avoiding switch enforcement when the HCA had more blocks than the switch.
2. Similar, but when the extra HCA blocks are unused, so the switch does not actually need so many blocks.
3. Segfaults due to fabric instability.
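
To illustrate cases 1 and 2, here is a minimal Python sketch (not the OpenSM code) of counting how many pkey table blocks a port actually uses. It assumes 32 entries per block, as in the IBA P_KeyTable attribute, and treats an entry whose base value is zero as unused. The function names are hypothetical.

```python
PKEYS_PER_BLOCK = 32  # a P_KeyTable block carries 32 entries


def used_blocks(pkey_table):
    """Count blocks up to and including the last one holding a non-empty pkey."""
    last_used = -1
    for idx, pkey in enumerate(pkey_table):
        if pkey & 0x7FFF:  # assumption: base value 0 marks an unused entry
            last_used = idx
    return last_used // PKEYS_PER_BLOCK + 1


def can_enforce_on_switch(hca_table, switch_capacity_blocks):
    """The switch port needs room only for the blocks the HCA really uses."""
    return used_blocks(hca_table) <= switch_capacity_blocks
```

With this view, an HCA that has a large but mostly empty table (case 2) can still be enforced on a switch port with fewer blocks.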

If you ask about the test code, it is checked in under https://openib.org/svn/gen2/utils/src/linux-user/ibmgtsim/tests
and the file names start with pkey.* and osmStress.*.

In general the pkey test does:
* Randomize 3 pkeys p1, p2, p3 (the first 2 are full, the last 1 is partial)
* Assign ports into 3 groups: G1 which uses p1, G2 which
   uses p2 and G3 which uses p1, p2 and p3
* For each HCA port randomize pkey tables with a random number of entries
   (including the ones above at random locations)
* For some ports override the tables with an incorrect set
* Write a partition policy file
* Start the SM, wait for subnet up
* Randomly select HCA ports and verify (using osmtest -f c) that the all-to-all path records they
   see are limited by the partitions they belong to
* Forcefully null all default pkey entries on the fabric ports
* Set a change bit on a switch to force a sweep
* Wait for subnet up and check that all ports do have the correct default pkey set
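
The path record check above relies on the pkey membership rule: two ports can talk only if they share a pkey base value and at least one of them is a full member. A minimal Python sketch of that rule follows. It is not the test code itself, and the pkey values and helper names are made up for illustration.

```python
FULL_MEMBER_BIT = 0x8000  # top bit set means full membership
BASE_MASK = 0x7FFF        # low 15 bits identify the partition


def can_communicate(table_a, table_b):
    """True if the two pkey tables share a partition with at least one full member."""
    for a in table_a:
        for b in table_b:
            same_partition = (a & BASE_MASK) != 0 and (a & BASE_MASK) == (b & BASE_MASK)
            one_is_full = bool((a | b) & FULL_MEMBER_BIT)
            if same_partition and one_is_full:
                return True
    return False


# Hypothetical values: p1 and p2 are full, p3 is partial
p1, p2, p3 = 0x8001, 0x8002, 0x0003
g1, g2, g3 = [p1], [p2], [p1, p2, p3]
```

Under these assumed values G1 and G2 ports see no paths to each other, while G3 ports see both groups.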

The stress test does:
* Setup LIDs
* Force some random LID violations (duplicated, misaligned, zero)
* Write guid2lid file with some random change
* Disconnect some random nodes
* Run OpenSM, wait for subnet up
* Repeat 10 times: reconnect all nodes, then disconnect some random nodes
* Wait for subnet up
* Check all LID values are correct (according to guid2lid)
* Start 240 iterations of selecting one of the following :
   connect random port
   disconnect random port
   register random service
   query random paths from random nodes
   join random port to 0xC000
   leave random port from 0xC000
* Eventually:
   connect all nodes
   join 0xC000 from all HCA ports
   wait for subnet up
   check connectivity and FDB validity etc. using ibdiagnet
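
The three kinds of LID violation forced by the stress test (duplicated, misaligned, zero) can be sketched as a simple checker. This is an illustrative Python sketch under stated assumptions, not the ibmgtsim test code: the input is a hypothetical mapping of port GUID to base LID, and "misaligned" is taken to mean a base LID that is not a multiple of 2^LMC.

```python
def lid_violations(port_lids, lmc=0):
    """Map each offending GUID to the list of violations found on its base LID."""
    step = 1 << lmc  # a base LID must be a multiple of 2^LMC
    first_owner = {}
    problems = {}
    for guid, lid in sorted(port_lids.items()):
        found = []
        if lid == 0:
            found.append("zero")
        elif lid % step:
            found.append("misaligned")
        if lid != 0 and lid in first_owner:
            found.append("duplicated")
        first_owner.setdefault(lid, guid)
        if found:
            problems[guid] = found
    return problems
```

A clean fabric gives an empty result. After the SM sweep, the same check can be run on the assigned LIDs together with a comparison against the guid2lid contents.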



