[openib-general] [PATCH] osm: partition manager force policy
Eitan Zahavi
eitan at mellanox.co.il
Tue Jun 13 07:21:24 PDT 2006
Hi Hal,
Hal Rosenstock wrote:
> Hi Eitan,
>
> On Tue, 2006-06-13 at 08:54, Eitan Zahavi wrote:
>
>>Hi Hal
>>
>>This is a second take after debug and cleanup of the partition manager
>>patch I have previously provided.
>
>
> Thanks.
>
> So this patch supersedes the previous version? If so, in the future,
> just indicate [PATCHv2] for this.
>
>
>> The functionality is the same but
>>this one is after 2 days of testing on the simulator.
>
>
> Are you still working on this (more testing)?
>
>
>>I also did some code restructuring for clarity.
>
>
>>Tests passed were both dedicated pkey enforcements (pkey.*) and
>>stress test (osmStress.*)
>>
>>As I started to test the partition manager code (using ibmgtsim pkey test),
>>I realized the implementation does not really enforce the partition policy
>>on the given fabric. This patch fixes that. It was verified using the
>>simulation test. Several other corner cases were fixed too.
>
>
> Can you elaborate on these cases?
If you are asking about the corner cases:
1. A bug in avoiding switch enforcement when the HCA had more blocks than the switch.
2. Similar, but when the extra HCA blocks are unused, so the switch does not actually need that many blocks.
3. Segfaults due to fabric instability.
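For illustration, corner cases 1 and 2 come down to comparing how much of a port's P_Key table is really used against what the attached switch port can hold. The C sketch below is not the OpenSM code: it assumes 32 P_Keys per PKeyTable block, treats entries whose low 15 bits are zero as unused, assumes entries keep their positions when mirrored to the switch, and uses hypothetical names (pkey_used_end, pkey_blocks_used, switch_can_enforce, switch_cap_entries).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one PKeyTable block carries 32 P_Keys. */
#define PKEY_BLOCK_SIZE 32u

/* Entries whose low 15 bits are zero (0x0000, 0x8000) do not name a partition. */
static bool pkey_is_used(uint16_t pkey)
{
    return (pkey & 0x7FFFu) != 0;
}

/* One past the index of the last used entry (0 if no entry is used). */
size_t pkey_used_end(const uint16_t *tbl, size_t n_entries)
{
    size_t end = 0;
    for (size_t i = 0; i < n_entries; i++)
        if (pkey_is_used(tbl[i]))
            end = i + 1;
    return end;
}

/* Blocks the port really needs; trailing unused blocks are ignored
 * (corner case 2). */
size_t pkey_blocks_used(const uint16_t *tbl, size_t n_entries)
{
    return (pkey_used_end(tbl, n_entries) + PKEY_BLOCK_SIZE - 1) / PKEY_BLOCK_SIZE;
}

/* Hypothetical policy (corner case 1): enforce on the switch port only if
 * every used HCA entry fits, at its position, in the switch port's table;
 * otherwise skip switch enforcement for that port. */
bool switch_can_enforce(const uint16_t *hca_tbl, size_t n_entries,
                        size_t switch_cap_entries)
{
    return pkey_used_end(hca_tbl, n_entries) <= switch_cap_entries;
}
```

With this approach an HCA that advertises two blocks but only fills the first one still allows enforcement on a switch port with a single block.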
If you are asking about the test code, it is checked in at https://openib.org/svn/gen2/utils/src/linux-user/ibmgtsim/tests
(the file names start with pkey.* and osmStress.*).
In general, the pkey test does the following:
* Randomize 3 pkeys p1, p2, p3 (the first 2 are full, the third is partial)
* Assign the ports into 3 groups: G1, which uses p1; G2, which
  uses p2; and G3, which uses p1, p2 and p3
* For each HCA port, randomize the pkey table with a random number of entries
  (including the ones above at random locations)
* For some ports, override the tables with an incorrect set
* Write a partition policy file
* Start the SM and wait for subnet up
* Randomly select HCA ports and verify (using osmtest -f c) that the all-to-all path records they
  see are limited by the partitions they belong to
* Forcefully null all default pkey entries on the fabric ports
* Set a change bit on a switch to force a sweep
* Wait for subnet up and check that all ports have the correct default pkey set
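The path record check above relies on the IBA partition rule: two ports can communicate only if they hold P_Keys with the same low 15 bits and at least one of them is a full member (bit 15 set). A minimal C sketch of that rule follows; ports_share_partition is a hypothetical name, and this is not how osmtest implements the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* bit 15: full (1) vs partial/limited (0) */
#define PKEY_BASE_MASK   0x7FFFu /* low 15 bits: the partition itself */

/* True when the two P_Key tables share a partition in which at least one
 * side is a full member; two partial members cannot talk to each other. */
bool ports_share_partition(const uint16_t *a, size_t na,
                           const uint16_t *b, size_t nb)
{
    for (size_t i = 0; i < na; i++) {
        uint16_t base = a[i] & PKEY_BASE_MASK;
        if (base == 0)
            continue; /* unused or invalid entry */
        for (size_t j = 0; j < nb; j++) {
            if ((b[j] & PKEY_BASE_MASK) != base)
                continue;
            if ((a[i] | b[j]) & PKEY_FULL_MEMBER)
                return true;
        }
    }
    return false;
}
```

Under this rule, with hypothetical values p1 = 0x8001, p2 = 0x8002 and a partial p3 = 0x0003, a G1 port and a G3 port share p1, while a G1 port and a G2 port should see no path records to each other.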
The stress test does the following:
* Set up LIDs
* Force some random LID violations (duplicated, misaligned, zero)
* Write a guid2lid file with some random changes
* Disconnect some random nodes
* Run OpenSM and wait for subnet up
* Repeat 10 times: reconnect all nodes, then disconnect some random nodes
* Wait for subnet up
* Check that all LID values are correct (according to guid2lid)
* Run 240 iterations, each selecting one of the following:
    connect a random port
    disconnect a random port
    register a random service
    query random paths from random nodes
    join a random port to 0xC000
    make a random port leave 0xC000
* Eventually:
    connect all nodes
    join 0xC000 from all HCA ports
    wait for subnet up
    check connectivity, FDB validity, etc. using ibdiagnet