[ofw] Doing queries on subnet every 30 seconds

Hal Rosenstock hal.rosenstock at gmail.com
Tue May 3 11:49:25 PDT 2011


On Tue, May 3, 2011 at 2:29 PM, Fab Tillier <ftillier at microsoft.com> wrote:
> Hal Rosenstock wrote on Tue, 3 May 2011 at 11:20:01
>
>> On Tue, May 3, 2011 at 2:15 PM, Fab Tillier <ftillier at microsoft.com> wrote:
>>> Hal Rosenstock wrote on Mon, 2 May 2011 at 05:09:46
>>>
>>>> On Mon, May 2, 2011 at 4:54 AM, Tzachi Dar <tzachid at mellanox.co.il>
>>>> wrote:
>>>>> It seems that currently we have code that scans for srp targets every 30
>>>>> seconds (and creates qpr).
>>>>>
>>>>> See the function __ioc_query_sa for more details.
>>>>>
>>>>
>>>> What is __ioc_query_sa doing ? Where is this code found ? Is this an
>>>> SA query ? Which one ?
>>>
>>> Basically, if I recall correctly, it queries the SM for all port records and all
>>> path records, does a join of the tables to get all paths to ports that have the
>>> IOC bit set, and then sends DM MADs to discover I/O devices.
>>
>> There is a much better optional algorithm for this with more recent SMs.
>
> What is wanted is a query for all paths to ports with the DM bit set.

Exactly.
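
For reference, the join Fab describes amounts to roughly the following.
This is only a Python sketch, not the IBAL code: PortRecord and
PathRecord are hypothetical, stripped-down stand-ins for the SA records,
and the DM bit position (bit 19 of the PortInfo CapabilityMask) is my
assumption here, so check it against the spec.

```python
from dataclasses import dataclass

# Assumption: IsDeviceManagementSupported is bit 19 of PortInfo:CapabilityMask
# (host byte order).
DM_CAP_BIT = 1 << 19


@dataclass(frozen=True)
class PortRecord:
    """Hypothetical, stripped-down PortInfoRecord."""
    lid: int
    cap_mask: int


@dataclass(frozen=True)
class PathRecord:
    """Hypothetical, stripped-down PathRecord."""
    slid: int
    dlid: int


def paths_to_dm_ports(ports, paths):
    """Keep only the paths whose destination port has the DM bit set."""
    dm_lids = {p.lid for p in ports if p.cap_mask & DM_CAP_BIT}
    return [path for path in paths if path.dlid in dm_lids]
```

The cost is that every node pulls every port record and every path
record from the SA just to throw most of them away.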

> What's this better algorithm?

At some SC quite a while ago, we solved this, but it had to be optional
because it came after the fact and not all SMs might support it. At
this point, though, I suspect most reasonably recent SMs support this
feature.

It involves checking that the SM/SA supports
IsPortInfoCapMaskMatchSupported (via querying SA ClassPortInfo and
checking CapabilityMask bit 13) and if so, making the advanced
PortInfoRecord queries to get _only_ those ports with the DM PortInfo
CapMask bit set instead of having to get all ports and sift through
them at each end node.
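
Something like the following is what I have in mind, as a Python sketch
rather than real code. The query_ports callable is hypothetical (it
stands in for whatever wraps the SA PortInfoRecord query; passing None
means "return all ports"), and the DM bit position (bit 19) is an
assumption to verify against the spec. Masks are taken to be in host
byte order.

```python
# SA ClassPortInfo:CapabilityMask bit 13, as described above.
SA_CAP_PORTINFO_CAPMASK_MATCH = 1 << 13

# Assumption: IsDeviceManagementSupported is bit 19 of PortInfo:CapabilityMask.
DM_CAP_BIT = 1 << 19


def sa_supports_capmask_match(sa_cap_mask):
    """True if the SA advertises IsPortInfoCapMaskMatchSupported."""
    return bool(sa_cap_mask & SA_CAP_PORTINFO_CAPMASK_MATCH)


def find_dm_ports(sa_cap_mask, query_ports):
    """Return the ports that have the DM bit set.

    query_ports(match_mask) is a hypothetical callable: with a mask it
    asks the SA to return only matching ports; with None it returns all.
    """
    if sa_supports_capmask_match(sa_cap_mask):
        # The SA does the filtering; only DM-capable ports come back.
        return query_ports(DM_CAP_BIT)
    # Fallback for older SMs: fetch everything and sift locally.
    return [p for p in query_ports(None) if p["cap_mask"] & DM_CAP_BIT]
```

The fallback branch keeps things working against SMs that do not
advertise the capability.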

Also, rather than polling every <n> (30 ?) seconds, the node could
register for trap 144 events on all LIDs in the subnet and only
requery then. Unfortunately, it's not more granular than that (trap 144
covers more than just CapMask changes; I can list them all if you
want), but this is still way better than periodic polling IMO. You
would also then need to reregister this event subscription when the SM
asks you to (client reregistration).
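
In outline, the event-driven version would look something like this.
Again a Python sketch only: subscribe and requery are hypothetical
callables standing in for the InformInfo registration and the SA query,
and the DmPortCache name is made up for illustration.

```python
TRAP_LOCAL_CHANGES = 144  # the trap number discussed above


class DmPortCache:
    """Requery DM ports only when trap 144 arrives, instead of on a timer."""

    def __init__(self, subscribe, requery):
        self._subscribe = subscribe  # hypothetical: register for trap 144, all LIDs
        self._requery = requery      # hypothetical: run the SA query, return ports
        self._subscribe()
        self.ports = self._requery()

    def on_trap(self, trap_number):
        # Trap 144 is coarser than "DM bit changed", so every one of
        # them triggers a requery; other traps are ignored.
        if trap_number == TRAP_LOCAL_CHANGES:
            self.ports = self._requery()

    def on_client_reregister(self):
        # The SM asked for client reregistration: renew the subscription.
        self._subscribe()
```

An idle subnet then generates no discovery traffic at all, where the
timer generates a full query from every node each period.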

-- Hal

> Another thing we should change is that IBAL sets the DM bit by default on all nodes, and that should be disabled too.  I would suggest we drop support for device management from IBAL - it can be implemented above it if someone needs it.

> -Fab
>
>>
>> -- Hal
>>
>>> -Fab
>>>
>>>>
>>>> -- Hal
>>>>
>>>>>
>>>>> When the cluster gets big it creates load on the sm.
>>>>>
>>>>>
>>>>>
>>>>> Is there any objection to disable it by default?
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Tzachi
>>>>>
>>>>> _______________________________________________
>>>>> ofw mailing list
>>>>> ofw at lists.openfabrics.org
>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>>>>>
>>>
>


