Re: [ofa-general] OpenSM and trap 128.

Hal Rosenstock hal.rosenstock at gmail.com
Thu Mar 26 08:28:07 PDT 2009


On Thu, Mar 26, 2009 at 11:20 AM, Nicolas Morey Chaisemartin
<nicolas.morey-chaisemartin at ext.bull.net> wrote:
> Hal Rosenstock wrote:
>>
>>> Fixing the cable will solve our problem, but I still think something should be done about this.
>>>
>>> Though OpenSM's behaviour was OK, it was really difficult to find where the performance problems came from.
>>
>> There should be some log messages saying that the trap rate was exceeded.
>> Were they not present? Which OpenSM version?
>
> The only messages we had were the events on trap reception (a great many of them).
> However, we didn't check that before spending quite some time trying to understand where the performance loss could come from.
> OpenSM is git head + Bull_patches on top.

So the babbling_port_policy option should be there.

>>
>>> All our diagnostic tools (mostly based on infiniband-diags) failed to see the problem.
>>> infiniband-diags commands failed toward the faulty port, but it was hard to say whether the port was faulty or whether it was due to high load on the SM and dropped VL15 messages.
>>
>> Yes, the only thing you would observe is VL15 drops via perfquery. The
>> SM is the one which should be logging the trap originator which is the
>> way to diagnose this issue.
>>
>
> It is, actually, though the port number is missing from the log message.

The trap itself does not contain the port number so the log message
can't contain it. The reason is simplicity as multiple ports may have
this condition.
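
Since the originator is identified only by its LID, the practical way to
spot a babbling source is to count traps per source LID. A minimal sketch
of that tally, assuming the trap events have already been extracted as
(source LID, trap number) pairs; the function name and the sample data are
hypothetical, and this does not parse OpenSM's actual log format:

```python
from collections import Counter


def busiest_trap_sources(trap_events, trap_number=128, top=3):
    """Return the `top` source LIDs that sent the most traps of `trap_number`.

    `trap_events` is an iterable of (source_lid, trap_number) pairs; this
    record shape is an assumption made for illustration only.
    """
    counts = Counter(lid for lid, number in trap_events if number == trap_number)
    return counts.most_common(top)


# Hypothetical sample data: LID 12 repeatedly reports trap 128.
events = [(12, 128), (12, 128), (7, 128), (12, 128), (7, 144)]
print(busiest_trap_sources(events))
```

The LID at the top of the list is the device to inspect port by port,
for example with perfquery.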

-- Hal

> Nicolas
>


