[Users] odd subset manager behavior

Narayan Desai narayan.desai at gmail.com
Mon May 13 12:38:04 PDT 2013


OK, thanks. I just turned up the log level. We'll see what shows up.
thanks.
 -nld

On Mon, May 13, 2013 at 2:34 PM, Weiny, Ira <ira.weiny at intel.com> wrote:
>> -----Original Message-----
>> From: users-bounces at lists.openfabrics.org [mailto:users-
>> Subject: Re: [Users] odd subset manager behavior
>>
>> Here are the full logs from that execution of the subnet manager. It looks like
>> it was functioning properly apart from intermittent errors, right? (Since the
>> errors occur at irregular intervals and the fabric kept functioning, I'm
>> assuming the SM was more or less working.)
>>
>> Ira, what level of debug should I be setting? I'm running opensm with -D
>> 0x40.
>
> I would add ERROR and INFO at least.
>
> #define OSM_LOG_ERROR   0x01
> #define OSM_LOG_INFO    0x02
>
> Ira
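Because -D takes a bit mask, Ira's suggestion amounts to ORing the ERROR and INFO bits into the 0x40 that is already in use. Below is a minimal sketch in C. It reuses only the two #define values quoted above. The helper name add_log_levels is hypothetical and is not part of opensm.

```c
/* Log-level bits as quoted in Ira's reply. */
#define OSM_LOG_ERROR 0x01
#define OSM_LOG_INFO  0x02

/* Hypothetical helper: -D takes a bit mask, so extra levels are ORed into
   the mask already in use instead of replacing it. */
unsigned int add_log_levels(unsigned int current_mask)
{
    return current_mask | OSM_LOG_ERROR | OSM_LOG_INFO;
}
```

Applied to the current mask, add_log_levels(0x40) gives 0x43, so the invocation would become opensm -D 0x43. If I read the opensm man page correctly, the default when -D is omitted is ERROR + INFO (0x03). Passing -D 0x40 on its own therefore replaces that default rather than adding to it.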
>
>> thanks
>>  -nld
>>
>>
>> May 10 14:14:19 856909 [512F9700] 0x80 -> OpenSM 3.3.15
>> May 10 14:14:19 872752 [512F9700] 0x80 -> Entering DISCOVERING state
>> May 10 14:14:20 844905 [4B0C9700] 0x80 -> Entering MASTER state
>> May 10 14:14:21 604958 [4B0C9700] 0x80 -> SUBNET UP
>> May 10 18:19:41 103790 [4B0C9700] 0x80 -> Errors during initialization
>> May 10 23:19:18 147739 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 00:09:29 119760 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 01:39:44 171770 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 01:40:35 187802 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 04:20:43 199806 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 06:17:38 399745 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 07:20:36 183738 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 07:42:16 207784 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 08:07:02 199763 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 12:00:18 243758 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 14:05:18 295779 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 15:38:14 299780 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 16:03:30 503760 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 16:16:01 295739 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 17:37:38 303782 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 17:38:42 327748 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 22:01:49 339753 [4B0C9700] 0x80 -> Errors during initialization
>> May 11 22:02:19 379695 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 03:04:48 403771 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 03:31:22 403752 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 03:44:41 415740 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 10:35:59 475849 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 10:41:39 467814 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 11:27:02 471770 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 13:34:57 467829 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 14:33:43 487784 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 14:35:56 507728 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 14:39:56 499739 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 15:37:38 687764 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 18:06:41 531744 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 18:32:50 551773 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 18:54:21 511818 [4B0C9700] 0x80 -> Errors during initialization
>> May 12 19:16:14 527799 [4B0C9700] 0x80 -> Errors during initialization
>> May 13 01:34:58 583765 [4B0C9700] 0x80 -> Errors during initialization
>> May 13 02:25:13 615760 [4B0C9700] 0x80 -> Errors during initialization
>> May 13 05:16:22 611734 [4B0C9700] 0x80 -> Errors during initialization
>> May 13 05:22:09 603862 [4B0C9700] 0x80 -> Errors during initialization
>> May 13 05:56:45 851842 [4B0C9700] 0x80 -> Errors during initialization
>> May 13 06:15:47 851775 [4B0C9700] 0x80 -> Errors during initialization
>> May 13 13:30:24 290706 [512F9700] 0x80 -> Exiting SM
>>
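The timestamps can be scanned to put numbers on the "irregular intervals" observation. The sketch below measures the gap between consecutive "Errors during initialization" lines. It is a rough C sketch; struct gap_stats, log_time and error_gaps are hypothetical names. Because the code ignores the month field, it assumes all lines fall in the same month, which is true for the May 10 to May 13 log above. It expects raw log lines without the ">>" quote markers.

```c
#include <stdio.h>
#include <string.h>

/* Summary of the gaps between consecutive "Errors during initialization" lines. */
struct gap_stats {
    size_t errors;  /* number of matching lines */
    long min_gap;   /* shortest gap in seconds, -1 if fewer than two errors */
    long max_gap;   /* longest gap in seconds, -1 if fewer than two errors */
};

/* Parse the "Mon DD HH:MM:SS" prefix of an opensm log line into seconds
   since the start of the month. The month name is read but ignored, so all
   lines are assumed to come from the same month. Returns 0 on a parse failure. */
int log_time(const char *line, long *secs)
{
    char mon[4];
    int day, hh, mm, ss;
    if (sscanf(line, "%3s %d %d:%d:%d", mon, &day, &hh, &mm, &ss) != 5)
        return 0;
    *secs = (((long)day * 24 + hh) * 60 + mm) * 60 + ss;
    return 1;
}

/* Scan n log lines and collect gap statistics for the error message. */
struct gap_stats error_gaps(const char *lines[], size_t n)
{
    struct gap_stats st = { 0, -1, -1 };
    long prev = 0;
    for (size_t i = 0; i < n; i++) {
        long t;
        /* Skip lines that are not the error message or have no parsable time. */
        if (strstr(lines[i], "Errors during initialization") == NULL ||
            !log_time(lines[i], &t))
            continue;
        if (st.errors > 0) {
            long gap = t - prev;
            if (st.min_gap < 0 || gap < st.min_gap)
                st.min_gap = gap;
            if (gap > st.max_gap)
                st.max_gap = gap;
        }
        prev = t;
        st.errors++;
    }
    return st;
}
```

By my count, the 40 error lines in the log above have gaps ranging from 30 seconds (May 11 22:01:49 to 22:02:19) to about 6.9 hours (May 12 03:44:41 to 10:35:59).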
>> On Mon, May 13, 2013 at 2:02 PM, Hal Rosenstock
>> <hal.rosenstock at gmail.com> wrote:
>> >
>> >
>> > On Mon, May 13, 2013 at 2:50 PM, Narayan Desai
>> > <narayan.desai at gmail.com>
>> > wrote:
>> >>
>> >> Our subnet manager started producing some weird error messages:
>> >> May 13 05:16:22 611734 [4B0C9700] 0x80 -> Errors during initialization
>> >> May 13 05:22:09 603862 [4B0C9700] 0x80 -> Errors during initialization
>> >> May 13 05:56:45 851842 [4B0C9700] 0x80 -> Errors during initialization
>> >> May 13 06:15:47 851775 [4B0C9700] 0x80 -> Errors during initialization
>> >>
>> >> The subnet manager was actually up and running prior to this, and had
>> >> successfully configured the network:
>> >> May 10 14:14:19 856909 [512F9700] 0x80 -> OpenSM 3.3.15
>> >> May 10 14:14:19 872752 [512F9700] 0x80 -> Entering DISCOVERING state
>> >> May 10 14:14:20 844905 [4B0C9700] 0x80 -> Entering MASTER state
>> >> May 10 14:14:21 604958 [4B0C9700] 0x80 -> SUBNET UP
>> >> May 10 18:19:41 103790 [4B0C9700] 0x80 -> Errors during initialization
>> >> May 10 23:19:18 147739 [4B0C9700] 0x80 -> Errors during initialization
>> >> May 11 00:09:29 119760 [4B0C9700] 0x80 -> Errors during initialization
>> >>
>> >> Any clue what is causing this?
>> >
>> >
>> > It means that some critical Set request used to configure the subnet failed.
>> >
>> > Are there error messages in the opensm log?
>> >
>> > -- Hal
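Hal's explanation can be pictured with a toy model. A sweep issues many Set requests, and a single failed critical one is enough to flag the whole sweep. This is a hypothetical sketch, not opensm's code; struct set_result and sweep_had_init_error are made-up names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, NOT opensm source: the outcome of one Set request
   issued while configuring the subnet during a sweep. */
struct set_result {
    bool critical;  /* would the subnet be misconfigured if this Set failed? */
    bool ok;        /* did the Set succeed? */
};

/* Returns true if any critical Set in the sweep failed, which in this toy
   model is the condition reported as "Errors during initialization". */
bool sweep_had_init_error(const struct set_result *results, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (results[i].critical && !results[i].ok)
            return true;  /* one failed critical Set is enough */
    }
    return false;  /* failures of non-critical Sets do not count */
}
```

In this model, one failed critical Set among many successful ones still triggers the message. That may be why the fabric kept working while the errors were being logged.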
>> >
>> >>
>> >> thanks.
>> >>  -nld
>> >> _______________________________________________
>> >> Users mailing list
>> >> Users at lists.openfabrics.org
>> >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>> >
>> >


