Re: [ofa-general] Any easy way to specify to the SM to route/zone?

Chris Worley worleys at gmail.com
Mon Apr 13 16:39:36 PDT 2009


On Mon, Apr 13, 2009 at 4:24 PM, Hal Rosenstock
<hal.rosenstock at gmail.com> wrote:
> On Mon, Apr 13, 2009 at 5:50 PM, Chris Worley <worleys at gmail.com> wrote:
>
> <snip...>
>
>>> Were the ports getting to LinkUp/Active before partitions were configured ?
>>
>> Yes, before I started trying to partition, all the nodes could
>> communicate... except they'd all use just one port on the server and I
>> couldn't get the throughput I needed.
>
> I suspect the switch SMA went south sometime after this.

I'm now power-cycling the switch for each partition change.

<snip>
>>>> Partition "part2" with P_Key=2 should connect this client's port 0 to
>>>> the server on port 1 of mlx4_1
>>>
>>> Do you really mean port 0 ?
>>
>> Nope... in this case I have 0x0002c903000292b0 in part2 in my
>> partitions file, which is port 1, the second port of the adapter.  I'm
>> hoping to use both ports of all adapters on the server.
>
> So you're talking about physical marking on the card rather than
> actual (logical) port number.

I'm not sure about board markings... both ports are attached to the
switch, for all IB adapters, so all should work.  I'm using the
numbers provided by ibstat.

>> So, on one client... the one corresponding to "part2" in the
>> partitions file, I put the P_Key into the "create child":
>>
>> echo 0x2 > /sys/class/net/ib0/create_child
>>
>> ... and did likewise on the host, for ib3 (the second port on the
>> second adapter):
>>
>> echo 0x2 > /sys/class/net/ib3/create_child
>
> I'm not 100% sure but I think you may need the full member PKey on at
> least one of them (0x800x).

I've changed the P_Keys to 0x800x, and set the "create_child" files
appropriately.
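
For reference, here is a minimal sketch of the arithmetic behind "0x800x". It assumes the standard InfiniBand layout, in which the top bit of the 16-bit P_Key is the membership bit (1 = full member, 0 = limited member). The helper names are hypothetical and are not part of any OpenSM or IPoIB interface.

```python
# Sketch only: P_Key membership-bit arithmetic.
# Assumption: bit 15 of the 16-bit P_Key is the membership bit
# (set = full member, clear = limited member).

FULL_MEMBER_BIT = 0x8000
BASE_MASK = 0x7FFF


def full_member(pkey: int) -> int:
    """Return the full-member form of a P_Key (hypothetical helper)."""
    return (pkey & BASE_MASK) | FULL_MEMBER_BIT


def is_full_member(pkey: int) -> bool:
    """True if the membership bit is set (hypothetical helper)."""
    return bool(pkey & FULL_MEMBER_BIT)


if __name__ == "__main__":
    # The two partition keys discussed in this thread.
    for base in (0x1, 0x2):
        print(f"{base:#06x} -> {full_member(base):#06x}")
```

With this layout, 0x2 and 0x8002 name the same partition and differ only in membership: 0x2 is the limited-member form and 0x8002 the full-member form.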

>
>> Still, no ping (the interfaces are set up correctly).
>
> Are there still join failure messages on the client and/or server ?
> What do they say now ?

Lots of "bad P_Key" notices:

Apr 13 17:32:56 649698 [F59A9A30] 0x03 -> OpenSM 3.2.5_20081207
Apr 13 17:32:56 649737 [F59A9A30] 0x80 -> OpenSM 3.2.5_20081207
Apr 13 17:32:56 650078 [F59A9A30] 0x02 -> osm_vendor_init: 1000
pending umads specified
Apr 13 17:32:56 650201 [F59A9A30] 0x80 -> Entering DISCOVERING state
Apr 13 17:32:56 660286 [F59A9A30] 0x02 -> osm_vendor_bind: Binding to
port 0x2c90300026053
Apr 13 17:32:56 684519 [F59A9A30] 0x02 -> osm_vendor_bind: Binding to
port 0x2c90300026053
Apr 13 17:32:56 703826 [470BE940] 0x80 -> Entering MASTER state
Apr 13 17:32:56 704953 [470BE940] 0x02 -> osm_ucast_mgr_process:
minhop tables configured on all switches
Apr 13 17:32:56 713917 [470BE940] 0x80 -> SUBNET UP
Apr 13 17:32:57 112574 [452BB940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:257
(Bad P_Key) Producer:1 (Channel Adapter) from LID:1
TID:0x0000000000000741
Apr 13 17:32:57 112642 [452BB940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:257 (Bad P_Key) from LID:1
GID:fe80::2:c903:2:6053
Apr 13 17:32:57 282788 [416B5940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:259
(Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11
TID:0x000000000000018e
Apr 13 17:32:57 282817 [416B5940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from
LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:32:58 280801 [42AB7940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:259
(Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11
TID:0x000000000000018f
Apr 13 17:32:58 280828 [42AB7940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from
LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:32:58 761835 [434B8940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:257
(Bad P_Key) Producer:1 (Channel Adapter) from LID:1
TID:0x0000000000000742
Apr 13 17:32:58 761858 [434B8940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:257 (Bad P_Key) from LID:1
GID:fe80::2:c903:2:6053
Apr 13 17:32:59 278816 [452BB940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:259
(Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11
TID:0x0000000000000190
Apr 13 17:32:59 278835 [452BB940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from
LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:33:00 276841 [416B5940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:259
(Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11
TID:0x0000000000000191
Apr 13 17:33:00 276862 [416B5940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from
LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:33:03 459759 [42AB7940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:257
(Bad P_Key) Producer:1 (Channel Adapter) from LID:1
TID:0x0000000000000743
Apr 13 17:33:03 459785 [42AB7940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:257 (Bad P_Key) from LID:1
GID:fe80::2:c903:2:6053
Apr 13 17:33:04 268908 [434B8940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:259
(Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11
TID:0x0000000000000192
Apr 13 17:33:04 268927 [434B8940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from
LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:33:05 266929 [452BB940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:259
(Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11
TID:0x0000000000000193
Apr 13 17:33:05 266950 [452BB940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from
LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:33:10 456664 [420B6940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:257
(Bad P_Key) Producer:1 (Channel Adapter) from LID:1
TID:0x0000000000000744
Apr 13 17:33:10 456690 [420B6940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:257 (Bad P_Key) from LID:1
GID:fe80::2:c903:2:6053
Apr 13 17:33:11 255037 [43EB9940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:259
(Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11
TID:0x0000000000000194
Apr 13 17:33:11 255083 [43EB9940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from
LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:33:12 253054 [45CBC940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:259
(Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11
TID:0x0000000000000195
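
To see at a glance which LIDs are raising the traps, the notices above can be tallied with a short script. This is a rough sketch that assumes the log text is wrapped as it appears in this mail. The function name and the regular expression are my own and do not come from OpenSM.

```python
# Sketch only: count "Received Generic Notice" entries per (trap number, LID).
# Assumption: entries look like the OpenSM log excerpt in this mail,
# possibly wrapped across several lines.
import re
from collections import Counter

NOTICE_RE = re.compile(
    r"Received Generic Notice type:(\d+) num:(\d+) .*?from LID:(\d+)"
)


def count_notices(log_text: str) -> Counter:
    """Tally received notices by (trap number, source LID)."""
    # Undo the mail client's line wrapping before matching.
    flat = " ".join(log_text.split())
    return Counter(
        (int(num), int(lid)) for _type, num, lid in NOTICE_RE.findall(flat)
    )


sample = """\
Apr 13 17:32:57 112574 [452BB940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:257
(Bad P_Key) Producer:1 (Channel Adapter) from LID:1
TID:0x0000000000000741
Apr 13 17:32:57 112642 [452BB940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:2 num:257 (Bad P_Key) from LID:1
GID:fe80::2:c903:2:6053
Apr 13 17:32:57 282788 [416B5940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:2 num:259
(Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11
TID:0x000000000000018e
"""

if __name__ == "__main__":
    for (num, lid), n in sorted(count_notices(sample).items()):
        print(f"trap {num} from LID {lid}: {n}")
```

Only the "Received" lines are counted, so the matching osm_report_notice lines do not double the totals.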

Chris


