Re: [ofa-general] Any easy way to specify to the SM to route/zone?

Chris Worley worleys at gmail.com
Mon Apr 13 11:26:45 PDT 2009


On Mon, Apr 13, 2009 at 11:53 AM, Hal Rosenstock
<hal.rosenstock at gmail.com> wrote:
> On Mon, Apr 13, 2009 at 12:02 PM, Chris Worley <worleys at gmail.com> wrote:
>> On Mon, Apr 13, 2009 at 7:43 AM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
>>> On Mon, Apr 13, 2009 at 9:37 AM, Chris Worley <worleys at gmail.com> wrote:
>>>> On Mon, Apr 13, 2009 at 5:39 AM, Hal Rosenstock
>>>> <hal.rosenstock at gmail.com> wrote:
>>>>> On Sun, Apr 12, 2009 at 11:01 PM, Chris Worley <worleys at gmail.com> wrote:
>>>>>>
>>>>>> So I need to tell the SM to route specific ports on the server/target
>>>>>> to specific clients/initiators.
>>>>>>
>>>>>> Is there any way to do this?
>>>>>
>>>>> Do you mean restrict access between certain clients/servers ?
>>>>
>>>> One server w/ 4QDR boards, 16 clients with one QDR board.  I want each
>>>> port on the server routed/zoned to two clients.
>>>>
>>>>> If so,
>>>>> you can do this with partitioning
>>>>
>>>> What is partitioning?
>>>
>>> A partition is a collection of ports which are allowed to communicate
>>> together. There are two forms of members: full members which can talk
>>> to any other member (useful for servers) and limited members which can
>>> only talk to full members (useful for clients). See the opensm man
>>> page or partition-config.txt on setting this up for OpenSM.
>>>
>>
>> Let me see if I understand this with a simple example... my port GUIDs
>> (as reported by ibstat) are for one server (4 QDR ports) and four
>> clients (one QDR port each):
>>
>>
>> Server A:           Port GUID: 0x0024717124000029
>> Server B:           Port GUID: 0x002471712400002a
>> Server C:           Port GUID: 0x0024717127000035
>> Server D:           Port GUID: 0x0024717127000036
>>
>> Client 1:                Port GUID: 0x0002c90300028c01
>> Client 2:                Port GUID: 0x0002c90300026047
>> Client 3:                Port GUID: 0x0002c90300026053
>> Client 4:                Port GUID: 0x0002c9030002603b
>>
>> Assuming I want a 1:1 (one server port to one client) partitioning, I
>> would put the following in /etc/ofed/partitions.conf:
>>
>> part1=0x1, ipoib, defmember=full : 0x0024717124000029, 0x0002c90300028c01;
>> part2=0x2, ipoib, defmember=full : 0x002471712400002a, 0x0002c90300026047;
>> part3=0x3, ipoib, defmember=full : 0x0024717127000035, 0x0002c90300026053;
>> part4=0x4, ipoib, defmember=full : 0x0024717127000036, 0x0002c9030002603b;
>
> So you want IPoIB.

I'm doing SRP, so I need IPoIB working.

>
>> ... and run w/:
>>
>> opensm -r -B -P/etc/ofed/partitions.conf
>>
>> Does that sound correct?  It doesn't work
>
> What application(s) aren't working ?

ping over IPoIB, for example.

I am seeing the test node in an "initializing" state right now... I
thought it was "up" before.

> Any SM error messages ?

The server has one klogd error coming out continuously:

ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
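
For reference, a minimal sketch of how that MGID breaks down, assuming
the IPoIB multicast GID layout from RFC 4391 (the function name and the
dictionary keys are my own, not from any IB tool). The third 16-bit
group is the P_Key, so this failing join appears to be for the
broadcast group of P_Key 0xffff (the default partition 0x7fff with the
full-membership bit set), not for any of part1..part4:

```python
def parse_ipoib_mgid(mgid: str) -> dict:
    """Split a colon-separated IPoIB MGID into its RFC 4391 fields."""
    groups = [int(g, 16) for g in mgid.split(":")]
    if len(groups) != 8:
        raise ValueError("expected 8 colon-separated 16-bit groups")
    return {
        "flags": (groups[0] >> 4) & 0xF,    # nibble after the leading 0xff
        "scope": groups[0] & 0xF,           # 0x2 = link-local
        "signature": groups[1],             # 0x401b marks IPv4 over IB
        "pkey": groups[2],                  # partition key of the group
        "full_member": bool(groups[2] & 0x8000),
        "base_pkey": groups[2] & 0x7FFF,    # P_Key without membership bit
        "group_id": (groups[6] << 16) | groups[7],
    }


fields = parse_ipoib_mgid("ff12:401b:ffff:0000:0000:0000:ffff:ffff")
print({k: hex(v) if isinstance(v, int) and not isinstance(v, bool) else v
       for k, v in fields.items()})
```

If that reading is right, ib0 on the server is still trying to join the
default partition's broadcast group, which the partitions.conf above
never defines.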

OpenSM is reporting "Lid is out of range", "send completed with error",
and "Failed to find source physical port for trap".
OpenSM's log looks like:

Apr 13 12:03:43 556996 [21085350] 0x03 -> OpenSM 3.2.2
Apr 13 12:03:43 557061 [21085350] 0x80 -> OpenSM 3.2.2
Apr 13 12:03:43 557556 [21085350] 0x02 -> osm_vendor_init: 1000
pending umads specified
Apr 13 12:03:43 557659 [21085350] 0x80 -> Entering DISCOVERING state
Apr 13 12:03:43 605573 [21085350] 0x02 -> osm_vendor_bind: Binding to
port 0x24717124000029
Apr 13 12:03:43 636142 [21085350] 0x02 -> osm_vendor_bind: Binding to
port 0x24717124000029
Apr 13 12:03:44 437076 [4863C940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11 trans_id=0x520000123b)
-- dropping
Apr 13 12:03:44 437104 [4863C940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Apr 13 12:03:44 437126 [4863C940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 12:03:44 437135 [4863C940] 0x01 ->
__osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error
(IB_TIMEOUT)
Apr 13 12:03:44 437179 [4863C940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x123b
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00
00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00
00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00
00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00
00 00 00 00

Apr 13 12:03:44 437218 [47C3B940] 0x80 -> Entering MASTER state
Apr 13 12:03:44 437409 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437458 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437514 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437558 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437612 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437653 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437707 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437748 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 443077 [47C3B940] 0x80 -> SUBNET UP
Apr 13 12:03:44 891932 [42232940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:03:44 891951 [42232940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:03:44 891959 [42232940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:03:45 184124 [44035940] 0x01 -> __osm_mcmr_rcv_join_mgrp:
ERR 1B11: method = SubnAdmSet, scope_state = 0x1, component mask =
0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID:
0xff12401bffff0000 : 0x00000000ffffffff from port
0x0024717124000029 (MT25408)

...

Apr 13 12:04:04 852289 [43634940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:04 852306 [43634940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:04 852314 [43634940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:04 852363 [43634940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3804: Received trap 20 times
consecutively
Apr 13 12:04:05 850307 [44035940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:05 850327 [44035940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:05 850334 [44035940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:06 848327 [44A36940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:06 848340 [44A36940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:06 848348 [44A36940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:07 846349 [45437940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:07 846365 [45437940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:07 846373 [45437940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:08 844372 [45E38940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:08 844391 [45E38940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:08 844398 [45E38940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:09 842394 [46839940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:09 842414 [46839940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:09 842421 [46839940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:10 840400 [42232940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:10 840414 [42232940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:10 840421 [42232940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:11 838419 [42C33940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:11 838432 [42C33940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:11 838440 [42C33940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:12 836435 [43634940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:12 836467 [43634940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:12 836476 [43634940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:13 834459 [45437940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:13 834479 [45437940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:13 834487 [45437940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:14 364185 [4863C940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11 trans_id=0x5200001266)
-- dropping
Apr 13 12:04:14 364211 [4863C940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0

...

Apr 13 12:19:51 971642 [453B6940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:19:51 971658 [453B6940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:51 971666 [453B6940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:52 969658 [45DB7940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:128 Producer:2 (Switch) from LID:10 TID:0x0000000000000190
Apr 13 12:19:52 969671 [45DB7940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:52 969679 [45DB7940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:53 967681 [467B8940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:19:53 967696 [467B8940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:53 967704 [467B8940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:54 965697 [471B9940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:128 Producer:2 (Switch) from LID:10 TID:0x0000000000000190
Apr 13 12:19:54 965710 [471B9940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:54 965717 [471B9940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:55 963717 [42BB2940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:19:55 963735 [42BB2940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:55 963743 [42BB2940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:56 961736 [435B3940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:128 Producer:2 (Switch) from LID:10 TID:0x0000000000000190
Apr 13 12:19:56 961749 [435B3940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:56 961779 [435B3940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:57 959748 [43FB4940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:19:57 959771 [43FB4940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:57 959779 [43FB4940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:58 957770 [449B5940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:128 Producer:2 (Switch) from LID:10 TID:0x0000000000000190
Apr 13 12:19:58 957788 [449B5940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:58 957795 [449B5940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:59 955793 [453B6940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:19:59 955806 [453B6940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:59 955813 [453B6940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:20:00 491524 [45DB7940] 0x01 -> __osm_mcmr_rcv_join_mgrp:
ERR 1B11: method = SubnAdmSet, scope_state = 0x1, component mask =
0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID:
0xff12401bffff0000 : 0x00000000ffffffff from port
0x0024717124000029 (MT25408 IOSAN Fusion-IO)
Apr 13 12:20:00 953808 [42BB2940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:128 Producer:2 (Switch) from LID:10 TID:0x0000000000000190
Apr 13 12:20:00 953822 [42BB2940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:20:00 953830 [42BB2940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:20:01 424318 [48FBC940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11 trans_id=0x5500001311)
-- dropping
Apr 13 12:20:01 424345 [48FBC940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Apr 13 12:20:01 424356 [48FBC940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 12:20:01 424366 [48FBC940] 0x01 ->
__osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error
(IB_TIMEOUT)
Apr 13 12:20:01 424410 [48FBC940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x1311
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00
00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00
00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00
00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00
00 00 00 00
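
Since the log is mostly the same few errors repeating, here is a small
sketch for tallying them by OpenSM error code. It only assumes the
"ERR xxxx:" pattern visible in the excerpt above; the function name is
made up, and the sample text is a few lines copied from this log:

```python
import re
from collections import Counter


def count_err_codes(log_text: str) -> Counter:
    """Count occurrences of each 'ERR xxxx:' code in an OpenSM log."""
    # Codes in this log are four hex digits, e.g. 7503 or 1B11.
    return Counter(re.findall(r"ERR ([0-9A-Fa-f]{4}):", log_text))


sample = """\
Apr 13 12:04:04 852306 [43634940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:04 852314 [43634940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:05 850327 [44035940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
"""

counts = count_err_codes(sample)
for code, n in counts.most_common():
    print(code, n)
```

To run it on the full log, read the file into a string and pass that in
place of the sample.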

> Any end
> node messages pertaining to IB ?

Nothing I can see.

>
>> (I restarted ib on the
>> clients), although ibstat shows the links up.  What am I getting
>> wrong?  The opensmd is running on the server.
>
> Which server ?

There's only one server... it has many ports, which I'm trying to
partition to different clients.  So, in the above, when I say "Server
A", I mean server port "A".

>
> You still need the default partition with the SM node being full and
> the others being limited there (so it's also best to run SM on
> separate node if possible otherwise you have the potential of any
> client connecting to it on default partition).

Are you saying to change the partitions.conf file to:

part1=0x1, ipoib: 0x0024717124000029=full, 0x0002c90300028c01;
part2=0x2, ipoib: 0x002471712400002a=full, 0x0002c90300026047;
part3=0x3, ipoib: 0x0024717127000035=full, 0x0002c90300026053;
part4=0x4, ipoib: 0x0024717127000036=full, 0x0002c9030002603b;

... (which still doesn't work)?  In that version I set all of the
server's ports to "full"; or should just one be "full" (which didn't
work either)?

I did have a difficult time understanding the difference between
"full" and "limited" in the man page.

I've got a captive network, so I don't want any paths I've not
specified to be allowed.  If that makes any sense.  So, I didn't want
to put a statement in like:

Default=0x7fff,ipoib:ALL=full;

... that would let a rogue node slip through the cracks.
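
To make the question concrete, here is a rough sketch of a generator
for the kind of file I think is being suggested: a default partition
that lists only known GUIDs (SM port full, everything else limited)
instead of ALL, plus one partition per server-port/client pair. The
function name is made up, the "=limited" member syntax is my
assumption from the man page, and I have not verified that OpenSM 3.2.2
accepts the result or that IPoIB comes up with it:

```python
SERVER_PORTS = [
    "0x0024717124000029",
    "0x002471712400002a",
    "0x0024717127000035",
    "0x0024717127000036",
]
CLIENTS = [
    "0x0002c90300028c01",
    "0x0002c90300026047",
    "0x0002c90300026053",
    "0x0002c9030002603b",
]


def partition_lines(server_ports, clients, sm_guid):
    """Build partitions.conf lines: explicit default + 1:1 partitions."""
    # Default partition: the SM port is full, every other known port is
    # limited, and no ALL keyword is used, so unlisted GUIDs get nothing.
    members = [f"{sm_guid}=full"]
    members += [f"{g}=limited" for g in server_ports + clients
                if g != sm_guid]
    lines = ["Default=0x7fff, ipoib: " + ", ".join(members) + ";"]
    # One partition per (server port, client) pair, P_Keys 0x1, 0x2, ...
    for i, (srv, cli) in enumerate(zip(server_ports, clients), start=1):
        lines.append(f"part{i}=0x{i:x}, ipoib: {srv}=full, {cli}=limited;")
    return lines


# The log shows opensm binding to port 0x24717124000029 (server port A).
lines = partition_lines(SERVER_PORTS, CLIENTS, sm_guid=SERVER_PORTS[0])
print("\n".join(lines))
```

Is that roughly the shape of file you mean?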

Thanks,

Chris
>
> -- Hal
>
>> Thanks,
>>
>> Chris
>>
>


