Re: [ofa-general] Any easy way to specify to the SM to route/zone?
Chris Worley
worleys at gmail.com
Mon Apr 13 11:26:45 PDT 2009
On Mon, Apr 13, 2009 at 11:53 AM, Hal Rosenstock
<hal.rosenstock at gmail.com> wrote:
> On Mon, Apr 13, 2009 at 12:02 PM, Chris Worley <worleys at gmail.com> wrote:
>> On Mon, Apr 13, 2009 at 7:43 AM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
>>> On Mon, Apr 13, 2009 at 9:37 AM, Chris Worley <worleys at gmail.com> wrote:
>>>> On Mon, Apr 13, 2009 at 5:39 AM, Hal Rosenstock
>>>> <hal.rosenstock at gmail.com> wrote:
>>>>> On Sun, Apr 12, 2009 at 11:01 PM, Chris Worley <worleys at gmail.com> wrote:
>>>>>>
>>>>>> So I need to tell the SM to route specific ports on the server/target
>>>>>> to specific clients/initiators.
>>>>>>
>>>>>> Is there any way to do this?
>>>>>
>>>>> Do you mean restrict access between certain clients/servers ?
>>>>
>>>> One server w/ 4QDR boards, 16 clients with one QDR board. I want each
>>>> port on the server routed/zoned to two clients.
>>>>
>>>>> If so,
>>>>> you can do this with partitioning
>>>>
>>>> What is partitioning?
>>>
>>> A partition is a collection of ports which are allowed to communicate
>>> together. There are two forms of members: full members which can talk
>>> to any other member (useful for servers) and limited members which can
>>> only talk to full members (useful for clients). See the opensm man
>>> page or partition-config.txt on setting this up for OpenSM.
>>>
>>
>> Let me see if I understand this with a simple example... my port GUIDs
>> (as reported by ibstat) are for one server (4 QDR ports) and four
>> clients (one QDR port each):
>>
>>
>> Server A: Port GUID: 0x0024717124000029
>> Server B: Port GUID: 0x002471712400002a
>> Server C: Port GUID: 0x0024717127000035
>> Server D: Port GUID: 0x0024717127000036
>>
>> Client 1: Port GUID: 0x0002c90300028c01
>> Client 2: Port GUID: 0x0002c90300026047
>> Client 3: Port GUID: 0x0002c90300026053
>> Client 4: Port GUID: 0x0002c9030002603b
>>
>> Assuming I want a 1:1 (one server port to one client) partitioning, I
>> would put the following in /etc/ofed/partitions.conf:
>>
>> part1=0x1, ipoib, defmember=full : 0x0024717124000029, 0x0002c90300028c01;
>> part2=0x2, ipoib, defmember=full : 0x002471712400002a, 0x0002c90300026047;
>> part3=0x3, ipoib, defmember=full : 0x0024717127000035, 0x0002c90300026053;
>> part4=0x4, ipoib, defmember=full : 0x0024717127000036, 0x0002c9030002603b;
>
> So you want IPoIB.
I'm doing SRP, so I need IPoIB working.
>
>> ... and run w/:
>>
>> opensm -r -B -P/etc/ofed/partitions.conf
>>
>> Does that sound correct? It doesn't work
>
> What application(s) aren't working ?
ping over IPoIB, for example.
I am seeing the test node in an "initializing" state right now... I
thought it was "up" before.
> Any SM error messages ?
The server has one klogd error coming out continuously:
ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
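For reference, the MGID in that message can be pulled apart with a few lines of Python. This is only a sketch and assumes the group follows the IPoIB layout from RFC 4391 (0xff, flags/scope, a 16-bit signature, a 16-bit P_Key, then the group ID); the function name is made up for illustration. Status -22 is -EINVAL on Linux.

```python
import ipaddress

def decode_ipoib_mgid(mgid: str) -> dict:
    """Split an IPoIB multicast GID into fields (layout assumed from RFC 4391)."""
    raw = ipaddress.IPv6Address(mgid).packed  # 16 raw bytes
    return {
        "flags": raw[1] >> 4,
        "scope": raw[1] & 0x0F,
        "signature": int.from_bytes(raw[2:4], "big"),  # 0x401b = IPv4, 0x601b = IPv6
        "pkey": int.from_bytes(raw[4:6], "big"),
        "group_id": int.from_bytes(raw[6:16], "big"),
    }

fields = decode_ipoib_mgid("ff12:401b:ffff:0000:0000:0000:ffff:ffff")
print(hex(fields["signature"]), hex(fields["pkey"]))  # prints: 0x401b 0xffff
```

If that reading is right, ib0 is asking to join the IPv4 broadcast group for P_Key 0xffff, i.e. the default partition, which none of part1..part4 define. That may be related to the failure, but I have not confirmed it.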
OpenSM is reporting "Lid is out of range", "send completed with error",
and "Failed to find source physical port for trap".
OpenSM's log looks like:
Apr 13 12:03:43 556996 [21085350] 0x03 -> OpenSM 3.2.2
Apr 13 12:03:43 557061 [21085350] 0x80 -> OpenSM 3.2.2
Apr 13 12:03:43 557556 [21085350] 0x02 -> osm_vendor_init: 1000
pending umads specified
Apr 13 12:03:43 557659 [21085350] 0x80 -> Entering DISCOVERING state
Apr 13 12:03:43 605573 [21085350] 0x02 -> osm_vendor_bind: Binding to
port 0x24717124000029
Apr 13 12:03:43 636142 [21085350] 0x02 -> osm_vendor_bind: Binding to
port 0x24717124000029
Apr 13 12:03:44 437076 [4863C940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11 trans_id=0x520000123b)
-- dropping
Apr 13 12:03:44 437104 [4863C940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Apr 13 12:03:44 437126 [4863C940] 0x01 -> Received SMP on a 1 hop path:
Initial path = 0,0
Return path = 0,0
Apr 13 12:03:44 437135 [4863C940] 0x01 ->
__osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error
(IB_TIMEOUT)
Apr 13 12:03:44 437179 [4863C940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x1
trans_id................0x123b
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1
Return path: 0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
Apr 13 12:03:44 437218 [47C3B940] 0x80 -> Entering MASTER state
Apr 13 12:03:44 437409 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437458 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437514 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437558 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437612 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437653 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437707 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 437748 [47C3B940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:3 num:66 from LID:0
GID:0xfe80000000000000,0x0024717124000029
Apr 13 12:03:44 443077 [47C3B940] 0x80 -> SUBNET UP
Apr 13 12:03:44 891932 [42232940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:03:44 891951 [42232940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:03:44 891959 [42232940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:03:45 184124 [44035940] 0x01 -> __osm_mcmr_rcv_join_mgrp:
ERR 1B11: method = SubnAdmSet, scope_state = 0x1, component mask =
0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID:
0xff12401bffff0000 : 0x00000000ffffffff from port 0x0
24717124000029 (MT25408)
...
Apr 13 12:04:04 852289 [43634940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:04 852306 [43634940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:04 852314 [43634940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:04 852363 [43634940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3804: Received trap 20 times
consecutively
Apr 13 12:04:05 850307 [44035940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:05 850327 [44035940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:05 850334 [44035940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:06 848327 [44A36940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:06 848340 [44A36940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:06 848348 [44A36940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:07 846349 [45437940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:07 846365 [45437940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:07 846373 [45437940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:08 844372 [45E38940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:08 844391 [45E38940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:08 844398 [45E38940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:09 842394 [46839940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:09 842414 [46839940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:09 842421 [46839940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:10 840400 [42232940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:10 840414 [42232940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:10 840421 [42232940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:11 838419 [42C33940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:11 838432 [42C33940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:11 838440 [42C33940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:12 836435 [43634940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:12 836467 [43634940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:12 836476 [43634940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:13 834459 [45437940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:04:13 834479 [45437940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:04:13 834487 [45437940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:04:14 364185 [4863C940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11 trans_id=0x5200001266)
-- dropping
Apr 13 12:04:14 364211 [4863C940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
...
Apr 13 12:19:51 971642 [453B6940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:19:51 971658 [453B6940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:51 971666 [453B6940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:52 969658 [45DB7940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:128 Producer:2 (Switch) from LID:10 TID:0x0000000000000190
Apr 13 12:19:52 969671 [45DB7940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:52 969679 [45DB7940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:53 967681 [467B8940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:19:53 967696 [467B8940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:53 967704 [467B8940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:54 965697 [471B9940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:128 Producer:2 (Switch) from LID:10 TID:0x0000000000000190
Apr 13 12:19:54 965710 [471B9940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:54 965717 [471B9940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:55 963717 [42BB2940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:19:55 963735 [42BB2940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:55 963743 [42BB2940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:56 961736 [435B3940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:128 Producer:2 (Switch) from LID:10 TID:0x0000000000000190
Apr 13 12:19:56 961749 [435B3940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:56 961779 [435B3940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:57 959748 [43FB4940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:19:57 959771 [43FB4940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:57 959779 [43FB4940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:58 957770 [449B5940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:128 Producer:2 (Switch) from LID:10 TID:0x0000000000000190
Apr 13 12:19:58 957788 [449B5940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:58 957795 [449B5940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:19:59 955793 [453B6940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:259 Producer:2 (Switch) from LID:10 TID:0x000000000000018f
Apr 13 12:19:59 955806 [453B6940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:19:59 955813 [453B6940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:20:00 491524 [45DB7940] 0x01 -> __osm_mcmr_rcv_join_mgrp:
ERR 1B11: method = SubnAdmSet, scope_state = 0x1, component mask =
0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID:
0xff12401bffff0000 : 0x00000000ffffffff from port 0x0
24717124000029 (MT25408 IOSAN Fusion-IO)
Apr 13 12:20:00 953808 [42BB2940] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x02
num:128 Producer:2 (Switch) from LID:10 TID:0x0000000000000190
Apr 13 12:20:00 953822 [42BB2940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:20:00 953830 [42BB2940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
physical port for trap
Apr 13 12:20:01 424318 [48FBC940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11 trans_id=0x5500001311)
-- dropping
Apr 13 12:20:01 424345 [48FBC940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Apr 13 12:20:01 424356 [48FBC940] 0x01 -> Received SMP on a 1 hop path:
Initial path = 0,0
Return path = 0,0
Apr 13 12:20:01 424366 [48FBC940] 0x01 ->
__osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error
(IB_TIMEOUT)
Apr 13 12:20:01 424410 [48FBC940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x1
trans_id................0x1311
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1
Return path: 0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
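Since the log is long and repetitive, a small tally of the ERR codes makes it easier to see which errors dominate. A minimal Python sketch follows; the sample text is copied from the excerpt above, and reading the real log file is left out because its path depends on the installation.

```python
import re
from collections import Counter

# OpenSM error lines carry a four-character hex code after "ERR".
ERR_RE = re.compile(r"ERR ([0-9A-Fa-f]{4}):")

def count_err_codes(log_text: str) -> Counter:
    """Count how often each OpenSM ERR code appears in the given log text."""
    return Counter(code.upper() for code in ERR_RE.findall(log_text))

sample = """\
Apr 13 12:03:44 437076 [4863C940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11 trans_id=0x520000123b)
Apr 13 12:03:44 891951 [42232940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:03:44 891959 [42232940] 0x01 ->
__osm_trap_rcv_process_request: ERR 3809: Failed to find source
Apr 13 12:04:04 852306 [43634940] 0x01 -> osm_get_physp_by_mad_addr:
ERR 7503: Lid is out of range: 10
Apr 13 12:03:45 184124 [44035940] 0x01 -> __osm_mcmr_rcv_join_mgrp:
ERR 1B11: method = SubnAdmSet, scope_state = 0x1, component mask =
"""

counts = count_err_codes(sample)
for code, n in counts.most_common():
    print(code, n)
```

The regular expression matches on the code itself rather than on whole lines, so it still works when the mailer has wrapped a log entry across lines.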
> Any end
> node messages pertaining to IB ?
Nothing I can see.
>
>> (I restarted ib on the
>> clients), although ibstat shows the links up. What am I getting
>> wrong? The opensmd is running on the server.
>
> Which server ?
There's only one server... it has many ports, which I'm trying to
partition to different clients. So, in the above, when I say "Server
A", I mean server port "A".
>
> You still need the default partition with the SM node being full and
> the others being limited there (so it's also best to run SM on
> separate node if possible otherwise you have the potential of any
> client connecting to it on default partition).
Are you saying to change the partitions.conf file to:
part1=0x1, ipoib: 0x0024717124000029=full, 0x0002c90300028c01;
part2=0x2, ipoib: 0x002471712400002a=full, 0x0002c90300026047;
part3=0x3, ipoib: 0x0024717127000035=full, 0x0002c90300026053;
part4=0x4, ipoib: 0x0024717127000036=full, 0x0002c9030002603b;
... (which still doesn't work) in which case I set all the server's
ports to "full", or should just one be "full" (which didn't work
either)?
I did have a difficult time understanding the difference between
"full" and "limited" in the man page.
I've got a captive network, so I don't want to allow any paths I
haven't specified, if that makes sense. So, I didn't want to put in a
statement like:
Default=0x7fff,ipoib:ALL=full;
... that would let a rogue node slip through the cracks.
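One possible reading of Hal's suggestion is a default partition where only the SM port is a full member and everyone else is limited, followed by the per-pair partitions. The Python sketch below just generates such a file from the GUID pairs listed earlier. The Default line, and in particular the ALL=limited and SELF=full keywords, are my assumption from partition-config.txt and should be checked against the installed OpenSM; the helper name is made up.

```python
# Server-port / client GUID pairs taken from the list earlier in this message.
PAIRS = [
    ("0x0024717124000029", "0x0002c90300028c01"),
    ("0x002471712400002a", "0x0002c90300026047"),
    ("0x0024717127000035", "0x0002c90300026053"),
    ("0x0024717127000036", "0x0002c9030002603b"),
]

# Assumed syntax: SM port full, all other ports limited in the default partition.
DEFAULT_LINE = "Default=0x7fff, ipoib : ALL=limited, SELF=full;"

def partition_lines(pairs, default_line=DEFAULT_LINE):
    """Build partitions.conf lines: the default partition, then one per pair."""
    lines = [default_line]
    for idx, (server_guid, client_guid) in enumerate(pairs, start=1):
        # Server port is a full member; the client takes the default (limited).
        lines.append(f"part{idx}=0x{idx:x}, ipoib : {server_guid}=full, {client_guid};")
    return lines

lines = partition_lines(PAIRS)
print("\n".join(lines))
```

Whether limited membership in the default partition is enough for IPoIB on the clients is something I have not verified.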
Thanks,
Chris
>
> -- Hal
>
>> Thanks,
>>
>> Chris
>>
>