Re: [ofa-general] Any easy way to specify to the SM to route/zone?

Chris Worley worleys at gmail.com
Mon Apr 13 13:09:28 PDT 2009


On Mon, Apr 13, 2009 at 12:52 PM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
> On Mon, Apr 13, 2009 at 2:26 PM, Chris Worley <worleys at gmail.com> wrote:
>> On Mon, Apr 13, 2009 at 11:53 AM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
>>> On Mon, Apr 13, 2009 at 12:02 PM, Chris Worley <worleys at gmail.com> wrote:
>>>> On Mon, Apr 13, 2009 at 7:43 AM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
>>>>> On Mon, Apr 13, 2009 at 9:37 AM, Chris Worley <worleys at gmail.com> wrote:
>>>>>> On Mon, Apr 13, 2009 at 5:39 AM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
>>>>>>> On Sun, Apr 12, 2009 at 11:01 PM, Chris Worley <worleys at gmail.com> wrote:
>>>>>>>>
>>>>>>>> So I need to tell the SM to route specific ports on the server/target
>>>>>>>> to specific clients/initiators.
>>>>>>>>
>>>>>>>> Is there any way to do this?
>>>>>>>
>>>>>>> Do you mean restricting access between certain clients/servers?
>>>>>>
>>>>>> One server w/ 4 QDR boards, 16 clients with one QDR board.  I want each
>>>>>> port on the server routed/zoned to two clients.
>>>>>>
>>>>>>> If so,
>>>>>>> you can do this with partitioning
>>>>>>
>>>>>> What is partitioning?
>>>>>
>>>>> A partition is a collection of ports which are allowed to communicate
>>>>> together. There are two forms of members: full members which can talk
>>>>> to any other member (useful for servers) and limited members which can
>>>>> only talk to full members (useful for clients). See the opensm man
>>>>> page or partition-config.txt on setting this up for OpenSM.
>>>>>
>>>>
>>>> Let me see if I understand this with a simple example... the port GUIDs
>>>> (as reported by ibstat) for one server (4 QDR ports) and four
>>>> clients (one QDR port each) are:
>>>>
>>>>
>>>> Server A:           Port GUID: 0x0024717124000029
>>>> Server B:           Port GUID: 0x002471712400002a
>>>> Server C:           Port GUID: 0x0024717127000035
>>>> Server D:           Port GUID: 0x0024717127000036
>>>>
>>>> Client 1:                Port GUID: 0x0002c90300028c01
>>>> Client 2:                Port GUID: 0x0002c90300026047
>>>> Client 3:                Port GUID: 0x0002c90300026053
>>>> Client 4:                Port GUID: 0x0002c9030002603b
>>>>
>>>> Assuming I want a 1:1 (one server port to one client) partitioning, I
>>>> would put the following in /etc/ofed/partitions.conf:
>>>>
>>>> part1=0x1, ipoib, defmember=full : 0x0024717124000029, 0x0002c90300028c01;
>>>> part2=0x2, ipoib, defmember=full : 0x002471712400002a, 0x0002c90300026047;
>>>> part3=0x3, ipoib, defmember=full : 0x0024717127000035, 0x0002c90300026053;
>>>> part4=0x4, ipoib, defmember=full : 0x0024717127000036, 0x0002c9030002603b;
>>>
>>> So you want IPoIB.
>>
>> I'm doing SRP, so I need IPoIB working.
>
> SRP needs to query the PathRecord with the correct P_Key and use the
> correct P_Key index for that partition. I'm not sure how that is done
> in SRP, but first IPoIB needs to be made to work (again).
>
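
To make the "correct P_Key index" point concrete, here is a minimal Python sketch, assuming the port's P_Key table has already been read into a list (on Linux it is usually visible as one file per index under /sys/class/infiniband/<device>/ports/<port>/pkeys/). The helper names and the example table are hypothetical. The lookup compares only the low 15 bits, because a full-member entry differs from a limited one only in the top (0x8000) membership bit, and it prefers a full-member entry when both exist.

```python
import os
from typing import List, Optional

FULL_MEMBER_BIT = 0x8000   # top bit of a P_Key: 1 = full member, 0 = limited
PKEY_BASE_MASK = 0x7FFF    # low 15 bits identify the partition


def read_pkey_table(device: str, port: int) -> List[int]:
    """Read a port's P_Key table from sysfs (assumed layout: one file per index)."""
    base = f"/sys/class/infiniband/{device}/ports/{port}/pkeys"
    table = []
    for name in sorted(os.listdir(base), key=int):
        with open(os.path.join(base, name)) as f:
            table.append(int(f.read().strip(), 16))  # files hold values like "0xffff"
    return table


def find_pkey_index(table: List[int], pkey: int) -> Optional[int]:
    """Return the table index for pkey's partition, preferring a full-member entry."""
    base = pkey & PKEY_BASE_MASK
    fallback = None
    for index, entry in enumerate(table):
        if base == 0 or (entry & PKEY_BASE_MASK) != base:
            continue  # different partition (empty 0x0000 slots never match)
        if entry & FULL_MEMBER_BIT:
            return index
        if fallback is None:
            fallback = index  # limited entry, used only if no full one exists
    return fallback


# Hypothetical table: default partition (full) at index 0, partition 0x2 (full) at index 1.
example_table = [0xFFFF, 0x8002, 0x0000, 0x0000]
print(find_pkey_index(example_table, 0x0002))  # 1
print(find_pkey_index(example_table, 0x0003))  # None
```

On a live node, something like `find_pkey_index(read_pkey_table("mlx4_1", 2), 0x2)` would be the check to run before pointing SRP at a partition.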

Okay... I'll set up IPoIB as ipoib.txt suggests, i.e.:

echo 0x1 > /sys/class/net/ib0/create_child

... but for now, I'm still not seeing the state go to "up"... I think
that's the first problem.
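
For reference, a P_Key as stored in the port's table is the 15-bit partition number plus the full-membership bit (0x8000), so partition 0x1 with full membership appears as 0x8001. If I read ipoib.txt correctly, its own example writes 0x8001 to create_child and gets an interface named ib0.8001. The Python sketch below only does that arithmetic; the helper names are hypothetical, the "<parent>.<4 hex digits>" naming is an assumption, and whether the kernel adds the membership bit itself when plain 0x1 is written may depend on the kernel version, so check the name of the interface that actually appears.

```python
FULL_MEMBER_BIT = 0x8000
PKEY_BASE_MASK = 0x7FFF


def table_pkey(partition: int, full: bool) -> int:
    """P_Key value as it would appear in a port's P_Key table."""
    base = partition & PKEY_BASE_MASK
    return base | FULL_MEMBER_BIT if full else base


def child_interface_name(parent: str, pkey: int) -> str:
    """Assumed IPoIB child naming scheme: parent name, a dot, 4 lowercase hex digits."""
    return f"{parent}.{pkey:04x}"


# The made-up partitions 0x1, 0x2 and 0x4 from this thread, as full members.
for partition in (0x1, 0x2, 0x4):
    pkey = table_pkey(partition, full=True)
    print(f"partition {partition:#x}: echo {pkey:#06x} > /sys/class/net/ib0/create_child"
          f"  -> {child_interface_name('ib0', pkey)}")
```
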
>>>
>>>> ... and run w/:
>>>>
>>>> opensm -r -B -P/etc/ofed/partitions.conf
>
> Also, do you need to use -r? It's better not to (it reassigns LIDs).

I'm using it to ensure that it doesn't hang on to the old state,
especially since I'm not getting the SM working... I don't want it to
assume anything about the previous state is right.

I have tried w/ and w/o and don't see a difference.

The plan is, once I get it working, to remove the "-r".  Or, are you
suggesting I not use it?

>
>>>> Does that sound correct?  It doesn't work
>>>
>>> What application(s) aren't working?
>>
>> ping over IPoIB, for example.
>>
>> I am seeing the test node in an "initializing" state right now... I
>> thought it was "up" before.
>
> Yes, this has gone "backwards" (not as far along yet...)
>

I think getting to an "up" state is the first step.

>>> Any SM error messages?
>>
>> The server has one klogd error coming out continuously:
>>
>> ib0: multicast join failed for
>> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
>
> The IPoIB broadcast group (on the default partition) can't be joined,
> presumably due to the current partition setup (it worked prior to
> this, right?).
>
> You need to do some IPoIB configuration relative to partitions as well.
> See kernel Documentation/infiniband/ipoib.txt for help with this.
>
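
The MGID in that klogd error records which partition IPoIB was trying to use. Assuming the IPoIB multicast GID layout from RFC 4391 (0xFF, a flags/scope byte, the 0x401B IPv4 signature, the 16-bit P_Key, then zeros and the IPv4 group address in the last four bytes), a small Python sketch can pull it apart. Here the P_Key comes out as 0xffff, i.e. full membership in the default partition 0x7fff, which fits the reading that it is the default-partition broadcast group that cannot be joined. The function name is hypothetical.

```python
import ipaddress

FULL_MEMBER_BIT = 0x8000


def decode_ipoib_mgid(mgid: str) -> dict:
    """Split an IPoIB IPv4 multicast GID into its fields (RFC 4391 layout assumed)."""
    raw = ipaddress.IPv6Address(mgid).packed  # 16 bytes, network byte order
    if raw[0] != 0xFF:
        raise ValueError(f"{mgid} is not a multicast GID")
    return {
        "flags": raw[1] >> 4,                          # flags nibble (0x1 here)
        "scope": raw[1] & 0x0F,                        # 2 = link-local
        "signature": int.from_bytes(raw[2:4], "big"),  # 0x401B for IPv4 over IB
        "pkey": int.from_bytes(raw[4:6], "big"),       # partition the group belongs to
        "group": ipaddress.IPv4Address(raw[12:16]),    # IPv4 multicast/broadcast address
    }


info = decode_ipoib_mgid("ff12:401b:ffff:0000:0000:0000:ffff:ffff")
pkey = info["pkey"]
print(f"P_Key {pkey:#06x}: partition {pkey & 0x7FFF:#06x}, "
      f"{'full' if pkey & FULL_MEMBER_BIT else 'limited'} member, group {info['group']}")
```

With a partition-specific IPoIB child interface, the same decoding should show that partition's P_Key in the middle field instead of 0xffff.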

Will do.  As you say, the trick will be getting SRP to use the right
P_Keys... but I need to get the IB in an "up" state first.

<snip sm output>
>>> Which server?
>>
>> There's only one server... it has many ports that I'm trying to
>> partition to different clients.  So, in the above, when I say "Server
>> A", I mean server port "A".
>
> I meant which server port is running OpenSM (which GUID is being
> used). I see above it is 0x24717124000029

That was it.  I've switched to a client as the SM now, since you
suggested a stand-alone SM.

>
>>> You still need the default partition, with the SM node being full and
>>> the others being limited there (so it's also best to run the SM on a
>>> separate node if possible; otherwise any client could potentially
>>> connect to it on the default partition).
>>
>> Are you saying to change the partitions.conf file to:
>>
>> part1=0x1, ipoib: 0x0024717124000029=full, 0x0002c90300028c01;
>> part2=0x2, ipoib: 0x002471712400002a=full, 0x0002c90300026047;
>> part3=0x3, ipoib: 0x0024717127000035=full, 0x0002c90300026053;
>> part4=0x4, ipoib: 0x0024717127000036=full, 0x0002c9030002603b;
>
> That's part of it.
>
>> ... (which still doesn't work), in which case do I set all the server's
>> ports to "full", or should just one be "full" (which didn't work
>> either)?
>
> You also need:
> Default=0x7fff: ALL, SELF=FULL;
> I would put that first.

So, now my /etc/ofed/partitions.conf file looks like:

Default=0x7fff: ALL, SELF=FULL;
part1=0x1, ipoib: 0x0002c903000292af=full, 0x0002c90300028c01;
part2=0x2, ipoib: 0x0002c903000292b0=full, 0x0002c90300026047;
part4=0x4, ipoib: 0x0024717124000029=full, 0x0002c9030002603b;

... I pulled out the node on partition 3 to use as a dedicated SM node.
I also changed the server ports to some of the other IB ports on that
machine (port GUIDs as shown by ibstat).  I set the server port GUIDs
to "full", as I want the client GUIDs to talk to them, but not
necessarily to each other (since there is only one client GUID in each
partition now, that's a moot point).

Note that I made up the partition P_Keys 1, 2, and 4.
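
To sanity-check a file like the one above, the following Python sketch parses only the subset of partitions.conf syntax used in this thread (Name=P_Key[, flag...]: member[=full|limited], ...;) and prints each partition's members. It is not OpenSM's parser: keywords such as ALL and SELF are kept as plain labels rather than expanded, any defmember flag is ignored, and a member with no explicit role is treated as limited, which is my reading of partition-config.txt. The function name is hypothetical.

```python
CONF = """\
Default=0x7fff: ALL, SELF=FULL;
part1=0x1, ipoib: 0x0002c903000292af=full, 0x0002c90300028c01;
part2=0x2, ipoib: 0x0002c903000292b0=full, 0x0002c90300026047;
part4=0x4, ipoib: 0x0024717124000029=full, 0x0002c9030002603b;
"""


def parse_partitions(text: str) -> dict:
    """Parse the simple 'Name=PKey, flags: member[=role], ...;' subset."""
    # Drop whole-line comments before splitting into ';'-terminated statements.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    partitions = {}
    for stmt in "\n".join(lines).split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        header, _, members = stmt.partition(":")
        name_pkey, *flags = [f.strip() for f in header.split(",")]
        name, _, pkey = name_pkey.partition("=")
        table = {}
        for member in members.split(","):
            member = member.strip()
            if not member:
                continue
            port, _, role = member.partition("=")
            # No role (or anything other than "full") is treated as limited.
            table[port.strip()] = "full" if role.strip().lower() == "full" else "limited"
        partitions[name.strip()] = {"pkey": int(pkey, 16), "flags": flags, "members": table}
    return partitions


for name, part in parse_partitions(CONF).items():
    print(f"{name} pkey={part['pkey']:#06x} flags={part['flags']}")
    for port, role in part["members"].items():
        print(f"    {port}: {role}")
```

The printout makes it easy to confirm that each server port is full and each client is limited in its partition, and that only SELF (the SM node) is full in the default partition.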

Note that it still doesn't work.  On the stand-alone SM, ibstat looks like:

# ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c90300026052
        System image GUID: 0x0002c90300026055
        Port 1:
                State: Armed
                Physical state: LinkUp
                Rate: 10
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x0251086a
                Port GUID: 0x0002c90300026053
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c90300026054

... On the server, the devices mentioned in the partitions file look like:

CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0024717124000028
        System image GUID: 0x002471712400002b
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0024717124000029
        Port 2:
                State: Initializing
                Physical state: LinkUp
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x002471712400002a
CA 'mlx4_1'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c903000292ae
        System image GUID: 0x0002c903000292b1
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c903000292af
        Port 2:
                State: Initializing
                Physical state: LinkUp
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c903000292b0

On one of the clients:

# ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c90300026046
        System image GUID: 0x0002c90300026049
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 10
                Base lid: 7
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c90300026047
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c90300026048

Partition "part2" with P_Key=2 should connect this client's port 0
(ibstat Port 1) to the server on port 1 of mlx4_1 (ibstat Port 2).
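
With several HCAs it is easy to miss a port that is stuck. IB ports normally move from Down through Initializing and Armed to Active as the SM configures them, so the following Python sketch, assuming ibstat's plain-text layout as pasted above, lists every port whose State is not Active together with its base LID. The sample is a trimmed copy of the server's mlx4_1 output; the function name is hypothetical.

```python
import re

SAMPLE = """\
CA 'mlx4_1'
        CA type: MT26428
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Base lid: 0
                Port GUID: 0x0002c903000292af
        Port 2:
                State: Initializing
                Physical state: LinkUp
                Base lid: 0
                Port GUID: 0x0002c903000292b0
"""


def ports_not_active(text: str) -> list:
    """Collect per-port 'Key: value' fields from ibstat text; keep non-Active ports."""
    ports, ca, port = [], None, None
    for line in text.splitlines():
        line = line.strip()
        if m := re.match(r"CA '(.+)'", line):
            ca = m.group(1)
        elif m := re.match(r"Port (\d+):", line):  # "Port GUID:" does not match
            port = {"ca": ca, "port": int(m.group(1))}
            ports.append(port)
        elif port is not None and ":" in line:
            key, _, value = line.partition(":")
            port[key.strip()] = value.strip()
    return [p for p in ports if p.get("State") != "Active"]


for p in ports_not_active(SAMPLE):
    print(f"{p['ca']} port {p['port']}: {p['State']} (base LID {p['Base lid']})")
```

Here both server ports show up as Initializing with base LID 0, i.e. they apparently have not been assigned LIDs by the SM yet.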

>
>> I did have a difficult time understanding the difference between
>> "full" and "limited" in the man page.
>
> On a given partition, full can talk with all other members whereas a
> limited member can only talk with full members (not other limited
> members).
>
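
The full/limited rule can be written down directly. In this Python sketch (the function name and the second, limited-only partition are illustrative), two ports may talk on a partition only if both are members and at least one of them is a full member. Applied to part2 from the partitions file above, the server port reaches the client, while two limited members of the same partition would not reach each other.

```python
def can_communicate(members: dict, a: str, b: str) -> bool:
    """members maps port GUID -> "full" or "limited" for a single partition."""
    if a not in members or b not in members:
        return False  # a non-member cannot use this partition at all
    # Limited members may only talk to full members, never to each other.
    return members[a] == "full" or members[b] == "full"


part2 = {
    "0x0002c903000292b0": "full",     # server: ibstat Port 2 of mlx4_1
    "0x0002c90300026047": "limited",  # client
}
print(can_communicate(part2, "0x0002c903000292b0", "0x0002c90300026047"))  # True

# Hypothetical partition holding two limited clients and no full member.
clients_only = {"client-a": "limited", "client-b": "limited"}
print(can_communicate(clients_only, "client-a", "client-b"))  # False
```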

I think I've got that correctly specified in the above partitions file.

>> I've got a captive network, so I don't want any paths I've not
>> specified to be allowed.  If that makes any sense.  So, I didn't want
>> to put a statement in like:
>>
>> Default=0x7fff,ipoib:ALL=full;
>>
>> ... that would let a rogue node slip through the cracks.
>
> The only node they can talk with is the SM (the way I'm proposing), so
> it's best if the SM node is separate.

It's separate now.  The log looks like this (in its entirety, at startup):

Apr 13 13:41:56 182699 [1D71CA30] 0x03 -> OpenSM 3.2.5_20081207
Apr 13 13:41:56 182764 [1D71CA30] 0x80 -> OpenSM 3.2.5_20081207
Apr 13 13:41:56 183020 [1D71CA30] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 13 13:41:56 183104 [1D71CA30] 0x80 -> Entering DISCOVERING state
Apr 13 13:41:56 193181 [1D71CA30] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300026053
Apr 13 13:41:56 217349 [1D71CA30] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300026053
Apr 13 13:41:57 018570 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x110000123b) -- dropping
Apr 13 13:41:57 018586 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:41:57 018603 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:41:57 018608 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:41:57 018626 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x123b
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:41:57 018681 [475CD940] 0x80 -> Entering MASTER state
Apr 13 13:41:57 019791 [475CD940] 0x80 -> SUBNET UP
Apr 13 13:42:06 986336 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x1100001242) -- dropping
Apr 13 13:42:06 986349 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:42:06 986355 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:42:06 986360 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:42:06 986376 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x1242
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:42:06 986708 [475CD940] 0x02 -> SUBNET UP
Apr 13 13:42:16 990103 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x1100001246) -- dropping
Apr 13 13:42:16 990114 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:42:16 990120 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:42:16 990125 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:42:16 990141 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x1246
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:42:16 990475 [475CD940] 0x02 -> SUBNET UP
Apr 13 13:42:26 990871 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x110000124a) -- dropping
Apr 13 13:42:26 990884 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:42:26 990890 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:42:26 990895 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:42:26 990912 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x124a
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:42:26 991227 [475CD940] 0x02 -> SUBNET UP
Apr 13 13:42:36 993638 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x110000124e) -- dropping
Apr 13 13:42:36 993649 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:42:36 993655 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:42:36 993660 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:42:36 993676 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x124e
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:42:36 993996 [475CD940] 0x02 -> SUBNET UP
Apr 13 13:42:46 996409 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x1100001252) -- dropping
Apr 13 13:42:46 996420 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:42:46 996426 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:42:46 996431 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:42:46 996449 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x1252
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:42:46 996800 [475CD940] 0x02 -> SUBNET UP
Apr 13 13:42:56 999180 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x1100001256) -- dropping
Apr 13 13:42:56 999192 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:42:56 999198 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:42:56 999203 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:42:56 999220 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x1256
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:42:56 999553 [475CD940] 0x02 -> SUBNET UP
Apr 13 13:43:07 001949 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x110000125a) -- dropping
Apr 13 13:43:07 001963 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:43:07 001969 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:43:07 001975 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:43:07 001992 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x125a
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:43:07 002384 [475CD940] 0x02 -> SUBNET UP
Apr 13 13:43:17 004713 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x110000125e) -- dropping
Apr 13 13:43:17 004727 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:43:17 004733 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:43:17 004738 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:43:17 004755 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x125e
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:43:17 005140 [475CD940] 0x02 -> SUBNET UP
Apr 13 13:43:27 007482 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x1100001262) -- dropping
Apr 13 13:43:27 007497 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:43:27 007503 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:43:27 007508 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:43:27 007524 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x1262
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:43:27 007958 [475CD940] 0x02 -> SUBNET UP
Apr 13 13:43:37 010250 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x1100001266) -- dropping
Apr 13 13:43:37 010264 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:43:37 010270 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:43:37 010275 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:43:37 010292 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x1266
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Apr 13 13:43:37 010716 [475CD940] 0x02 -> SUBNET UP
Apr 13 13:43:47 013017 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x110000126a) -- dropping
Apr 13 13:43:47 013029 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
Apr 13 13:43:47 013035 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
                                Initial path = 0,0
                                Return path  = 0,0
Apr 13 13:43:47 013059 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Apr 13 13:43:47 013077 [47FCE940] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x1
                                trans_id................0x126a
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0,1
                                Return path:  0,0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

>
> In order for the SA portion of the SM to work, the SM node must be a full
> member of the default partition, and the other nodes must be at least
> limited members (so that their queries will be responded to). IPoIB is
> not needed on that partition.
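
A minimal sketch of the P_Key matching rule behind the point above. It assumes the standard InfiniBand encoding: the high bit of a 16-bit P_Key marks full membership, the low 15 bits name the partition, and 0x7fff is the default partition. The helper names (make_pkey, pkeys_match, can_talk) are hypothetical and are not part of OpenSM. The sketch shows why a limited-member client can still get SA answers from a full-member SM port, while two limited members of the same partition cannot reach each other.

```python
from typing import Iterable

FULL_BIT = 0x8000   # high bit of the 16-bit P_Key: 1 = full member, 0 = limited
BASE_MASK = 0x7FFF  # low 15 bits: partition number (0x7fff = default partition)


def make_pkey(base: int, full: bool) -> int:
    """Build a P_Key table entry for partition `base` (hypothetical helper)."""
    return (base & BASE_MASK) | (FULL_BIT if full else 0)


def pkeys_match(a: int, b: int) -> bool:
    """Same partition, and at least one side must be a full member."""
    base_a, base_b = a & BASE_MASK, b & BASE_MASK
    if base_a == 0 or base_b == 0:
        return False  # 0x0000 and 0x8000 are not valid P_Keys
    return base_a == base_b and bool((a | b) & FULL_BIT)


def can_talk(pkeys_a: Iterable[int], pkeys_b: Iterable[int]) -> bool:
    """Two ports can communicate if any pair of their P_Keys matches."""
    pkeys_b = list(pkeys_b)
    return any(pkeys_match(x, y) for x in pkeys_a for y in pkeys_b)


if __name__ == "__main__":
    sm_port = [make_pkey(0x7FFF, full=True)]   # SM: full member of default
    client = [make_pkey(0x7FFF, full=False)]   # client: limited member
    print("client <-> SM:", can_talk(client, sm_port))
    print("client <-> client:", can_talk(client, client))
```

If the SM port were itself only a limited member of the default partition, pkeys_match would fail for every limited client, which is one way the SA queries could go unanswered.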

I think I've got the partition file specified correctly... but then
again obviously not, as it doesn't work.

Thanks,

Chris


