[Users] Trouble with subnet_prefix

Orion Poplawski orion at cora.nwra.com
Tue Apr 30 08:23:57 PDT 2013


I'm going to have some overlapping IB networks, and to silence Open MPI's
warning about multiple ports on the default subnet, I'm trying to change the
subnet_prefix to 0xfe80000000000001 (in /etc/rdma/opensm.conf).  However, since
that change IPoIB no longer works and I'm seeing the following in opensm.log:

Apr 30 09:08:58 739460 [DA401700] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet, scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID: ff12:401b:ffff::ffff:ffff from port 0x0019bbffff005851 (saga mthca0)
Apr 30 09:09:03 372476 [D17F3700] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet, scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID: ff12:401b:ffff::ffff:ffff from port 0x001708ffffd09df9 (alexandria2 HCA-1)

and I cannot ping remote IB IPs.
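
The two component masks in the ERR 1B11 lines differ in four bits. Below is a
minimal sketch for naming them, assuming the MCMemberRecord component-mask bit
order as defined in the Linux kernel's ib_sa.h header. The field names and the
helper mask_fields() are illustrative assumptions, not something opensm prints.

```python
# MCMemberRecord component-mask bits, lowest bit first.
# ASSUMPTION: this ordering follows the Linux kernel's include/rdma/ib_sa.h.
MCMEMBER_FIELDS = [
    "MGID", "PortGID", "Q_Key", "MLID", "MTUSelector", "MTU", "TClass",
    "P_Key", "RateSelector", "Rate", "PacketLifeTimeSelector",
    "PacketLifeTime", "SL", "FlowLabel", "HopLimit", "Scope", "JoinState",
]


def mask_fields(mask):
    """Return the names of the fields whose bits are set in mask."""
    return [name for bit, name in enumerate(MCMEMBER_FIELDS)
            if mask & (1 << bit)]


# Values copied from the opensm.log lines above.
received = 0x0000000000010083
expected = 0x00000000000130c7

# Bits opensm expected but the join request did not set.
missing = expected & ~received

print("received:", mask_fields(received))
print("expected:", mask_fields(expected))
print("missing :", hex(missing), mask_fields(missing))
```

Under that assumption, the bits missing from the request are Q_Key, TClass, SL
and FlowLabel (0x3044).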

[root@saga ~]# ibstat
CA 'mthca0'
         CA type: MT25208 (MT23108 compat mode)
         Number of ports: 2
         Firmware version: 4.7.400
         Hardware version: a0
         Node GUID: 0x0019bbffff005850
         System image GUID: 0x0019bbffff005853
         Port 1:
                 State: Active
                 Physical state: LinkUp
                 Rate: 8
                 Base lid: 1
                 LMC: 0
                 SM lid: 1
                 Capability mask: 0x02510a6a
                 Port GUID: 0x0019bbffff005851
                 Link layer: InfiniBand
         Port 2:
                 State: Active
                 Physical state: LinkUp
                 Rate: 8
                 Base lid: 4
                 LMC: 0
                 SM lid: 1
                 Capability mask: 0x02510a68
                 Port GUID: 0x0019bbffff005852
                 Link layer: InfiniBand
[root@saga ~]# ip addr show dev ib0
4: ib0: <BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast state UNKNOWN qlen 256
     link/infiniband 80:00:04:04:fe:80:00:00:00:00:00:01:00:19:bb:ff:ff:00:58:51 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
     inet 192.168.2.12/24 brd 192.168.2.255 scope global ib0

[root@alexandria2 ~]# ibstat
CA 'mthca0'
         CA type: MT25208 (MT23108 compat mode)
         Number of ports: 2
         Firmware version: 4.7.400
         Hardware version: a0
         Node GUID: 0x001708ffffd09df8
         System image GUID: 0x001708ffffd09dfb
         Port 1:
                 State: Active
                 Physical state: LinkUp
                 Rate: 10
                 Base lid: 9
                 LMC: 0
                 SM lid: 1
                 Capability mask: 0x02510a68
                 Port GUID: 0x001708ffffd09df9
                 Link layer: InfiniBand
         Port 2:
                 State: Active
                 Physical state: LinkUp
                 Rate: 10
                 Base lid: 8
                 LMC: 0
                 SM lid: 1
                 Capability mask: 0x02510a68
                 Port GUID: 0x001708ffffd09dfa
                 Link layer: InfiniBand
[root@alexandria2 ~]# ip addr show dev ib0
6: ib0: <BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast state UNKNOWN qlen 256
     link/infiniband 80:00:04:04:fe:80:00:00:00:00:00:01:00:17:08:ff:ff:d0:9d:f9 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
     inet 192.168.2.16/24 brd 192.168.2.255 scope global ib0
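
To check that both ib0 interfaces picked up the new prefix, the 20-byte
link/infiniband addresses can be split into their parts. This is a minimal
sketch, assuming the IPoIB link-layer layout described in RFC 4391 (1 flags
byte, 3-byte QPN, then a 16-byte GID made of the 8-byte subnet prefix and the
8-byte port GUID). decode_ipoib_hwaddr() is a hypothetical helper name.

```python
def decode_ipoib_hwaddr(hwaddr):
    """Split a colon-separated 20-byte IPoIB hardware address.

    ASSUMPTION: layout per RFC 4391: flags (1 byte), QPN (3 bytes),
    GID (16 bytes = subnet prefix + port GUID).
    """
    raw = bytes.fromhex(hwaddr.replace(":", ""))
    if len(raw) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(raw))
    return {
        "flags": raw[0],
        "qpn": int.from_bytes(raw[1:4], "big"),
        "subnet_prefix": raw[4:12].hex(),
        "port_guid": raw[12:20].hex(),
    }


# Addresses copied from the `ip addr show dev ib0` output above.
saga = decode_ipoib_hwaddr(
    "80:00:04:04:fe:80:00:00:00:00:00:01:00:19:bb:ff:ff:00:58:51")
alexandria2 = decode_ipoib_hwaddr(
    "80:00:04:04:fe:80:00:00:00:00:00:01:00:17:08:ff:ff:d0:9d:f9")

for host, parts in (("saga", saga), ("alexandria2", alexandria2)):
    print(host, "prefix=0x" + parts["subnet_prefix"],
          "guid=0x" + parts["port_guid"])
```

Both addresses decode to subnet prefix 0xfe80000000000001, with port GUIDs
matching Port 1 in the ibstat output for each host.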

[root@alexandria2 ~]# ibping -G 0x0019bbffff005851
Pong from saga.cora.nwra.com.(none) (Lid 1): time 0.133 ms
Pong from saga.cora.nwra.com.(none) (Lid 1): time 0.103 ms
^C
--- saga.cora.nwra.com.(none) (Lid 1) ibping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1908 ms
rtt min/avg/max = 0.103/0.118/0.133 ms
[root@alexandria2 ~]# ibping -G 0x0019bbffff005852
ibwarn: [3274] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 4)
ibwarn: [3274] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 4)
^C
---  (Lid 4) ibping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 7636 ms
rtt min/avg/max = 0.000/0.000/0.000 ms


I'm at a loss.  Any ideas?  Thanks!

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                   http://www.nwra.com


