[ewg] interoperability between OFED 1.5.1 2/19 and earlier OFED on IBM BladeCenter.

Robert Pearson rpearson at systemfabricworks.com
Sat Feb 20 13:39:29 PST 2010


I saw the notes about opensm not working on the 2/19 build, which is the build I
was using. I currently see the following behavior:

I have 3 blades with QDR HCAs and 2 QDR switches. The 2nd switch connects only
to the blades. When the nodes reboot, their ports come up only to INIT (a port
stays in INIT until a subnet manager configures it), so the switch is not
managed as I had thought.

Two of the blades run RHEL 5.3 with OFED 1.4.1. When the third blade is
installed with OFED 1.5, all three systems see each other and work correctly.
When the third blade is installed with OFED 1.5.1, its port does not come
ACTIVE. Running ibnetdiscover from blade3 gives:

[root@blade3 ~]# ibnetdiscover -P 2
ibwarn: [4064] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,2)
src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,2) failed, skipping port
#
# Topology file: generated on Sat Feb 20 15:27:20 2010
#
# Initiated from node 0002c9030004de20 port 0002c9030004de22

vendid=0x2c9
devid=0x673c
sysimgguid=0x2c9030004de23
caguid=0x2c9030004de20
Ca      2 "H-0002c9030004de20"          # "blade3 HCA-1"

When you run ibnetdiscover on another blade (blade1), you see:

[root@blade1 xrdma]# ibnetdiscover -P 2
ibwarn: [8307] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,2,19)
ibwarn: [8307] handle_port: NodeInfo on DR path slid 0; dlid 0; 0,2,19 failed, skipping port
#
# Topology file: generated on Sat Feb 20 15:28:19 2010
#
# Max of 2 hops discovered
# Initiated from node 0002c9030004dc58 port 0002c9030004dc5a

vendid=0x8f1
devid=0x5a5e
sysimgguid=0x8f1050038014d
switchguid=0x8f1050038014c(8f1050038014c)
Switch  36 "S-0008f1050038014c"         # "IBM HSSM" enhanced port 0 lid 8 lmc 0
[18]    "H-0002c9030004db1c"[2](2c9030004db1e)          # "blade2 HCA-1" lid 6 4xQDR
[17]    "H-0002c9030004dc58"[2](2c9030004dc5a)          # "blade1" lid 5 4xQDR

vendid=0x2c9
devid=0x673c
sysimgguid=0x2c9030004db1f
caguid=0x2c9030004db1c
Ca      2 "H-0002c9030004db1c"          # "blade2 HCA-1"
[2](2c9030004db1e)      "S-0008f1050038014c"[18]                # lid 6 lmc 0 "IBM HSSM" lid 8 4xQDR

vendid=0x2c9
devid=0x673c
sysimgguid=0x2c9030004dc5b
caguid=0x2c9030004dc58
Ca      2 "H-0002c9030004dc58"          # "blade1"
[2](2c9030004dc5a)      "S-0008f1050038014c"[17]                # lid 5 lmc 0 "IBM HSSM" lid 8 4xQDR
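
As a quick cross-check of the blade1 output, a minimal Python sketch can list which node GUIDs sit behind each port of the "IBM HSSM" switch and test whether blade3's HCA (node GUID 0002c9030004de20, taken from the "Initiated from node" line of blade3's own run) is among them. The helper name `switch_peers` and the line format its regex expects are assumptions inferred only from the output pasted in this report; this is not part of infiniband-diags or any OFED tool.

```python
import re

# Excerpt of the blade1 ibnetdiscover output from this report
# (ibwarn lines and the blade1 CA record omitted for brevity).
TOPOLOGY = """\
switchguid=0x8f1050038014c(8f1050038014c)
Switch  36 "S-0008f1050038014c"         # "IBM HSSM" enhanced port 0 lid 8 lmc 0
[18]    "H-0002c9030004db1c"[2](2c9030004db1e)          # "blade2 HCA-1" lid 6 4xQDR
[17]    "H-0002c9030004dc58"[2](2c9030004dc5a)          # "blade1" lid 5 4xQDR

caguid=0x2c9030004db1c
Ca      2 "H-0002c9030004db1c"          # "blade2 HCA-1"
[2](2c9030004db1e)      "S-0008f1050038014c"[18]                # lid 6 lmc 0 "IBM HSSM" lid 8 4xQDR
"""

# Assumed switch-port line format (inferred from the output above, not a spec):
# [local port] "<S|H>-<16-hex-digit node GUID>"[remote port](remote port GUID) # ...
SWITCH_PORT = re.compile(r'^\[(\d+)\]\s+"([SH])-([0-9a-f]{16})"\[(\d+)\]')

# blade3's node GUID, from the "Initiated from node" line of its own run
BLADE3_GUID = "0002c9030004de20"


def switch_peers(topology: str) -> dict:
    """Hypothetical helper: map each switch port number to the peer node GUID."""
    peers = {}
    in_switch = False
    for line in topology.splitlines():
        if line.startswith("Switch"):
            in_switch = True  # start of a switch record
        elif not line.strip():
            in_switch = False  # a blank line ends the record
        elif in_switch:
            m = SWITCH_PORT.match(line)
            if m:
                peers[int(m.group(1))] = m.group(3)
    return peers


if __name__ == "__main__":
    peers = switch_peers(TOPOLOGY)
    for port, guid in sorted(peers.items()):
        print(f"switch port {port}: {guid}")
    print("blade3 visible to switch:", BLADE3_GUID in peers.values())
```

Run against this excerpt, it reports only ports 17 (blade1) and 18 (blade2) and prints `blade3 visible to switch: False`, which is consistent with blade3's port never coming ACTIVE.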

Apparently the blades do NOT work together when one is at OFED 1.5.1 (2/19
build) and the others are at an earlier release.

All of the nodes are at firmware 2.6.818, which seems to be the most recent
release for IBM.
