[openib-general] question on opensm error
Ronald G. Minnich
rminnich at lanl.gov
Tue Feb 15 05:53:12 PST 2005
On Tue, 15 Feb 2005, Hal Rosenstock wrote:
> ibstatus/ibstat can show the local port logical and physical port state.
bluesteel:~ # ibstat
CA 'mthca0':
CA type: MT23108
Number of ports: 2
Firmware version: 3.3.2
Hardware version: a1
Node GUID: 0x0002c90108a03e60
System image GUID: 0x0002c9000100d050
Port 1:
State: Initializing
Rate: 10
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00500a68
Port GUID: 0x0002c90108a03e61
Port 2:
State: Down
Rate: 2
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00500a68
Port GUID: 0x0002c90108a03e62
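For anyone who wants to check port state across many hosts, here is a minimal Python sketch that pulls the CA name, port number, and State out of ibstat-style text like the output above. The function name parse_ibstat and the trimmed SAMPLE text are illustrative assumptions, not part of any openib tool, and the field layout is assumed to match what is pasted here.

```python
import re

def parse_ibstat(text):
    """Return a list of (ca_name, port_number, state) tuples from ibstat-style text."""
    ports = []
    ca = None
    port = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"CA '([^']+)'", line)
        if m:
            # A new CA section starts; forget the previous port.
            ca, port = m.group(1), None
            continue
        m = re.match(r"Port (\d+):", line)
        if m:
            port = int(m.group(1))
            continue
        m = re.match(r"State:\s*(\S+)", line)
        if m and port is not None:
            ports.append((ca, port, m.group(1)))
    return ports

# Trimmed copy of the ibstat output shown in this message (illustration only).
SAMPLE = """\
CA 'mthca0':
CA type: MT23108
Number of ports: 2
Port 1:
State: Initializing
Base lid: 0
Port 2:
State: Down
Base lid: 0
"""

if __name__ == "__main__":
    for ca, port, state in parse_ibstat(SAMPLE):
        # Flag anything that has not reached the Active state.
        flag = "" if state == "Active" else "  <-- not Active"
        print(f"{ca} port {port}: {state}{flag}")
```

In real use the text would come from running ibstat rather than from a pasted string.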
> It might be helpful to try running ibnetdiscover -e (to show the
> errors). smpquery can also be used to query the bad link/host.
There is no -e switch on my copy. Time for an svn update?
This was kind of interesting, it did find a lot of switches ...
[0][1][3][8][7][3][3][2][8][5][8] -> known remote switch
{0002c90108d19748} portnum 0 lid 0xe4-0xe4 "MT43132 Mellanox Technologies"
[0][1][3][8][7][3][3][2][8][2] -> processing switch {0002c90108d19200}
portnum 0 lid 0x0-0x0 "MT43132 Mellanox Technologies"
(more like this -- much more)
and some HCAs
[0][1][3][8][7][3][3][2][8][2][2] -> new remote hca {0002c901081e6700}
portnum 1 lid 0x0-0x0 "MT23108 InfiniHost Mellanox Technologies"
[1] {0002c901081e6700}
but osm.log is about 59MB of these:
[1108475425:000915547][411FF970] -> umad_receiver: send completed with
error(method=1 attr=11) -- dropping.
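With a log that size it may be easier to count the errors than to read them. A rough Python sketch, assuming every entry follows the format of the line above; the function name count_send_errors and the repeated SAMPLE are made up for illustration, and the method/attr values are kept as plain strings because the log line does not say which number base they are printed in.

```python
import re
from collections import Counter

# Matches the "send completed with error(method=X attr=Y)" message.
# \s+ tolerates the line wrap that appears in this email.
PATTERN = re.compile(r"send completed with\s+error\(method=(\w+) attr=(\w+)\)")

def count_send_errors(log_text):
    """Count occurrences of each (method, attr) pair in the log text."""
    return Counter(PATTERN.findall(log_text))

# The single log entry quoted in this message, repeated three times
# purely to exercise the counter (illustration only).
SAMPLE = (
    "[1108475425:000915547][411FF970] -> umad_receiver: send completed with\n"
    "error(method=1 attr=11) -- dropping.\n"
) * 3

if __name__ == "__main__":
    for (method, attr), n in count_send_errors(SAMPLE).most_common():
        print(f"method={method} attr={attr}: {n} entries")
```

For the real file, the text would be read from osm.log (for example with open(path).read()), which should still be manageable at around 59MB.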
smpquery? Have not seen that. Remember I'm trying to get this done with
openib ONLY. Probably a bad idea :-)
Here's plain ibnetdiscover:
bluesteel:~ # ibnetdiscover
warn: [4710] _do_madrpc: retry 2 (timeout 2000 ms)
warn: [4710] _do_madrpc: send failed; Invalid argument
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][5][3][2][8][2][4]
port 4 failed, skipping port
warn: [4710] _do_madrpc: retry 2 (timeout 2000 ms)
warn: [4710] _do_madrpc: send failed; Invalid argument
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][2][3][4][1][1][2]
port 2 failed, skipping port
warn: [4710] _do_madrpc: retry 2 (timeout 2000 ms)
warn: [4710] _do_madrpc: send failed; Invalid argument
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][2][3][1][8][4][2]
port 2 failed, skipping port
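To get a list of the failing links out of warnings like these, something along the following lines might help. It is only a sketch: failed_ports is a hypothetical helper, the bracketed numbers are assumed to be the hop-by-hop path printed by ibnetdiscover, and the warning format is assumed to match the lines above exactly.

```python
import re

# Captures the bracketed path and the port number from a
# "handle_port: Nodeinfo on <path> port N failed" warning.
# \s+ tolerates the line wrap that appears in this email.
FAILED = re.compile(r"handle_port: Nodeinfo on ((?:\[\d+\])+)\s+port (\d+) failed")

def failed_ports(output):
    """Return a list of (path_as_list_of_ints, port_number) for each failed query."""
    results = []
    for path, port in FAILED.findall(output):
        hops = [int(h) for h in re.findall(r"\[(\d+)\]", path)]
        results.append((hops, int(port)))
    return results

# Two of the warnings quoted in this message (illustration only).
SAMPLE = """\
warn: [4710] _do_madrpc: retry 2 (timeout 2000 ms)
warn: [4710] _do_madrpc: send failed; Invalid argument
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][5][3][2][8][2][4]
port 4 failed, skipping port
warn: [4710] _do_madrpc: retry 2 (timeout 2000 ms)
warn: [4710] _do_madrpc: send failed; Invalid argument
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][2][3][4][1][1][2]
port 2 failed, skipping port
"""

if __name__ == "__main__":
    for hops, port in failed_ports(SAMPLE):
        print(f"port {port} unreachable via path {hops}")
```
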
ron