[openib-general] question on opensm error

Ronald G. Minnich rminnich at lanl.gov
Tue Feb 15 05:53:12 PST 2005



On Tue, 15 Feb 2005, Hal Rosenstock wrote:

> ibstatus/ibstat can show the local port logical and physical port state.

bluesteel:~ # ibstat
CA 'mthca0':
        CA type: MT23108
        Number of ports: 2
        Firmware version: 3.3.2
        Hardware version: a1
        Node GUID: 0x0002c90108a03e60
        System image GUID: 0x0002c9000100d050
        Port 1:
                State: Initializing
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00500a68
                Port GUID: 0x0002c90108a03e61
        Port 2:
                State: Down
                Rate: 2
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00500a68
                Port GUID: 0x0002c90108a03e62
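
For anyone who wants to check many nodes at once, here is a minimal sketch that parses output shaped like the ibstat listing above and reports the ports that are not Active. It is not part of openib. The names parse_ibstat and ports_not_active are made up for illustration, and it assumes the layout stays the same as shown here.

```python
import re

# Trimmed copy of the ibstat output above (only a few fields kept).
SAMPLE = """\
CA 'mthca0':
        Number of ports: 2
        Port 1:
                State: Initializing
                Base lid: 0
                SM lid: 0
        Port 2:
                State: Down
                Base lid: 0
                SM lid: 0
"""


def parse_ibstat(text):
    """Return {ca_name: {port_number: {field: value}}} from ibstat-style text."""
    cas = {}
    ca = port = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)':$", line)
        if m:
            ca = cas.setdefault(m.group(1), {})
            port = None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m and ca is not None:
            port = ca.setdefault(int(m.group(1)), {})
            continue
        # Only per-port "key: value" lines are kept; CA-level fields are skipped.
        if port is not None and ": " in line:
            key, value = line.split(": ", 1)
            port[key] = value
    return cas


def ports_not_active(cas):
    """List (ca, port, state) for every port whose State is not 'Active'."""
    return [(name, num, fields.get("State"))
            for name, ports in cas.items()
            for num, fields in sorted(ports.items())
            if fields.get("State") != "Active"]
```

Calling ports_not_active(parse_ibstat(SAMPLE)) lists both ports of mthca0, port 1 as Initializing and port 2 as Down.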


> It might be helpful to try running ibnetdiscover -e (to show the
> errors). smpquery can also be used to query the bad link/host.

No -e switch on my copy. Time for an svn update?

This was kind of interesting: it did find a lot of switches ...
[0][1][3][8][7][3][3][2][8][5][8] -> known remote switch {0002c90108d19748} portnum 0 lid 0xe4-0xe4 "MT43132 Mellanox Technologies"
[0][1][3][8][7][3][3][2][8][2] -> processing switch {0002c90108d19200} portnum 0 lid 0x0-0x0 "MT43132 Mellanox Technologies"

(more like this -- much more)

and some hcas:
[0][1][3][8][7][3][3][2][8][2][2] -> new remote hca {0002c901081e6700} portnum 1 lid 0x0-0x0 "MT23108 InfiniHost Mellanox Technologies"
        [1] {0002c901081e6700}

but osm.log is about 59MB of these:
[1108475425:000915547][411FF970] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
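
A 59MB log is easier to read as counts. The following is only a sketch, assuming each entry sits on one line in the real osm.log and matches the text pasted above. The function name count_send_errors is made up, and the method and attr values are kept as the strings the log prints, with no attempt to decode them.

```python
import re
from collections import Counter

# Pattern for the log entry quoted above; \s+ tolerates extra spacing.
SEND_ERROR = re.compile(
    r"umad_receiver: send completed with\s+error\(method=(\w+) attr=(\w+)\)"
)


def count_send_errors(lines):
    """Count send-error entries per (method, attr) pair, as printed in the log."""
    counts = Counter()
    for line in lines:
        m = SEND_ERROR.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


# Small in-memory sample; the second timestamp is invented for illustration.
SAMPLE_LINES = [
    "[1108475425:000915547][411FF970] -> umad_receiver: send completed with "
    "error(method=1 attr=11) -- dropping.",
    "[1108475425:000915600][411FF970] -> umad_receiver: send completed with "
    "error(method=1 attr=11) -- dropping.",
    "some unrelated log line",
]
```

For a real file, passing the open file object streams it line by line without loading it all into memory: with open("osm.log") as f: print(count_send_errors(f)). The path is whatever your opensm writes to.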

smpquery? Have not seen that. Remember I'm trying to get this done with 
openib ONLY. Probably a bad idea :-)



Here's plain ibnetdiscover:

bluesteel:~ # ibnetdiscover 
warn: [4710] _do_madrpc: retry 2 (timeout 2000 ms)
warn: [4710] _do_madrpc: send failed; Invalid argument
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][5][3][2][8][2][4] port 4 failed, skipping port
warn: [4710] _do_madrpc: retry 2 (timeout 2000 ms)
warn: [4710] _do_madrpc: send failed; Invalid argument
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][2][3][4][1][1][2] port 2 failed, skipping port
warn: [4710] _do_madrpc: retry 2 (timeout 2000 ms)
warn: [4710] _do_madrpc: send failed; Invalid argument
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][2][3][1][8][4][2] port 2 failed, skipping port
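
To see whether the failing routes have anything in common, a rough sketch like the one below pulls the bracketed paths out of the handle_port warnings and computes their shared leading hops. The helper names failed_routes and common_prefix are hypothetical, and the pattern assumes the warning text looks exactly like the lines above.

```python
import re

# The three handle_port warnings from the ibnetdiscover run above.
SAMPLE = """\
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][5][3][2][8][2][4]
port 4 failed, skipping port
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][2][3][4][1][1][2]
port 2 failed, skipping port
warn: [4710] handle_port: Nodeinfo on [0][1][3][8][7][2][3][1][8][4][2]
port 2 failed, skipping port
"""

# \s+ lets the match survive a line wrap between the path and "port N".
FAILED = re.compile(r"Nodeinfo on ((?:\[\d+\])+)\s+port (\d+) failed")


def failed_routes(text):
    """Return [(hops, port)] for every 'Nodeinfo on <path> port N failed'."""
    routes = []
    for path, port in FAILED.findall(text):
        hops = [int(h) for h in re.findall(r"\[(\d+)\]", path)]
        routes.append((hops, int(port)))
    return routes


def common_prefix(paths):
    """Longest run of leading hops shared by every path."""
    prefix = []
    for hops in zip(*paths):
        if len(set(hops)) != 1:
            break
        prefix.append(hops[0])
    return prefix
```

On the three warnings above, common_prefix([hops for hops, _ in failed_routes(SAMPLE)]) gives [0, 1, 3, 8, 7], so the failing routes only diverge after the fifth hop.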

ron


