[ofa-general] Diagnostics output messages
Ramiro Alba Queipo
raq at cttc.upc.edu
Tue Sep 30 03:51:32 PDT 2008
Hello everybody:
We have just started running a 22-node InfiniBand cluster (44 nodes in a
couple of months) under Ubuntu 8.04. After carefully reading about and
testing the OFED 1.3.1 diagnostics packages (ibutils and
infiniband-diags), I got some messages I cannot understand:
* ibdiagnet -o . -t file.topo -s jff -pm
-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:20Gbps > group-rate:10Gbps
What does it mean?
* ibchecknet
#warn: counter RcvSwRelayErrors = 259 (threshold 100) lid 4 port 255
Error check on lid 4 (MT47396 Infiniscale-III Mellanox Technologies) port all: FAILED
I can see that the command 'perfquery -a 255' shows its counters, but:
- What is this counter for?
- ibqueryerrors.pl -a says:
RcvSwRelayErrors: This counter can increase due to a valid network event
Should I worry about this counter slowly increasing on switch ports?
I am using IPoIB.
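To judge whether the counter is really growing "little by little", the
warning lines could be parsed and compared between two runs. The
following is only a minimal sketch: the function names are my own and are
not part of infiniband-diags, and the pattern assumes the warning always
has the exact layout shown above.

```python
import re

# Matches lines such as:
#   #warn: counter RcvSwRelayErrors = 259 (threshold 100) lid 4 port 255
# The layout is assumed from the single sample line above.
WARN_RE = re.compile(
    r"#warn: counter (?P<counter>\w+) = (?P<value>\d+) "
    r"\(threshold (?P<threshold>\d+)\) lid (?P<lid>\d+) port (?P<port>\d+)"
)


def parse_warning(line):
    """Return the fields of one warning line as a dict, or None if no match."""
    match = WARN_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared between runs.
    for key in ("value", "threshold", "lid", "port"):
        fields[key] = int(fields[key])
    return fields


def growth_per_hour(first_value, second_value, seconds_between):
    """Average counter increase per hour between two samples."""
    return (second_value - first_value) * 3600.0 / seconds_between
```

Running the check twice with a known interval and feeding both values to
growth_per_hour() would give a rough rate to report here.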
* ibdiagpath -o . -t file.topo -s jff -n jff201
-I---------------------------------------------------
-I- QoS on Path Check
-I---------------------------------------------------
-W- VLArbTableLow Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 guid=0x0002c90200279295 dev=25204 port:1
-W- VLArbTableHigh Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 guid=0x0002c90200279295 dev=25204 port:1
-W- VLArbTableLow Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 guid=0x000b8cffff0052cf dev=47396 port:1
-W- VLArbTableHigh Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 guid=0x000b8cffff0052cf dev=47396 port:1
-W- SLs:6 7 14 15 mapped to VL > 5 at node:"switch-1/U1" lid=0x0004 guid=0x000b8cffff0052cf dev=47396 in-port:23 out-port:1
-I- The following SLs can be used:0 1 2 3 4 5 8 9 10 11 12 13
What do these messages mean?
Finally, and unrelated to the diagnostics messages, I had to change the
permissions of
crw-rw---- 1 root rdma 231, 192 2008-09-30 09:19 /dev/infiniband/uverbs0
to make it 'rw' for everybody.
Should I add users to the 'rdma' group instead?
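As far as I understand standard Unix permissions, membership in 'rdma'
should already be enough with the mode shown above. Here is a small
sketch of the owner/group/other check to illustrate what I mean. The
function name and the numeric uid/gid values are hypothetical, and the
real kernel check may involve more than this (ACLs, capabilities).

```python
import stat


def may_open_rw(mode, owner_uid, group_gid, uid, gids):
    """Classic Unix check: may a user with this uid/gids open the file rw?"""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IWUSR
    elif group_gid in gids:
        needed = stat.S_IRGRP | stat.S_IWGRP
    else:
        needed = stat.S_IROTH | stat.S_IWOTH
    return (mode & needed) == needed


# Mode of /dev/infiniband/uverbs0 as listed above: character device, 0660.
UVERBS_MODE = stat.S_IFCHR | 0o660

# Hypothetical ids, for illustration only.
ROOT_UID, RDMA_GID, USER_UID, USERS_GID = 0, 120, 1000, 100

print(stat.filemode(UVERBS_MODE))
print(may_open_rw(UVERBS_MODE, ROOT_UID, RDMA_GID, USER_UID, {USERS_GID}))
print(may_open_rw(UVERBS_MODE, ROOT_UID, RDMA_GID, USER_UID, {USERS_GID, RDMA_GID}))
```

If this reasoning is right, adding users to 'rdma' would avoid opening
the device to everybody.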
---
Thanks in advance
Regards