[ofa-general] Diagnostics output messages
Ramiro Alba Queipo
raq at cttc.upc.edu
Tue Sep 30 07:11:32 PDT 2008
On Tue, 2008-09-30 at 08:35 -0400, Hal Rosenstock wrote:
> On Tue, Sep 30, 2008 at 6:51 AM, Ramiro Alba Queipo <raq at cttc.upc.edu> wrote:
> > Hello everybody:
> >
> > We have just started running a 22-node InfiniBand cluster (44 nodes in a
> > couple of months) under Ubuntu 8.04. After carefully reading about and
> > testing the OFED 1.3.1 diagnostics packages (ibutils and infiniband-diags),
> > I got some messages I cannot understand:
> >
> > * ibdiagnet -o . -t file.topo -s jff -pm
> >
> >
> > -I---------------------------------------------------
> > -I- IPoIB Subnets Check
> > -I---------------------------------------------------
> > -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
> > -W- Suboptimal rate for group. Lowest member rate:20Gbps > group-rate:10Gbps
> >
> >
> > What does it mean?
>
> This means your subnet is pure DDR and the IPoIB broadcast group can
> run at a higher rate than the default. The group rate is set via OpenSM
> configuration, which differs slightly depending on which version
> you are using.
>
We are running OpenSM 3.1.11.
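
To check the arithmetic behind the warning: a 4X link running at 5.0 Gbps per lane (DDR) gives 20 Gbps, which is the "Lowest member rate" reported, while the group runs at 10 Gbps. Below is a minimal sketch of that comparison. The function names are my own for illustration and are not part of ibdiagnet or OpenSM.

```python
# Per-lane signalling rates in Gbps for the two link speeds seen in this thread.
LANE_GBPS = {"SDR": 2.5, "DDR": 5.0}


def link_rate_gbps(width, lane_gbps):
    """Aggregate signalling rate for a link width such as '4X'."""
    lanes = int(width.rstrip("Xx"))
    return lanes * lane_gbps


def is_group_rate_suboptimal(member_rates, group_rate):
    """True when even the slowest member is faster than the group rate."""
    return min(member_rates) > group_rate


if __name__ == "__main__":
    ddr_4x = link_rate_gbps("4X", LANE_GBPS["DDR"])
    # 22 nodes, all assumed to be 4X DDR as in this cluster.
    print(ddr_4x, is_group_rate_suboptimal([ddr_4x] * 22, 10.0))
```

If a single member ran at 4X SDR (10 Gbps), the check would return False and the default group rate would already be the right choice.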
> > * ibchecknet
> >
> > #warn: counter RcvSwRelayErrors = 259 (threshold 100) lid 4 port 255
> > Error check on lid 4 (MT47396 Infiniscale-III Mellanox Technologies) port all: FAILED
> >
> >
> > I can see that the command 'perfquery -a 255' shows its counters, but:
> >
> > - What is this counter for?
> > - ibqueryerrors.pl -a says:
> > RcvSwRelayErrors: This counter can increase due to a valid network event
> > Should I worry about this counter slowly increasing on switch ports?
> >
> > I am using IPoIB
>
> Unfortunately when running IPoIB, RcvSwRelayErrors needs to be ignored
> as multicasts are counted as looping.
>
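
Following that advice, one way to keep the ibchecknet report readable on an IPoIB fabric is to post-filter its warnings. This is only a sketch: the regular expression is modelled on the single `#warn:` line quoted above, so other ibchecknet line formats may not match, and the names `IGNORED_COUNTERS` and `relevant_warnings` are hypothetical.

```python
import re

# Counters to skip; RcvSwRelayErrors is here because of the IPoIB advice above.
IGNORED_COUNTERS = {"RcvSwRelayErrors"}

# Pattern modelled on the one '#warn:' line quoted in this thread (assumption).
WARN_RE = re.compile(
    r"#warn: counter (?P<name>\w+) = (?P<value>\d+)\s+"
    r"\(threshold (?P<threshold>\d+)\)\s+lid (?P<lid>\d+)\s+port (?P<port>\d+)"
)


def relevant_warnings(text, ignored=IGNORED_COUNTERS):
    """Return (counter, value, lid, port) for every warning not in `ignored`."""
    found = []
    for m in WARN_RE.finditer(text):
        if m.group("name") in ignored:
            continue
        found.append(
            (m.group("name"), int(m.group("value")),
             int(m.group("lid")), int(m.group("port")))
        )
    return found


if __name__ == "__main__":
    report = "#warn: counter RcvSwRelayErrors = 259 (threshold 100) lid 4 port 255\n"
    print(relevant_warnings(report))  # the only warning here is an ignored one
```

The output of ibchecknet could be captured to a file and passed through `relevant_warnings` so that only the remaining counters need a closer look.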
> > * ibdiagpath -o . -t file.topo -s jff -n jff201
> >
> > -I---------------------------------------------------
> > -I- QoS on Path Check
> > -I---------------------------------------------------
> > -W- VLArbTableLow Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 guid=0x0002c90200279295 dev=25204 port:1
> > -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001 guid=0x0002c90200279295 dev=25204 port:1
> > -W- VLArbTableLow Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 guid=0x000b8cffff0052cf dev=47396 port:1
> > -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004 guid=0x000b8cffff0052cf dev=47396 port:1
> > -W- SLs:6 7 14 15 mapped to VL > 5 at node:"switch-1/U1" lid=0x0004 guid=0x000b8cffff0052cf dev=47396 in-port:23 out-port:1
> > -I- The following SLs can be used:0 1 2 3 4 5 8 9 10 11 12 13
> >
> > What is the meaning of these messages?
>
> I'm not sure but it looks like it's complaining about an invalid VL.
> Can you run:
> smpquery portinfo <lid> 1
> smpquery sl2vl <lid> 1
> smpquery vlarb <lid> 1
> for both of these lids ?
>
# Port info: Lid 1 port 1
Mkey:............................0x0000000000000000
GidPrefix:.......................0xfe80000000000000
Lid:.............................0x0001
SMLid:...........................0x0001
CapMask:.........................0x2510a6a
IsSM
IsTrapSupported
IsAutomaticMigrationSupported
IsSLMappingSupported
IsLedInfoSupported
IsSystemImageGUIDsupported
IsCommunicatonManagementSupported
IsVendorClassSupported
IsCapabilityMaskNoticeSupported
IsClientRegistrationSupported
DiagCode:........................0x0000
MkeyLeasePeriod:.................0
LocalPort:.......................1
LinkWidthEnabled:................1X or 4X
LinkWidthSupported:..............1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
LinkState:.......................Active
PhysLinkState:...................LinkUp
LinkDownDefState:................Polling
ProtectBits:.....................0
LMC:.............................0
LinkSpeedActive:.................5.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps
NeighborMTU:.....................2048
SMSL:............................0
VLCap:...........................VL0-3
InitType:........................0x00
VLHighLimit:.....................0
VLArbHighCap:....................8
VLArbLowCap:.....................8
InitReply:.......................0x00
MtuCap:..........................2048
VLStallCount:....................7
HoqLife:.........................31
OperVLs:.........................VL0-3
PartEnforceInb:..................0
PartEnforceOutb:.................0
FilterRawInb:....................0
FilterRawOutb:...................0
MkeyViolations:..................0
PkeyViolations:..................0
QkeyViolations:..................0
GuidCap:.........................32
ClientReregister:................0
SubnetTimeout:...................18
RespTimeVal:.....................16
LocalPhysErr:....................8
OverrunErr:......................8
MaxCreditHint:...................0
RoundTrip:.......................0
# SL2VL table: Lid 1
# SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in 0, out 0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0|
# VLArbitration tables: Lid 1 port 1 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |
# High priority VL Arbitration Table:
VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
# Port info: Lid 4 port 1
Mkey:............................0x0000000000000000
GidPrefix:.......................0x0000000000000000
Lid:.............................0x0000
SMLid:...........................0x0000
CapMask:.........................0x0
DiagCode:........................0x0000
MkeyLeasePeriod:.................0
LocalPort:.......................23
LinkWidthEnabled:................1X or 4X
LinkWidthSupported:..............1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
LinkState:.......................Active
PhysLinkState:...................LinkUp
LinkDownDefState:................Polling
ProtectBits:.....................0
LMC:.............................0
LinkSpeedActive:.................5.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps
NeighborMTU:.....................2048
SMSL:............................0
VLCap:...........................VL0-7
InitType:........................0x00
VLHighLimit:.....................0
VLArbHighCap:....................8
VLArbLowCap:.....................8
InitReply:.......................0x00
MtuCap:..........................2048
VLStallCount:....................7
HoqLife:.........................16
OperVLs:.........................VL0-3
PartEnforceInb:..................1
PartEnforceOutb:.................1
FilterRawInb:....................0
FilterRawOutb:...................0
MkeyViolations:..................0
PkeyViolations:..................0
QkeyViolations:..................0
GuidCap:.........................0
ClientReregister:................0
SubnetTimeout:...................0
RespTimeVal:.....................0
LocalPhysErr:....................8
OverrunErr:......................8
MaxCreditHint:...................0
RoundTrip:.......................0
# SL2VL table: Lid 4
# SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in 0, out 1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
ports: in 1, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 2, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 3, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 4, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 5, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 6, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 7, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 8, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 9, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 10, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 11, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 12, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 13, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 14, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 15, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 16, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 17, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 18, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 19, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 20, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 21, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 22, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 23, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 24, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
# VLArbitration tables: Lid 4 port 1 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |
# High priority VL Arbitration Table:
VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
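
For what it is worth, the SL list in the ibdiagpath warning can be reproduced from the SL2VL row for in-port 23, out-port 1 of lid 4 above. The sketch below takes the limit of 5 straight from the warning text ("VL > 5"); I do not know how ibdiagpath arrives at that limit, so treat it as an assumption. The helper names are my own.

```python
def parse_sl2vl_row(line):
    """Parse one 'ports: in X, out Y: | 0| 1| ...|' row into a list of VLs."""
    cells = line.split("|")[1:]  # drop the 'ports: in X, out Y:' label
    return [int(c) for c in cells if c.strip()]


def split_sls(sl2vl, max_vl):
    """Return (SLs mapped above max_vl, SLs mapped at or below max_vl)."""
    above = [sl for sl, vl in enumerate(sl2vl) if vl > max_vl]
    usable = [sl for sl, vl in enumerate(sl2vl) if vl <= max_vl]
    return above, usable


if __name__ == "__main__":
    row = "ports: in 23, out 1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|"
    above, usable = split_sls(parse_sl2vl_row(row), max_vl=5)
    print("SLs mapped to VL > 5:", above)
    print("SLs that can be used:", usable)
```

With this row the first list is 6 7 14 15 and the second is 0 1 2 3 4 5 8 9 10 11 12 13, the same as in the ibdiagpath output.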
> -- Hal
>
> > Finally, and not related to the diagnostics messages, I have had to change
> > the permissions on
> >
> > crw-rw---- 1 root rdma 231, 192 2008-09-30 09:19 /dev/infiniband/uverbs0
> >
> > to 'rw' for everybody.
> >
> > Should I add users to 'rdma' group instead?
> >
> >
> > ---
> > Thanks in advance
> >
> > Regards
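
On the permissions question, the mode shown (crw-rw----, owner root, group rdma) already gives read/write access to members of the 'rdma' group. Here is a small sketch of the standard owner/group/other check applied to that mode string. It ignores root's override, ACLs and special bits such as setuid, and the function names are hypothetical.

```python
import stat


def parse_ls_mode(mode_str):
    """Convert the nine permission characters of an `ls -l` mode into bits."""
    bits = 0
    for i, ch in enumerate(mode_str[1:10]):  # skip the file-type character
        if ch != "-":
            bits |= 1 << (8 - i)
    return bits


def can_read_write(bits, is_owner, in_group):
    """Apply the owner, then group, then other class, as the kernel does."""
    if is_owner:
        need = stat.S_IRUSR | stat.S_IWUSR
    elif in_group:
        need = stat.S_IRGRP | stat.S_IWGRP
    else:
        need = stat.S_IROTH | stat.S_IWOTH
    return (bits & need) == need


if __name__ == "__main__":
    bits = parse_ls_mode("crw-rw----")
    print("member of rdma:", can_read_write(bits, is_owner=False, in_group=True))
    print("everybody else:", can_read_write(bits, is_owner=False, in_group=False))
```

So a non-root user in the 'rdma' group passes the check with the original permissions, and one outside the group does not.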