[ofa-general] Diagnostics output messages

Ramiro Alba Queipo raq at cttc.upc.edu
Tue Sep 30 07:11:32 PDT 2008


On Tue, 2008-09-30 at 08:35 -0400, Hal Rosenstock wrote:
> On Tue, Sep 30, 2008 at 6:51 AM, Ramiro Alba Queipo <raq at cttc.upc.edu> wrote:
> > Hello everybody:
> >
> > We have just started running a 22-node InfiniBand cluster (44 nodes in
> > a couple of months) under Ubuntu 8.04. After carefully reading about
> > and testing the OFED 1.3.1 diagnostics packages (ibutils and
> > infiniband-diags), I got some messages I cannot understand:
> >
> > * ibdiagnet -o . -t file.topo -s jff -pm
> >
> >
> > -I---------------------------------------------------
> > -I- IPoIB Subnets Check
> > -I---------------------------------------------------
> > -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
> > -W- Suboptimal rate for group. Lowest member rate:20Gbps > group-rate:10Gbps
> >
> >
> > What does it mean?
> 
> This means your subnet is pure DDR and the IPoIB broadcast group can
> run at a higher rate than the default. This is done via OpenSM
> configuration which is slightly different depending on which version
> you are using.
> 

OpenSM 3.1.11
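
For reference, the 20Gbps in the warning is consistent with a 4X link running at 5.0 Gbps per lane, which is what smpquery portinfo reports further down in this message. A minimal Python sketch of that arithmetic, assuming the rate is simply lanes times per-lane speed; the function names are hypothetical and not part of any OFED tool:

```python
def link_rate_gbps(width, speed):
    """Signalling rate of a link, e.g. width '4X' and speed '5.0 Gbps'."""
    lanes = int(width.rstrip("Xx"))      # '4X' -> 4
    per_lane = float(speed.split()[0])   # '5.0 Gbps' -> 5.0
    return lanes * per_lane


def is_group_rate_suboptimal(lowest_member_gbps, group_gbps):
    """True when even the slowest member is faster than the group rate."""
    return lowest_member_gbps > group_gbps


# LinkWidthActive and LinkSpeedActive as quoted in this thread;
# 10.0 is the group rate from the ibdiagnet output.
member = link_rate_gbps("4X", "5.0 Gbps")
print(member, is_group_rate_suboptimal(member, 10.0))  # 20.0 True
```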


> > * ibchecknet
> >
> > #warn: counter RcvSwRelayErrors = 259   (threshold 100) lid 4 port 255
> > Error check on lid 4 (MT47396 Infiniscale-III Mellanox Technologies) port all:  FAILED
> >
> >
> > I could see that command 'perfquery -a 255' shows its counters, but:
> >
> >    - What is it for?
> >    - ibqueryerrors.pl -a says
> >      RcvSwRelayErrors: This counter can increase due to a valid network event
> >      Should I worry about this counter slowly increasing on switch ports?
> >
> > I am using IPoIB
> 
> Unfortunately when running IPoIB, RcvSwRelayErrors needs to be ignored
> as multicasts are counted as looping.
> 
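
If the ibchecknet output is post-processed by a script, the advice above could be applied by dropping RcvSwRelayErrors warnings before reporting. A rough Python sketch, assuming the line format of the single warning quoted above (other versions may print it differently); the helper name and the second sample line are made up for illustration:

```python
import re

# Pattern inferred from the one warning line quoted in this thread.
WARN_RE = re.compile(
    r"#warn: counter (?P<name>\w+) = (?P<value>\d+)\s+"
    r"\(threshold (?P<threshold>\d+)\) lid (?P<lid>\d+) port (?P<port>\d+)"
)

# Counters to disregard on an IPoIB fabric, per the reply above.
IGNORED = {"RcvSwRelayErrors"}


def relevant_warnings(lines):
    """Return (counter, value, lid, port) tuples, skipping ignored counters."""
    found = []
    for line in lines:
        m = WARN_RE.search(line)
        if m and m.group("name") not in IGNORED:
            found.append((m.group("name"), int(m.group("value")),
                          int(m.group("lid")), int(m.group("port"))))
    return found


sample = [
    "#warn: counter RcvSwRelayErrors = 259   (threshold 100) lid 4 port 255",
    # Hypothetical line, not taken from the thread:
    "#warn: counter SymbolErrors = 12   (threshold 10) lid 4 port 3",
]
print(relevant_warnings(sample))  # [('SymbolErrors', 12, 4, 3)]
```
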
> > * ibdiagpath -o . -t file.topo -s jff -n jff201
> >
> > -I---------------------------------------------------
> > -I- QoS on Path Check
> > -I---------------------------------------------------
> > -W- VLArbTableLow Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001
> >    guid=0x0002c90200279295 dev=25204 port:1
> > -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"jff/U1" lid=0x0001
> >    guid=0x0002c90200279295 dev=25204 port:1
> > -W- VLArbTableLow Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004
> >    guid=0x000b8cffff0052cf dev=47396 port:1
> > -W- VLArbTableHigh Entries:6 7 VL > 5 at node:"switch-1/U1" lid=0x0004
> >    guid=0x000b8cffff0052cf dev=47396 port:1
> > -W- SLs:6 7 14 15 mapped to VL > 5 at node:"switch-1/U1" lid=0x0004
> >    guid=0x000b8cffff0052cf dev=47396 in-port:23 out-port:1
> > -I- The following SLs can be used:0 1 2 3 4 5 8 9 10 11 12 13
> >
> > What is the meaning of these messages?
> 
> I'm not sure but it looks like it's complaining about an invalid VL.
> Can you run:
> smpquery portinfo <lid> 1
> smpquery sl2vl <lid> 1
> smpquery vlarb <lid> 1
> for both of these lids ?
> 

# Port info: Lid 1 port 1
Mkey:............................0x0000000000000000
GidPrefix:.......................0xfe80000000000000
Lid:.............................0x0001
SMLid:...........................0x0001
CapMask:.........................0x2510a6a
                                IsSM
                                IsTrapSupported
                                IsAutomaticMigrationSupported
                                IsSLMappingSupported
                                IsLedInfoSupported
                                IsSystemImageGUIDsupported
                                IsCommunicatonManagementSupported
                                IsVendorClassSupported
                                IsCapabilityMaskNoticeSupported
                                IsClientRegistrationSupported
DiagCode:........................0x0000
MkeyLeasePeriod:.................0
LocalPort:.......................1
LinkWidthEnabled:................1X or 4X
LinkWidthSupported:..............1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
LinkState:.......................Active
PhysLinkState:...................LinkUp
LinkDownDefState:................Polling
ProtectBits:.....................0
LMC:.............................0
LinkSpeedActive:.................5.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps
NeighborMTU:.....................2048
SMSL:............................0
VLCap:...........................VL0-3
InitType:........................0x00
VLHighLimit:.....................0
VLArbHighCap:....................8
VLArbLowCap:.....................8
InitReply:.......................0x00
MtuCap:..........................2048
VLStallCount:....................7
HoqLife:.........................31
OperVLs:.........................VL0-3
PartEnforceInb:..................0
PartEnforceOutb:.................0
FilterRawInb:....................0
FilterRawOutb:...................0
MkeyViolations:..................0
PkeyViolations:..................0
QkeyViolations:..................0
GuidCap:.........................32
ClientReregister:................0
SubnetTimeout:...................18
RespTimeVal:.....................16
LocalPhysErr:....................8
OverrunErr:......................8
MaxCreditHint:...................0
RoundTrip:.......................0

# SL2VL table: Lid 1
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0|

# VLArbitration tables: Lid 1 port 1 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |
# High priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |


# Port info: Lid 4 port 1
Mkey:............................0x0000000000000000
GidPrefix:.......................0x0000000000000000
Lid:.............................0x0000
SMLid:...........................0x0000
CapMask:.........................0x0
DiagCode:........................0x0000
MkeyLeasePeriod:.................0
LocalPort:.......................23
LinkWidthEnabled:................1X or 4X
LinkWidthSupported:..............1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
LinkState:.......................Active
PhysLinkState:...................LinkUp
LinkDownDefState:................Polling
ProtectBits:.....................0
LMC:.............................0
LinkSpeedActive:.................5.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps
NeighborMTU:.....................2048
SMSL:............................0
VLCap:...........................VL0-7
InitType:........................0x00
VLHighLimit:.....................0
VLArbHighCap:....................8
VLArbLowCap:.....................8
InitReply:.......................0x00
MtuCap:..........................2048
VLStallCount:....................7
HoqLife:.........................16
OperVLs:.........................VL0-3
PartEnforceInb:..................1
PartEnforceOutb:.................1
FilterRawInb:....................0
FilterRawOutb:...................0
MkeyViolations:..................0
PkeyViolations:..................0
QkeyViolations:..................0
GuidCap:.........................0
ClientReregister:................0
SubnetTimeout:...................0
RespTimeVal:.....................0
LocalPhysErr:....................8
OverrunErr:......................8
MaxCreditHint:...................0
RoundTrip:.......................0

# SL2VL table: Lid 4
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
ports: in  1, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  2, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  3, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  4, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  5, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  6, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  7, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  8, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in  9, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 10, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 11, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 12, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 13, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 14, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 15, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 16, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 17, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 18, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 19, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 20, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 21, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 22, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 23, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
ports: in 24, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|

# VLArbitration tables: Lid 4 port 1 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |
# High priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
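
The SL list in the ibdiagpath warning can be reproduced from the lid 4 SL2VL row for in-port 23, out-port 1 shown above. A minimal Python sketch, assuming the tool simply flags every SL whose VL exceeds 5 (the threshold is taken from the warning text; why 5 is used is not stated in this thread, and OperVLs is VL0-3 on both ports). The function names are hypothetical:

```python
def parse_sl2vl_row(row):
    """Extract the VL values from one 'ports: in X, out Y: | ..|' row."""
    cells = row.split("|")[1:-1]   # drop the label and the trailing empty cell
    return [int(c) for c in cells]


def sls_above(vls, max_vl):
    """SLs (list indexes) that map to a VL greater than max_vl."""
    return [sl for sl, vl in enumerate(vls) if vl > max_vl]


row = ("ports: in 23, out  1: "
       "| 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|")
vls = parse_sl2vl_row(row)
flagged = sls_above(vls, 5)
usable = [sl for sl in range(len(vls)) if sl not in flagged]
print(flagged)  # [6, 7, 14, 15]
print(usable)   # [0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13]
```

Both lists match the "-W- SLs:6 7 14 15" and "-I- The following SLs can be used" lines in the ibdiagpath output.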


> -- Hal
> 
> > Finally, and unrelated to the diagnostics messages, I have to change
> > the permissions on
> >
> > crw-rw---- 1 root rdma 231, 192 2008-09-30 09:19 /dev/infiniband/uverbs0
> >
> > to 'rw' for everybody.
> >
> > Should I add users to the 'rdma' group instead?
> >
> >
> > ---
> > Thanks in advance
> >
> > Regards


