[openib-general] Announce: Advanced Diagnostic Tools

Hal Rosenstock halr at voltaire.com
Thu Jan 12 07:01:10 PST 2006


On Wed, 2006-01-11 at 17:20, Eitan Zahavi wrote:
> Hi,
> 
> With great help from Danny Zarko and Ariel Libman I was able to upload into
> https://openib.org/svn/gen2/utils/src/linux-user the first of several integrated IB
> diagnostic tools: ibdiagnet (diagnose network).
> The tool depends on ibis and ibdm (both available in the same directory).
> Its main differences from the diag tools available under the trunk are:

Just to set the record straight, some but not all of the below are
supported in the current OpenIB diag tools.

-- Hal

> 1. Performs a complete diagnostic procedure, including:
>     * discovery,
>     * PM counters check,
>     * duplicate LID/GUID check,
>     * ALL to ALL connectivity check (based on LFT data extracted from the fabric)
>     * Multicast connectivity and report
>     * Credit loop analysis
>     * and various other fabric statistics
> 2. If a topology file is provided, all reports are given using system names (rather than
>     LIDs, GUIDs or directed paths).
> 
> ###############################################################################################
> Here are some stdout examples
> ------------------------------
> 1. BAD LIDS
> -E- Device(s) with LID = 0x0000 found in the fabric:
>      path="1 1 3 5" H-12/U1 PN=2
>      path="1 1 3 4" H-11/U1 PN=1
>      path="1 4" H-3/U1 PN=1
> 
> 2. DUPLICATED PORT GUIDS
> -E- Devices with identical PortGUID = 0x0002c90000000006 found in the fabric:
>      path="1 1" GNU1/main/U2
>      path="1 1 5 6" H-9/U1 PN=1
>      path="1 1 5 5" H-10/U1 PN=2
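
The duplicate-GUID report above can be illustrated with a minimal sketch. This is not ibdiagnet's code: the tuple layout, the function name and the fourth (contrasting) GUID are assumptions made for the example; the three duplicated entries are taken from the sample output.

```python
from collections import defaultdict


def find_duplicate_guids(ports):
    """Group discovered ports by PortGUID and keep GUIDs seen more than once.

    `ports` is a hypothetical list of (directed_path, name, port_guid) tuples.
    """
    by_guid = defaultdict(list)
    for path, name, guid in ports:
        by_guid[guid].append((path, name))
    return {guid: found for guid, found in by_guid.items() if len(found) > 1}


ports = [
    ("1 1", "GNU1/main/U2", 0x0002C90000000006),
    ("1 1 5 6", "H-9/U1 PN=1", 0x0002C90000000006),
    ("1 1 5 5", "H-10/U1 PN=2", 0x0002C90000000006),
    ("1 4", "H-3/U1 PN=1", 0x0002C90000000007),  # made-up GUID for contrast
]

for guid, found in find_duplicate_guids(ports).items():
    print(f"-E- Devices with identical PortGUID = 0x{guid:016x} found in the fabric:")
    for path, name in found:
        print(f'     path="{path}" {name}')
```

The same grouping would apply to LIDs or node GUIDs by changing the key.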
> 
> 3. BAD LINKS
> -I- Errors have occurred on the following links (for errors details, look in log
>      file /tmp/ibmgtsim.31602/ibdiagnet.log):
>      Cable:         GNU1/M/P7(GNU1/main/U4/P4) =---= H-7/P2(H-7/U1/P2)
>      Cable:         GNU1/M/P5(GNU1/main/U4/P6) =---= H-5/P2(H-5/U1/P2)
> 
> 4. TOPOLOGY MATCH
> -I- Note that "bad" links and the part of the fabric to which they led (in the
>      BFS discovery of the fabric, starting at the local node) are not discovered
>      and therefore will be reported as "missing".
> 
>    Missing System:H-7(Cougar)
>       Should be connected by cable from port: P2(H-7/U1/P2)
>       to:GNU1/M/P7(GNU1/main/U4/P4)
> 
>    Missing System:H-5(Cougar)
>       Should be connected by cable from port: P2(H-5/U1/P2)
>       to:GNU1/M/P5(GNU1/main/U4/P6)
> 
> 5. MULTICAST ROUTING
> -I- Scanning all multicast groups for loops and connectivity...
> -I- Multicast Group:0xC000 has:2 switches and:2 HCAs
> -E- Extra switch:GNU1/leaf1/U1 in group:0xC000
> -E- Extra switch:GNU1/main/U4 in group:0xC000
> -I- Multicast Group:0xC001 has:4 switches and:4 HCAs
> -E- Extra switch:GNU1/leaf1/U1 in group:0xC001
> -I- Multicast Group:0xC002 has:5 switches and:5 HCAs
> -E- 3 multicast group checks failed
> 
> -I---------------------------------------------------
> -I- mgid-mlid-HCAs matching table
> -I---------------------------------------------------
> mgid                                  | mlid   | HCAs
> --------------------------------------------------------------------------------
> 0xff12401bffff0000:0x00000000ffffffff | 0xc000 | H-11/U1,H-12/U1
> 0xff12401bffff0000:0x0000000000000001 | 0xc001 | H-15/U1,H-3/U1,H-2/U1,H-7/U1
> 0xff12401bffff0000:0x0000000000000002 | 0xc002 | H-10/U1,H-16/U1,H-4/U1,H-6/U1
> 
> 6. UNICAST ROUTING:
> -I- Verifying all CA to CA paths ...
> -E- Unassigned LFT for lid:10 Dead end at:GNU1/main/U1
> -E- Fail to find a path from:H-1/U1/1 to:H-12/U1/2
> -E- Unassigned LFT for lid:18 Dead end at:GNU1/main/U3
> -E- Fail to find a path from:H-1/U1/1 to:H-5/U1/2
> [snip]
> -E- Found 19 missing paths out of:240 paths
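
As a rough illustration of an LFT-based path check like the one reported above, here is a minimal sketch. The data layout (`lft`, `links`, `lid_of`), the node names and the function name are all hypothetical; it only shows how an unassigned LFT entry turns into a dead end, and it is not taken from ibdiagnet.

```python
def trace_path(start_switch, dst_lid, lft, links, lid_of, max_hops=64):
    """Follow LFT entries hop by hop toward dst_lid.

    lft[switch][lid]   -> output port number (entry missing = unassigned)
    links[(node,port)] -> name of the node on the other end of the cable
    lid_of[node]       -> LID of an end node (CA)
    Returns (reached, visited_nodes).
    """
    node = start_switch
    visited = [node]
    for _ in range(max_hops):
        if lid_of.get(node) == dst_lid:
            return True, visited
        out_port = lft.get(node, {}).get(dst_lid)
        if out_port is None:
            return False, visited  # unassigned LFT: dead end at `node`
        peer = links.get((node, out_port))
        if peer is None:
            return False, visited  # port has no cable attached
        node = peer
        visited.append(node)
    return False, visited  # hop limit hit, probably a routing loop


# Tiny made-up fabric: two switches, two CAs; SW2 has no entry for lid 10.
lft = {"SW1": {10: 2, 11: 1}, "SW2": {11: 3}}
links = {("SW1", 1): "SW2", ("SW1", 2): "H-A", ("SW2", 3): "H-B"}
lid_of = {"H-A": 10, "H-B": 11}

ok, hops = trace_path("SW1", 11, lft, links, lid_of)
print(ok, hops)
ok, hops = trace_path("SW2", 10, lft, links, lid_of)
if not ok:
    print(f"-E- Unassigned LFT for lid:10 Dead end at:{hops[-1]}")
```

Running this trace for every ordered pair of CAs gives an all-to-all check in the spirit of the report.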
> 
> 7. CREDIT LOOPS
> -I- Tracing all CA to CA paths for Credit Loops potential ...
> -E- Potential Credit Loop on Path from:H-1/U1/1 to:H-13/U1/1
>    Going:Down from:GNU1/main/U1 to:GNU1/main/U3
>    Going:Up from:GNU1/main/U3 to:GNU1/main/U1
>    Going:Down from:GNU1/main/U1 to:GNU1/leaf1/U1
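
The Down/Up/Down trace above suggests a check for paths that turn back Up after having gone Down. A minimal sketch of such a check follows, assuming every switch has a rank (0 at the top of the tree) and that linked switches never share a rank. The ranks below are invented to match the sample output; whether ibdiagnet works this way is not stated in the announcement.

```python
def find_down_up_turn(path, rank):
    """Return the index of the first hop that goes Up after a Down hop.

    `path` is a list of switch names in traversal order and `rank` maps a
    switch name to its (hypothetical) level in the tree. Returns None when
    the path never goes Up after going Down.
    """
    gone_down = False
    for i in range(len(path) - 1):
        src, dst = path[i], path[i + 1]
        if rank[dst] > rank[src]:  # moving away from the top: Down
            gone_down = True
        elif gone_down:            # moving Up after a Down hop
            return i
    return None


rank = {"GNU1/main/U1": 0, "GNU1/main/U3": 1, "GNU1/leaf1/U1": 1}  # assumed ranks
path = ["GNU1/main/U1", "GNU1/main/U3", "GNU1/main/U1", "GNU1/leaf1/U1"]

turn = find_down_up_turn(path, rank)
if turn is not None:
    print(f"-E- Potential Credit Loop: Up turn from:{path[turn]} to:{path[turn + 1]}")
```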
> 
> 
> NOTE: All of the above cases were simulated on top of ibmgtsim.
>        The errors were injected by simulation flows.
> ######################################################################################
> A full man page:
> ====================
> NAME
>    ibdiagnet
> 
> SYNOPSIS
>    ibdiagnet [-c <count>] [-v] [-r] [-t <topo-file>] [-s <sys-name>]
>       [-i <dev-index>] [-p <port-num>] [-o <out-dir>]
> 
> DESCRIPTION
>    ibdiagnet scans the fabric using directed route packets and extracts all the
>    available information regarding its connectivity and devices.
>    It then produces the following files in the output directory defined by the
>    -o option (see below):
>      ibdiagnet.lst    - List of all the nodes, ports and links in the fabric
>      ibdiagnet.fdbs   - A dump of the unicast forwarding tables of the fabric
>                          switches
>      ibdiagnet.mcfdbs - A dump of the multicast forwarding tables of the fabric
>                          switches
>    In addition to generating the files above, the discovery phase also checks for
>    duplicate node GUIDs in the IB fabric. If such an error is detected, it is
>    displayed on the standard output.
>    After the discovery phase is completed, directed route packets are sent
>    multiple times (according to the -c option) to detect possible problematic
>    paths on which packets may be lost. Such paths are explored, and a report of
>    the suspected bad links is displayed on the standard output.
>    After scanning the fabric, if the -r option is provided, a full report of the
>    fabric qualities is displayed.
>    This report includes:
>      Number of nodes and systems
>      Hop-count information:
>           maximal hop-count, an example path, and a hop-count histogram
>      All CA-to-CA paths traced
>    Note: If the IB fabric includes only one CA, CA-to-CA paths are not
>    reported.
>    Furthermore, if a topology file is provided, ibdiagnet uses the names defined
>    in it for the output reports.
> 
> OPTIONS
>    -c <count>    : The minimal number of packets to be sent across each link
>                    (default = 10)
>    -v            : Instructs the tool to run in verbose mode
>    -r            : Provides a report of the fabric qualities
>    -t <topo-file>: Specifies the topology file name
>    -s <sys-name> : Specifies the local system name. Meaningful only if a topology
>                    file is specified
>    -i <dev-index>: Specifies the index of the device of the port used to connect
>                    to the IB fabric (in case of multiple devices on the local
>                    system)
>    -p <port-num> : Specifies the local device's port number used to connect to
>                    the IB fabric
>    -o <out-dir>  : Specifies the directory where the output files will be placed
>                    (default = /tmp/ez)
> 
>    -h|--help     : Prints this help information
>    -V|--version  : Prints the version of the tool
>       --vars     : Prints the tool's environment variables and their values
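
For scripted runs, the options listed above could be assembled as in the following minimal sketch. Only the flags come from the man page; the helper name and its keyword arguments are made up for the example, and rejecting -s without -t is a design choice of the sketch, not documented tool behaviour.

```python
def build_ibdiagnet_cmd(count=None, verbose=False, report=False, topo_file=None,
                        sys_name=None, dev_index=None, port_num=None, out_dir=None):
    """Build an argv list for ibdiagnet from the options in the man page."""
    if sys_name is not None and topo_file is None:
        # The man page says -s is meaningful only with a topology file.
        raise ValueError("-s <sys-name> requires -t <topo-file>")
    cmd = ["ibdiagnet"]
    if count is not None:
        cmd += ["-c", str(count)]
    if verbose:
        cmd.append("-v")
    if report:
        cmd.append("-r")
    if topo_file is not None:
        cmd += ["-t", topo_file]
    if sys_name is not None:
        cmd += ["-s", sys_name]
    if dev_index is not None:
        cmd += ["-i", str(dev_index)]
    if port_num is not None:
        cmd += ["-p", str(port_num)]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    return cmd


# "fabric.topo" and "H-1" are placeholder values.
print(" ".join(build_ibdiagnet_cmd(count=20, report=True,
                                   topo_file="fabric.topo", sys_name="H-1")))
```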
> 
> ERROR CODES
>    1 - Failed to fully discover the fabric
>    2 - Failed to parse command line options
>    3 - Some packet drop observed
>    4 - Mismatch with provided topology
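
A wrapper script could translate the error codes listed above into messages, roughly as sketched below. This assumes the codes are returned as the process exit status and that 0 means no error; neither is stated in the man page, and the function names are invented for the example.

```python
import subprocess

# Meanings copied from the ERROR CODES section of the man page.
ERROR_CODES = {
    1: "Failed to fully discover the fabric",
    2: "Failed to parse command line options",
    3: "Some packet drop observed",
    4: "Mismatch with provided topology",
}


def describe_exit(code):
    """Map an exit status to a message (0 = success is an assumption)."""
    if code == 0:
        return "No error reported"
    return ERROR_CODES.get(code, f"Unknown exit code {code}")


def run_ibdiagnet(argv):
    """Run ibdiagnet with the given argv and return (status, message)."""
    result = subprocess.run(argv)
    return result.returncode, describe_exit(result.returncode)


# Example use on a host with ibdiagnet installed:
#   status, message = run_ibdiagnet(["ibdiagnet", "-r"])
print(describe_exit(3))
```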