[openib-general] Announce: Advanced Diagnostic Tools
Hal Rosenstock
halr at voltaire.com
Thu Jan 12 07:01:10 PST 2006
On Wed, 2006-01-11 at 17:20, Eitan Zahavi wrote:
> Hi,
>
> With great help from Danny Zarko and Ariel Libman, I was able to upload into
> https://openib.org/svn/gen2/utils/src/linux-user the first of several integrated IB
> diagnostic tools: ibdiagnet (diagnose network).
> The tool depends on ibis and ibdm (available in the same directory).
> Its main differences from the diag tools available under the trunk are:
Just to set the record straight, some but not all of the below are
supported in the current OpenIB diag tools.
-- Hal
> 1. Performs a complete diagnostic procedure, including:
> * discovery,
> * PM counters check,
> * duplicate LID/GUID
> * ALL to ALL connectivity check (based on LFT data extracted from the fabric)
> * Multicast connectivity and report
> * Credit loop analysis
> * and various other fabric statistics
> 2. If a topology file is provided, all reports use system names (rather than
> LIDs, GUIDs, or directed paths).
>
> ###############################################################################################
> Here are some stdout examples
> ------------------------------
> 1. BAD LIDS
> -E- Device(s) with LID = 0x0000 found in the fabric:
> path="1 1 3 5" H-12/U1 PN=2
> path="1 1 3 4" H-11/U1 PN=1
> path="1 4" H-3/U1 PN=1
>
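The bad-LID check amounts to filtering the discovered ports for LID 0x0000, which InfiniBand reserves, so a port reporting it has not been given a LID by the subnet manager. The Python sketch below assumes a hypothetical `PortInfo` record per discovered port; it is not ibdiagnet's internal data model and only reproduces the shape of the report lines.

```python
from typing import List, NamedTuple


class PortInfo(NamedTuple):
    """Hypothetical per-port discovery record (not an ibdiagnet structure)."""
    path: str      # directed route from the local port, e.g. "1 1 3 5"
    name: str      # device name, e.g. "H-12/U1"
    port_num: int  # port number on that device
    lid: int       # base LID read from the port


def report_unassigned_lids(ports: List[PortInfo]) -> List[str]:
    """Return report lines for every port whose LID is 0x0000."""
    bad = [p for p in ports if p.lid == 0x0000]
    if not bad:
        return []
    lines = ["-E- Device(s) with LID = 0x0000 found in the fabric:"]
    lines += [f'path="{p.path}" {p.name} PN={p.port_num}' for p in bad]
    return lines


ports = [PortInfo("1 1 3 5", "H-12/U1", 2, 0x0000),
         PortInfo("1 2", "H-1/U1", 1, 0x0005)]
print("\n".join(report_unassigned_lids(ports)))
```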
> 2. DUPLICATED PORT GUIDS
> -E- Devices with identical PortGUID = 0x0002c90000000006 found in the fabric:
> path="1 1" GNU1/main/U2
> path="1 1 5 6" H-9/U1 PN=1
> path="1 1 5 5" H-10/U1 PN=2
>
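Duplicate PortGUID detection can be sketched as grouping discovered ports by GUID and keeping the groups with more than one directed path. This assumes discovery already yields one record per physical port; a real tool also has to distinguish a duplicated GUID from the same device reached twice over different routes, which this sketch ignores. The input tuples are hypothetical.

```python
from collections import defaultdict


def find_duplicate_port_guids(ports):
    """Group discovered ports by PortGUID; keep GUIDs seen at more than one path.

    ports: iterable of (port_guid, directed_path, name) tuples. Assumes each
    physical port appears once, so repeated GUIDs indicate a real duplicate.
    """
    by_guid = defaultdict(list)
    for guid, path, name in ports:
        by_guid[guid].append((path, name))
    return {guid: seen for guid, seen in by_guid.items() if len(seen) > 1}


discovered = [
    (0x0002C90000000006, "1 1", "GNU1/main/U2"),
    (0x0002C90000000006, "1 1 5 6", "H-9/U1"),
    (0x0002C90000000010, "1 2", "H-1/U1"),
]
for guid, seen in find_duplicate_port_guids(discovered).items():
    # Format the GUID as 16 hex digits, matching the report style above.
    print(f"-E- Devices with identical PortGUID = 0x{guid:016x} found in the fabric:")
    for path, name in seen:
        print(f'path="{path}" {name}')
```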
> 3. BAD LINKS
> -I- Errors have occurred on the following links (for errors details, look in log
> file /tmp/ibmgtsim.31602/ibdiagnet.log):
> Cable: GNU1/M/P7(GNU1/main/U4/P4) =---= H-7/P2(H-7/U1/P2)
> Cable: GNU1/M/P5(GNU1/main/U4/P6) =---= H-5/P2(H-5/U1/P2)
>
> 4. TOPOLOGY MATCH
> -I- Note that "bad" links and the part of the fabric to which they led (in the
> BFS discovery of the fabric, starting at the local node) are not discovered
> and therefore will be reported as "missing".
>
> Missing System:H-7(Cougar)
> Should be connected by cable from port: P2(H-7/U1/P2)
> to:GNU1/M/P7(GNU1/main/U4/P4)
>
> Missing System:H-5(Cougar)
> Should be connected by cable from port: P2(H-5/U1/P2)
> to:GNU1/M/P5(GNU1/main/U4/P6)
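The note in the topology-match example explains why a bad link makes everything behind it "missing": discovery is a breadth-first search from the local node that never crosses a link it cannot use. A minimal sketch, assuming a hypothetical in-memory fabric model (`{node: {port: (peer, peer_port)}}`) in place of real directed-route packets, and a plain list of expected system names standing in for the topology file:

```python
from collections import deque


def bfs_discover(fabric, local, bad_links=frozenset()):
    """Breadth-first discovery from `local` over a hypothetical fabric model.

    fabric:    {node: {port: (peer_node, peer_port)}}
    bad_links: set of frozenset({(node, port), (peer, peer_port)}) links that
               lose packets; the search never crosses them.
    Returns {node: [exit ports taken from `local`]}, i.e. a directed route.
    """
    paths = {local: []}
    queue = deque([local])
    while queue:
        node = queue.popleft()
        for port, (peer, peer_port) in sorted(fabric.get(node, {}).items()):
            link = frozenset({(node, port), (peer, peer_port)})
            if link in bad_links or peer in paths:
                continue  # unusable link, or node already discovered
            paths[peer] = paths[node] + [port]
            queue.append(peer)
    return paths


def missing_systems(expected, discovered):
    """Systems named in the (hypothetical) topology list but never reached."""
    return sorted(set(expected) - set(discovered))


fabric = {
    "H-1": {1: ("SW", 1)},
    "SW": {1: ("H-1", 1), 2: ("H-5", 2), 3: ("H-7", 2)},
    "H-5": {2: ("SW", 2)},
    "H-7": {2: ("SW", 3)},
}
bad = {frozenset({("SW", 2), ("H-5", 2)})}
found = bfs_discover(fabric, "H-1", bad)
print(found)
print(missing_systems(["H-1", "SW", "H-5", "H-7"], found))
```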
>
> 5. MULTICAST ROUTING
> -I- Scanning all multicast groups for loops and connectivity...
> -I- Multicast Group:0xC000 has:2 switches and:2 HCAs
> -E- Extra switch:GNU1/leaf1/U1 in group:0xC000
> -E- Extra switch:GNU1/main/U4 in group:0xC000
> -I- Multicast Group:0xC001 has:4 switches and:4 HCAs
> -E- Extra switch:GNU1/leaf1/U1 in group:0xC001
> -I- Multicast Group:0xC002 has:5 switches and:5 HCAs
> -E- 3 multicast group checks failed
>
> -I---------------------------------------------------
> -I- mgid-mlid-HCAs matching table
> -I---------------------------------------------------
> mgid | mlid | HCAs
> --------------------------------------------------------------------------------
> 0xff12401bffff0000:0x00000000ffffffff | 0xc000 | H-11/U1,H-12/U1
> 0xff12401bffff0000:0x0000000000000001 | 0xc001 | H-15/U1,H-3/U1,H-2/U1,H-7/U1
> 0xff12401bffff0000:0x0000000000000002 | 0xc002 | H-10/U1,H-16/U1,H-4/U1,H-6/U1
>
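The announcement does not say how ibdiagnet decides that a switch is "extra" in a multicast group. One plausible criterion, used here purely as an assumption, is to prune the group's switch tree: repeatedly drop switches that are leaves of the tree and have no member HCA attached. Whatever remains is the subtree that actually connects the members; the dropped switches are candidates for "extra". The inputs are hypothetical and would in practice be derived from the switches' multicast forwarding tables.

```python
def prune_extra_switches(tree_edges, switches, member_switches):
    """Return switches carrying the MLID that are not needed to reach members.

    tree_edges:      set of frozenset({sw_a, sw_b}) switch-to-switch links on
                     which both ends forward the group's MLID
    switches:        switches that have an MFT entry for the MLID
    member_switches: switches with at least one member HCA attached
    """
    alive = set(switches)
    edges = {e for e in tree_edges if e <= alive}
    changed = True
    while changed:
        changed = False
        for sw in sorted(alive):  # sorted() copies, so removing is safe
            degree = sum(1 for e in edges if sw in e)
            if degree <= 1 and sw not in member_switches:
                # A memberless leaf (or isolated switch) serves no receiver.
                alive.remove(sw)
                edges = {e for e in edges if sw not in e}
                changed = True
    return sorted(set(switches) - alive)


edges = {frozenset({"spine", "leafA"}), frozenset({"spine", "leafB"}),
         frozenset({"spine", "leafC"})}
for sw in prune_extra_switches(edges, ["spine", "leafA", "leafB", "leafC"],
                               {"leafA", "leafB"}):
    print(f"-E- Extra switch:{sw} in group:0xC000")
```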
> 6. UNICAST ROUTING:
> -I- Verifying all CA to CA paths ...
> -E- Unassigned LFT for lid:10 Dead end at:GNU1/main/U1
> -E- Fail to find a path from:H-1/U1/1 to:H-12/U1/2
> -E- Unassigned LFT for lid:18 Dead end at:GNU1/main/U3
> -E- Fail to find a path from:H-1/U1/1 to:H-5/U1/2
> [snip]
> -E- Found 19 missing paths out of:240 paths
>
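The all-to-all check follows each switch's linear forwarding table (LFT) hop by hop toward the destination LID until it reaches the destination CA, hits a switch with no LFT entry for that LID (the "dead end" above), or loops. A sketch under a hypothetical model of LFTs and cabling; the report strings only imitate the example output. The 240 paths in the example are consistent with 16 CAs checked in both directions (16 × 15), although the example does not state this.

```python
def trace_unicast(lfts, links, src_switch, dlid, dest):
    """Follow LFTs from src_switch toward dlid.

    lfts:  {switch: {lid: out_port}}; a missing key is an unassigned entry
    links: {(node, port): peer_node}
    dest:  name of the CA that owns dlid
    Returns (True, hops) on success, otherwise (False, message).
    """
    node, hops, visited = src_switch, [src_switch], set()
    while node != dest:
        if node in visited:
            return False, f"Routing loop for lid:{dlid} at:{node}"
        visited.add(node)
        # CAs have no LFT, so reaching the wrong CA also ends up here.
        out_port = lfts.get(node, {}).get(dlid)
        if out_port is None:
            return False, f"Unassigned LFT for lid:{dlid} Dead end at:{node}"
        peer = links.get((node, out_port))
        if peer is None:
            return False, f"No link on port {out_port} of {node}"
        node = peer
        hops.append(node)
    return True, hops


def check_all_pairs(lfts, links, cas):
    """cas: {ca_name: (attached_switch, lid)}. Trace every ordered CA pair."""
    failures, total = [], 0
    for src, (switch, _) in cas.items():
        for dst, (_, dlid) in cas.items():
            if src == dst:
                continue
            total += 1
            ok, info = trace_unicast(lfts, links, switch, dlid, dst)
            if not ok:
                failures.append((src, dst, info))
    return failures, total


lfts = {"S1": {10: 2, 11: 3}, "S2": {10: 1}}
links = {("S1", 2): "S2", ("S2", 1): "H-12", ("S1", 3): "H-1"}
failures, total = check_all_pairs(lfts, links, {"H-1": ("S1", 11), "H-12": ("S2", 10)})
print(f"-E- Found {len(failures)} missing paths out of:{total} paths")
```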
> 7. CREDIT LOOPS
> -I- Tracing all CA to CA paths for Credit Loops potential ...
> -E- Potential Credit Loop on Path from:H-1/U1/1 to:H-13/U1/1
> Going:Down from:GNU1/main/U1 to:GNU1/main/U3
> Going:Up from:GNU1/main/U3 to:GNU1/main/U1
> Going:Down from:GNU1/main/U1 to:GNU1/leaf1/U1
>
>
> NOTE: All the above cases were simulated on top of ibmgtsim.
> Errors were injected by simulation flows.
> ######################################################################################
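Credit loops are usually reasoned about with the classic channel dependency graph: every directed link is a node, and a route that enters a switch on one link and leaves on another adds a dependency between them. A cycle in that graph means credit-based flow control can deadlock. The sketch below builds the graph from hypothetical node-sequence routes and finds a cycle by depth-first search; whether ibdiagnet uses this exact method is not stated (its output reports Up/Down turns along a single path).

```python
from collections import defaultdict


def build_channel_dependencies(paths):
    """paths: node sequences such as ["H-1", "S1", "S3", "H-13"].

    Consecutive channels (a->b, b->c) along a route add the dependency
    (a, b) -> (b, c).
    """
    deps = defaultdict(set)
    for path in paths:
        channels = list(zip(path, path[1:]))
        for first, second in zip(channels, channels[1:]):
            deps[first].add(second)
    return deps


def find_cycle(deps):
    """Return one dependency cycle as a list of channels, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS stack, finished
    color = defaultdict(int)
    stack = []

    def dfs(channel):
        color[channel] = GRAY
        stack.append(channel)
        for nxt in sorted(deps.get(channel, ())):
            if color[nxt] == GRAY:  # back edge: cycle found
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[channel] = BLACK
        return None

    for channel in sorted(deps):
        if color[channel] == WHITE:
            found = dfs(channel)
            if found:
                return found
    return None


ring = [["S1", "S2", "S3"], ["S2", "S3", "S1"], ["S3", "S1", "S2"]]
print(find_cycle(build_channel_dependencies(ring)))
```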
> A full man page:
> ====================
> NAME
> ibdiagnet
>
> SYNOPSIS
> ibdiagnet [-c <count>] [-v] [-r] [-t <topo-file>] [-s <sys-name>]
> [-i <dev-index>] [-p <port-num>] [-o <out-dir>]
>
> DESCRIPTION
> ibdiagnet scans the fabric using directed route packets and extracts all the
> available information regarding its connectivity and devices.
> It then produces the following files in the output directory defined by the
> -o option (see below):
> ibdiagnet.lst - List of all the nodes, ports and links in the fabric
> ibdiagnet.fdbs - A dump of the unicast forwarding tables of the fabric
> switches
> ibdiagnet.mcfdbs - A dump of the multicast forwarding tables of the fabric
> switches
> In addition to generating the files above, the discovery phase also checks for
> duplicate node GUIDs in the IB fabric. If such an error is detected, it is
> displayed on the standard output.
> After the discovery phase is completed, directed route packets are sent
> multiple times (according to the -c option) to detect possible problematic
> paths on which packets may be lost. Such paths are explored, and a report of
> the suspected bad links is displayed on the standard output.
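The packet-drop phase can be sketched as probing each directed route `count` times (the -c option, default 10) and blaming the last hop of any route that loses probes while its parent route does not. `send_probe` is a hypothetical stand-in for sending one directed-route packet; this localisation rule is an assumption, not a description of ibdiagnet's code.

```python
def find_suspect_links(paths, send_probe, count=10):
    """Return (path, lost) for routes whose last hop is the suspected bad link.

    paths:      directed routes as tuples of exit ports, e.g. (1, 1, 3, 5)
    send_probe: callable(path) -> bool, True if the probe got a response
    count:      probes per route, mirroring the -c option
    """
    losses = {path: sum(1 for _ in range(count) if not send_probe(path))
              for path in paths}
    suspects = []
    for path in sorted(paths, key=len):
        parent = path[:-1]  # () stands for the local port itself
        if losses[path] and not losses.get(parent, 0):
            # The parent route is clean, so the loss starts on the last hop.
            suspects.append((path, losses[path]))
    return suspects


routes = [(1,), (1, 1), (1, 1, 3), (1, 1, 3, 5)]
# Simulate a bad link at the end of route (1, 1, 3): everything behind it fails.
print(find_suspect_links(routes, lambda p: p[:3] != (1, 1, 3)))
```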
> After scanning the fabric, if the -r option is provided, a full report of the
> fabric qualities is displayed.
> This report includes:
> Number of nodes and systems
> Hop-count information:
> maximal hop-count, an example path, and a hop-count histogram
> All CA-to-CA paths traced
> Note: If the IB fabric includes only one CA, CA-to-CA paths are not
> reported.
> Furthermore, if a topology file is provided, ibdiagnet uses the names defined
> in it for the output reports.
>
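The hop-count part of the -r report (maximal hop-count, an example path, and a histogram) can be sketched from the traced CA-to-CA routes. This assumes a hop is one traversed link (route length minus one); the man page does not define the unit. With a single CA there are no CA-to-CA routes, so the sketch returns None, matching the note in the description.

```python
from collections import Counter


def hop_count_report(paths):
    """paths: {(src_ca, dst_ca): [node, ...]} routes that include both end CAs.

    Returns (max_hops, example_pair, histogram) or None when there are no paths.
    """
    if not paths:
        return None  # e.g. a fabric with only one CA
    hops = {pair: len(route) - 1 for pair, route in paths.items()}
    max_hops = max(hops.values())
    example = next(pair for pair, h in hops.items() if h == max_hops)
    histogram = dict(sorted(Counter(hops.values()).items()))
    return max_hops, example, histogram


routes = {
    ("H-1", "H-2"): ["H-1", "S1", "H-2"],
    ("H-1", "H-3"): ["H-1", "S1", "S2", "H-3"],
    ("H-2", "H-3"): ["H-2", "S1", "S2", "H-3"],
}
print(hop_count_report(routes))
```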
> OPTIONS
> -c <count> : The minimal number of packets to be sent across each link
> (default = 10)
> -v : Instructs the tool to run in verbose mode
> -r : Provides a report of the fabric qualities
> -t <topo-file>: Specifies the topology file name
> -s <sys-name> : Specifies the local system name. Meaningful only if a topology
> file is specified
> -i <dev-index>: Specifies the index of the device of the port used to connect
> to the IB fabric (in case of multiple devices on the local
> system)
> -p <port-num> : Specifies the local device's port number used to connect to
> the IB fabric
> -o <out-dir> : Specifies the directory where the output files will be placed
> (default = /tmp/ez)
>
> -h|--help : Prints this help information
> -V|--version : Prints the version of the tool
> --vars : Prints the tool's environment variables and their values
>
> ERROR CODES
> 1 - Failed to fully discover the fabric
> 2 - Failed to parse command line options
> 3 - Some packet drop observed
> 4 - Mismatch with provided topology
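For scripting, a thin wrapper can assemble the documented options and translate the documented exit codes. This is a hypothetical helper, not part of the tool: it uses only the flags listed under OPTIONS, assumes `ibdiagnet` is on the PATH, and leaves exit status 0 unlabelled because the man page does not list it.

```python
import subprocess

# Exit codes as listed under ERROR CODES in the man page.
EXIT_CODES = {
    1: "Failed to fully discover the fabric",
    2: "Failed to parse command line options",
    3: "Some packet drop observed",
    4: "Mismatch with provided topology",
}


def describe_exit_code(code):
    return EXIT_CODES.get(code, f"exit status {code} (not listed in the man page)")


def build_argv(count=None, verbose=False, report=False, topo_file=None,
               sys_name=None, dev_index=None, port_num=None, out_dir=None):
    """Map keyword arguments onto the documented ibdiagnet flags."""
    argv = ["ibdiagnet"]
    if count is not None:
        argv += ["-c", str(count)]
    if verbose:
        argv.append("-v")
    if report:
        argv.append("-r")
    if topo_file:
        argv += ["-t", topo_file]
    if sys_name:  # only meaningful together with -t, per the man page
        argv += ["-s", sys_name]
    if dev_index is not None:
        argv += ["-i", str(dev_index)]
    if port_num is not None:
        argv += ["-p", str(port_num)]
    if out_dir:
        argv += ["-o", out_dir]
    return argv


def run_ibdiagnet(**options):
    """Run the tool and return (exit status, description)."""
    result = subprocess.run(build_argv(**options))
    return result.returncode, describe_exit_code(result.returncode)
```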