<h1 align="left">
<img border="0" src="openfabrics.gif" width="107" height="93">
<a name="TOP" href="#TOP"></a> </h1>
<h1 align="center">OpenFabrics Windows </h1>
<h1 align="center">User's Manual</h1>
<h2 align="center">Release 1.0</h2>
<h3 align="center">3-20-07</h3>
<h2 align="left"><u>Overview</u></h2>
<p align="left">This is the OpenFabrics Windows software package
supporting InfiniBand fabrics. It is composed of several software modules
intended for use on a computer cluster constructed as an InfiniBand fabric.</p>
<p align="left">The OpenFabrics Windows (OFW) software package contains the
following:<br>
<br>
OpenFabrics core and ULPs:</p>
<ul>
<li>
<p align="left">HCA drivers (InfiniHost)</li>
<li>
<p align="left">InfiniBand Core</li>
<li>
<p align="left">Upper Layer Protocols: IPoIB, WSD, VNIC, SRP Initiator and uDAPL</li>
</ul>
<p align="left">OpenFabrics utilities:</p>
<ul>
<li>
<p align="left">OpenSM: InfiniBand Subnet Manager</li>
<li>
<p align="left">Performance tests</li>
<li>
<p align="left">Diagnostic tools</li>
</ul>
<p align="left">Documentation</p>
<ul>
<li>
<p align="left">User's manual</li>
<li>
<p align="left">Release Notes</li>
</ul>
<h2 align="left"><u>Features</u></h2>
<ul>
<li>
<h3 align="left">Tools</h3></li>
</ul>
<blockquote>
<p align="left">The OpenFabrics Alliance Windows release contains a set of
user-mode tools designed to facilitate the smooth operation of an
OpenFabrics Windows installation.</p>
<h4 align="left">Infiniband Subnet Management</h4>
<ul>
<li>
<p align="left"><a href="#opensm">opensm</a> Open Subnet
Management - configure a subnet</li>
<li>
<p align="left"><a href="#osmtest">osmtest</a>
Subnet management tests</li>
<li>
<p align="left"><a href="#ibtrapgen">ibtrapgen</a> Generate InfiniBand Subnet
Management Traps for testing purposes</li>
</ul>
<h4 align="left"><a href="#verbs_benchmarks">Performance</a></h4>
<ul>
<li>
<p align="left"><a href="#ibsend_lat">ib_send_lat</a> Infiniband send
latency measurement</li>
<li>
<p align="left"><a href="#ibsend_bw">ib_send_bw</a> Infiniband send bandwidth
measurement</li>
<li>
<p align="left"><a href="#ibwrite_lat">ib_write_lat</a> Infiniband RDMA write
latency measurement</li>
<li>
<p align="left"><a href="#ibwrite_bw">ib_write_bw</a> Infiniband RDMA write bandwidth
measurement</li>
<li>
<p align="left"><a href="#ttcp">ttcp</a>
TCP performance measurements</li>
</ul>
<h4 align="left"><a href="#diags">Diagnostics</a></h4>
<ul>
<li>
<p align="left"><a href="#iblimits">ib_limits</a>
Infiniband verb tests</li>
<li>
<p align="left"><a href="#cmtest">cmtest </a> Connection Manager tests</li>
<li>
<p align="left"><a href="#printip">PrintIP</a> Display
an Internet Protocol address associated with an IB GUID.</li>
<li>
<p align="left"><a href="#vstat">vstat</a>
Display HCA attributes, statistics and error counters.<br>
</li>
</ul>
</blockquote>
<ul>
<li>
<h3 align="left"><a href="#IPoIB">IPoIB - Internet Protocols over InfiniBand</a></h3>
</li>
<li>
<h3 align="left"><a href="#winsockdirect">Winsock Direct Service Provider</a></h3>
</li>
<li>
<h3 align="left"><a href="#DAT">DAT and uDAPL</a></h3></li>
</ul>
<h3 align="left"> </h3>
<p align="left"> </p>
<hr>
<p align="left"> </p>
<h3 align="left"><a name="verbs_benchmarks"></a>User mode verbs micro-benchmarks<br>
</h3>
<p align="left">These micro-benchmark tests are intended as a useful benchmark for HW or SW
tuning and/or functional testing.</p>
<blockquote>
<p align="left">- The tests read the CPU cycle counter to obtain time stamps without a
context switch.<br>Some CPU architectures (e.g. Intel 80486) do NOT have this capability.<br>
<br>- The latency tests measure round-trip time but report half of that as one-way latency.<br>This may not be sufficiently accurate for
asymmetrical configurations.<br>
<br>- Min/Median/Max results are reported.<br>The median (vs. average) is less sensitive to extreme scores.<br>Typically the "Max" value is the first value measured.<br>
<br>- Larger samples help only marginally. The default (1000) is usually sufficient.<br>Note that an array of cycles_t (typically unsigned long) is allocated<br>once to collect samples and again to store the differences between them.<br>Really big sample sizes (e.g. 1 million) might expose other problems<br>with the program.<br>
<br>- The "-H" option dumps the histogram for additional statistical analysis.<br>See xgraph, ygraph, r-base (http://www.r-project.org/), pspp, or other
statistical math programs.<br><br>Architectures tested: x86, x86_64, ia64</p>
</blockquote>
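<p align="left">As a rough illustration of the reporting described above, the following
Python sketch converts round-trip cycle counts into one-way latencies and summarizes them as
min/median/max. It is not the benchmarks' actual code: the function name, the sample values and
the CPU frequency are assumptions chosen for the example.</p>

```python
from statistics import median


def summarize_latency(round_trip_cycles, cpu_mhz):
    """Return (min, median, max) one-way latency in microseconds.

    round_trip_cycles: cycle-counter deltas, one per exchange (hypothetical data).
    cpu_mhz: CPU frequency in MHz, i.e. cycles per microsecond (assumed known).
    """
    if len(round_trip_cycles) < 2:
        raise ValueError("at least 2 samples are required")
    # One-way latency is reported as half of the measured round trip.
    one_way_usec = [cycles / 2.0 / cpu_mhz for cycles in round_trip_cycles]
    return min(one_way_usec), median(one_way_usec), max(one_way_usec)


# Hypothetical samples: the first measurement is the slowest, as is typical.
samples = [40000, 12000, 11800, 12200, 12100]
lo, mid, hi = summarize_latency(samples, cpu_mhz=2000.0)
print(f"min={lo:.3f} median={mid:.3f} max={hi:.3f} usec")
```

<p align="left">Note how the single slow first sample sets the maximum but leaves the
median almost untouched, which is why the median is reported rather than the average.</p>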
<h4 align="left"><a name="ibsend_lat"></a>ib_send_lat.exe - latency test with
send transactions</h4>
<blockquote>
<p align="left">Usage:</p>
<blockquote>
<p align="left">ib_send_lat start a server and wait for connection<br>
ib_send_lat <host> connect to server at <host></p>
</blockquote>
<p align="left">Options:</p>
<blockquote>
<p align="left">-p, --port=<port> listen on/connect to port <port>
(default 18515)<br>
-c, --connection=<RC/UC> connection type RC/UC (default RC)<br>
-m, --mtu=<mtu> mtu size (default 2048)<br>
-d, --ib-dev=<dev> use IB device <dev> (default first device found)<br>
-i, --ib-port=<port> use port <port> of IB device (default 1)<br>
-s, --size=<size> size of message to exchange (default 1)<br>
-t, --tx-depth=<dep> size of tx queue (default 50)<br>
-l, --signal signal completion on each msg<br>
-a, --all Run sizes from 2 till 2^23<br>
-n, --iters=<iters> number of exchanges (at least 2, default 1000)<br>
-C, --report-cycles report times in cpu cycle units (default
microseconds)<br>
-H, --report-histogram print out all results (default print summary
only)<br>
-U, --report-unsorted (implies -H) print out unsorted results (default
sorted)<br>
-V, --version display version number<br>
-e, --events sleep on CQ events (default poll)</p>
</blockquote>
</blockquote>
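<p align="left">The -a (--all) option sweeps message sizes from 2 up to 2^23. Assuming the
sizes are successive powers of two (the option text does not spell out the step), the sweep can
be enumerated with a short Python sketch. The function name is illustrative only.</p>

```python
def all_message_sizes(max_exponent=23):
    """Message sizes 2^1 .. 2^max_exponent, assuming a doubling step."""
    return [2 ** exp for exp in range(1, max_exponent + 1)]


sizes = all_message_sizes()
print(len(sizes), sizes[0], sizes[-1])  # 23 2 8388608
```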
<h4 align="left"><a name="ibsend_bw"></a>ib_send_bw.exe - BW (BandWidth) test with send transactions</h4>
<blockquote>
<p align="left">Usage:</p>
<blockquote>
<p align="left">ib_send_bw start a server and wait for connection<br>
ib_send_bw <host> connect to server at <host></p>
</blockquote>
<p align="left">Options:</p>
<blockquote>
<p align="left">-p, --port=<port> listen on/connect to port <port>
(default 18515)<br>
-d, --ib-dev=<dev> use IB device <dev> (default first device found)<br>
-i, --ib-port=<port> use port <port> of IB device (default 1)<br>
-c, --connection=<RC/UC/UD> connection type RC/UC/UD (default RC)<br>
-m, --mtu=<mtu> mtu size (default 1024)<br>
-s, --size=<size> size of message to exchange (default 65536)<br>
-a, --all Run sizes from 2 till 2^23<br>
-t, --tx-depth=<dep> size of tx queue (default 300)<br>
-n, --iters=<iters> number of exchanges (at least 2, default 1000)<br>
-b, --bidirectional measure bidirectional bandwidth (default
unidirectional)<br>
-V, --version display version number<br>
-e, --events sleep on CQ events (default poll)</p>
</blockquote>
</blockquote>
<h4 align="left"><a name="ibwrite_lat"></a>ib_write_lat.exe - latency test with RDMA write
transactions</h4>
<blockquote>
<p align="left">Usage:</p>
<blockquote>
<p align="left">ib_write_lat start a server and wait for connection<br>
ib_write_lat <host> connect to server at <host></p>
</blockquote>
<p align="left">Options:</p>
<blockquote>
<p align="left">-p, --port=<port> listen on/connect to port <port>
(default 18515)<br>
-c, --connection=<RC/UC> connection type RC/UC (default RC)<br>
-m, --mtu=<mtu> mtu size (default 1024)<br>
-d, --ib-dev=<dev> use IB device <dev> (default first device found)<br>
-i, --ib-port=<port> use port <port> of IB device (default 1)<br>
-s, --size=<size> size of message to exchange (default 1)<br>
-a, --all Run sizes from 2 till 2^23<br>
-t, --tx-depth=<dep> size of tx queue (default 50)<br>
-n, --iters=<iters> number of exchanges (at least 2, default 1000)<br>
-C, --report-cycles report times in cpu cycle units (default
microseconds)<br>
-H, --report-histogram print out all results (default print summary
only)<br>
-U, --report-unsorted (implies -H) print out unsorted results (default
sorted)<br>
-V, --version display version number</p>
</blockquote>
</blockquote>
<h4 align="left"><a name="ibwrite_bw"></a>ib_write_bw.exe - BW test with RDMA write transactions</h4>
<blockquote>
<p align="left">Usage:</p>
<blockquote>
<p align="left">ib_write_bw
# start a server and wait for connection<br>
ib_write_bw <host> # connect to server at <host></p>
</blockquote>
<p align="left">Options:</p>
<blockquote>
<p align="left">-p, --port=<port> listen on/connect to port <port>
(default 18515)<br>
-d, --ib-dev=<dev> use IB device <dev> (default first device found)<br>
-i, --ib-port=<port> use port <port> of IB device (default 1)<br>
-c, --connection=<RC/UC> connection type RC/UC (default RC)<br>
-m, --mtu=<mtu> mtu size (default 1024)<br>
-g, --post=<num of posts> number of posts for each qp in the chain
(default tx_depth)<br>
-q, --qp=<num of qp's> Num of qp's(default 1)<br>
-s, --size=<size> size of message to exchange (default 65536)<br>
-a, --all Run sizes from 2 till 2^23<br>
-t, --tx-depth=<dep> size of tx queue (default 100)<br>
-n, --iters=<iters> number of exchanges (at least 2, default 5000)<br>
-b, --bidirectional measure bidirectional bandwidth (default
unidirectional)<br>
-V, --version display version number</p>
</blockquote>
</blockquote>
<h4 align="left"><a href="#TOP"><font color="#000000"><return-to-top></font></a></h4>
<h4 align="left"><a name="ttcp"></a>ttcp - Test TCP performance</h4>
<blockquote>
<pre>Usage: ttcp -t [-options] host
ttcp -r [-options]
Common options:
-l ## length of bufs read from or written to network (default 8192)
-u use UDP instead of TCP
-p ## port number to send to or listen at (default 5001)
-A align the start of buffers to this modulus (default 16384)
-O start buffers at this offset from the modulus (default 0)
-d set SO_DEBUG socket option
-b ## set socket buffer size (if supported)
-f X format for rate: k,K = kilo{bit,byte}; m,M = mega; g,G = giga
Options specific to -t:
-n## number of source bufs written to network (default 2048)
-D don't buffer TCP writes (sets TCP_NODELAY socket option)
Options specific to -r:
-B for -s, only output full blocks as specified by -l (for TAR)
-T "touch": access each byte as it's read</pre>
<p align="left">ttcp requires a receiver (server) side and a transmitter (client)
side. In the example below, host1 and host2 are IPoIB-connected hosts.</p>
<p align="left">At host1 (receiver):
ttcp -r -f M -l 4096</p>
<p align="left">At host2 (transmitter): ttcp -t -f M -l
4096 -n1000 host1</p>
</blockquote>
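<p align="left">The transmitter example above writes 1000 buffers of 4096 bytes each. The
Python sketch below works through that arithmetic and shows how a rate in megabytes per second
could be derived from it. The elapsed time is a made-up value, and treating one megabyte as
1024 * 1024 bytes is an assumption; ttcp's own rounding and units may differ.</p>

```python
def transfer_bytes(num_bufs, buf_len):
    """Total payload written by the transmitter: buffer count times buffer length."""
    return num_bufs * buf_len


def rate_mbytes_per_sec(total_bytes, elapsed_sec):
    """Throughput in megabytes per second (1 MB taken as 1024 * 1024 bytes here)."""
    return total_bytes / (1024 * 1024) / elapsed_sec


total = transfer_bytes(num_bufs=1000, buf_len=4096)  # ttcp -t -l 4096 -n1000
print(total)  # 4096000
print(rate_mbytes_per_sec(total, elapsed_sec=0.5))  # hypothetical 0.5 s run
```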
<h4 align="left"><a href="#TOP"><font color="#000000"><return-to-top></font></a></h4>
<p align="left"> </p>
<h3 align="left"><a name="diags"></a>Diagnostics</h3>
<p align="left"> </p>
<h3 align="left"><a name="iblimits"></a>ib_limits - Infiniband verbs tests</h3>
<p align="left">Usage: ib_limits [options]</p>
<blockquote>
<p align="left">Options:<br>-m or --memory<br> Direct ib_limits to test memory registration<br>-c or --cq<br> Direct ib_limits to test CQ creation<br>-r or --resize_cq<br> Direct ib_limits to test CQ resize<br>-q or --qp<br> Direct ib_limits to test QP creation<br>-v or --verbose<br> Enable verbose output to the debug console.<br>-h or --help<br> Display this usage info then exit.</p>
</blockquote>
<h4 align="left"><a href="#TOP"><font color="#000000"><return-to-top></font></a></h4>
<p align="left"> </p>
<h3 align="left"><a name="cmtest"></a>cmtest - Connection Manager Tests</h3>
<p>Usage: cmtest [options]</p>
<p> Options:</p>
<blockquote>
<p> -s --server This option directs cmtest to act as a server.<br>
-l <lid> --local <lid> This option specifies the local endpoint.<br>
-r <lid> --remote <lid> This option specifies the remote endpoint.<br>
-c <number> --connect <number> This option specifies the number of connections to open. Default is 1.<br>
-m <bytes> --msize <bytes> This option specifies the byte size of each message. Default is 100 bytes.<br>
-n <number> --nmsgs <number> This option specifies the number of messages to send at a time.<br>
-p --permsg This option indicates whether a separate buffer should be used per
message. Default is one buffer for all messages.<br>
-i <number> --iterate <number> This option specifies the number of times to loop through 'nmsgs'. Default is 1.<br>
-v --verbose This option enables verbose output to the debug console.<br>
-h --help Display this usage info then exit.</p>
</blockquote>
<h4 align="left"><a href="#TOP"><font color="#000000"><return-to-top></font></a></h4>
<p align="left"> </p>
<p align="left"> </p>
<h3 align="left"><a name="printip"></a>PrintIP - print ip adapters and their addresses</h3>
<blockquote>
<p align="left">PrintIP is used to print IP adapters and their addresses, or
ARP (Address Resolution Protocol) and IP address.<br>
<br>
Usage:<br>
printip <print_ips><br>
printip <remoteip> <ip>
(example printip remoteip 10.10.2.20)</p>
</blockquote>
<h3 align="left"><a href="#TOP"><font color="#000000"><return-to-top></font></a></h3>
<p align="left"> </p>
<h3 align="left">
<br>
<a name="vstat"></a>vstat - HCA Stats and Counters</h3>
<blockquote>
<p align="left">Display HCA (Host Channel Adapter) attributes.</p>
<p align="left">Usage: vstat [-v] [-c]<br>
-v - verbose mode<br>
-c - HCA error/statistic
counters</p>
</blockquote>
<h4 align="left"><a href="#TOP"><font color="#000000"><return-to-top></font></a></h4>
<h3 align="left">Install Infiniband Service Provider</h3>
<blockquote>
<p align="left">usage: installsp [-i/-r [-p]]<br>
<br>
-i Install the IB service provider<br>
-r Remove the OpenIB service provider<br>
-r <name> Remove the specified service provider<br>
-l List service providers<br>
</p>
</blockquote>
<h4 align="left"><a href="#TOP" style="text-decoration: none"><font color="#000000"><return-to-top></font></a></h4>
<p align="left"> </p>
<h3 align="left"><a name="opensm"></a>Subnet Management with OpenSM Rev: openib-1.2.0</h3>
<p align="left">A single running process (opensm.exe) is required to configure
an InfiniBand subnet and thus make it usable. In most cases, invoking
opensm.exe with no arguments is sufficient to correctly configure an
InfiniBand fabric.</p>
<p align="left">The InfiniBand subnet management process (opensm) may run on either a
Windows node or a Linux (OFED) node, but not on both at the same time!</p>
<p align="left">Usage: opensm [options]</p>
<p align="left">Options:</p>
<blockquote>
<p align="left">-c<br>
--cache-options</p>
<blockquote>
<p align="left">Cache the given command line options into the file<br>
/var/cache/osm/opensm.opts for use on the next invocation.<br>
The cache directory can be changed with the environment<br>
variable OSM_CACHE_DIR.</p>
</blockquote>
<p align="left">-g[=]<GUID in hex><br>
--guid[=]<GUID in hex></p>
<blockquote>
<p align="left">This option specifies the local port GUID value with
which OpenSM should bind. OpenSM may be<br>
bound to 1 port at a time. If the GUID given is 0, OpenSM displays a
list of possible port GUIDs and waits for user input. Without -g, OpenSM
tries to use the default port.</p>
</blockquote>
<p align="left">-l <LMC><br>
--lmc <LMC></p>
<blockquote>
<p align="left">This option specifies the subnet's LMC value.<br>
The number of LIDs assigned to each port is 2^LMC.<br>
The LMC value must be in the range 0-7.<br>
LMC values > 0 allow multiple paths between ports.<br>
LMC values > 0 should only be used if the subnet<br>
topology actually provides multiple paths between<br>
ports, i.e. multiple interconnects between switches.<br>
Without -l, OpenSM defaults to LMC = 0, which allows<br>
one path between any two ports.</p>
</blockquote>
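<p align="left">The relationship between the LMC value and the number of LIDs per port can
be tabulated with a few lines of Python. This is only a worked illustration of the 2^LMC rule
stated above; the function name is hypothetical and not part of OpenSM.</p>

```python
def lids_per_port(lmc):
    """Number of LIDs assigned to each port for a given LMC value (2^LMC)."""
    if not 0 <= lmc <= 7:
        raise ValueError("LMC must be in the range 0-7")
    return 2 ** lmc


for lmc in range(8):
    print(f"LMC={lmc}: {lids_per_port(lmc)} LID(s) per port")
```

<p align="left">With the default LMC of 0 each port receives a single LID; at the maximum
LMC of 7 each port receives 128.</p>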
<p align="left">-p <PRIORITY><br>
--priority <PRIORITY></p>
<blockquote>
<p align="left">This option specifies the SM's PRIORITY.<br>
This will affect the handover cases, where the master<br>
is chosen by priority and GUID.</p>
</blockquote>
<p align="left">-smkey <SM_Key></p>
<blockquote>
<p align="left">This option specifies the SM's SM_Key (64 bits).<br>
This will affect SM authentication.</p>
</blockquote>
<p align="left">-r<br>
--reassign_lids</p>
<blockquote>
<p align="left">This option causes OpenSM to reassign LIDs to all end nodes. Specifying
-r on a running subnet<br>
may disrupt subnet traffic. Without -r, OpenSM attempts to
preserve existing LID assignments, resolving any cases where the same LID is used more than once.</p>
</blockquote>
<p align="left">-u<br>
--updn</p>
<blockquote>
<p align="left">This option activates the UPDN (Up/Down) routing algorithm instead of
the default Min Hop algorithm.</p>
</blockquote>
<p align="left">-a<br>
--add_guid_file <path to file></p>
<blockquote>
<p align="left">Set the root nodes for the Up/Down routing algorithm to
the guids provided in the given file (one per line)</p>
</blockquote>
<p align="left">-o<br>
--once</p>
<blockquote>
<p align="left">This option causes OpenSM to configure the subnet once,
then exit. Ports remain in the ACTIVE state.</p>
</blockquote>
<p align="left">-s <interval><br>
--sweep <interval></p>
<blockquote>
<p align="left">This option specifies the number of seconds between
subnet sweeps. Specifying -s 0 disables sweeping.<br>
Without -s, OpenSM defaults to a sweep interval of 10 seconds.</p>
</blockquote>
<p align="left">-t <milliseconds><br>
--timeout <milliseconds></p>
<blockquote>
<p align="left">This option specifies the time in milliseconds<br>
used for transaction timeouts.<br>
Specifying -t 0 disables timeouts.<br>
Without -t, OpenSM defaults to a timeout value of<br>
200 milliseconds.</p>
</blockquote>
<p align="left">-maxsmps <number></p>
<blockquote>
<p align="left">This option specifies the number of VL15 SMP MADs
allowed on the wire at any one time.<br>
Specifying -maxsmps 0 allows unlimited outstanding SMPs.<br>
Without -maxsmps, OpenSM defaults to a maximum of one outstanding SMP.</p>
</blockquote>
<p align="left">-i <equalize-ignore-guids-file><br>
-ignore-guids <equalize-ignore-guids-file></p>
<blockquote>
<p align="left">This option provides the means to define a set of ports
(by guids) that will be ignored by the link load
equalization algorithm.</p>
</blockquote>
<p align="left">-x<br>
--honor_guid2lid</p>
<blockquote>
<p align="left">This option forces OpenSM to honor the guid2lid file
when it comes out of Standby state, if such a file exists
under OSM_CACHE_DIR and is valid.
By default this is FALSE.</p>
</blockquote>
<p align="left">-f<br>
--log_file</p>
<blockquote>
<p align="left">This option defines the log to be the given file. By
default the log goes to %SystemRoot%\temp\osm.log.<br>
For the log to go to standard output use -f stdout.</p>
</blockquote>
<p align="left">-e<br>
--erase_log_file</p>
<blockquote>
<p align="left">This option will cause deletion of the log file
(if it already exists). By default, new messages are appended to the existing log file.</p>
</blockquote>
<p align="left">-y<br>
--stay_on_fatal</p>
<blockquote>
<p align="left">This option will cause the SM not to exit on fatal
initialization
issues, i.e. if the SM discovers duplicated GUIDs or a 12x link with
badly configured lane reversal.
By default, the SM will exit on these errors.</p>
</blockquote>
<p align="left">-v<br>
--verbose</p>
<blockquote>
<p align="left">This option increases the log verbosity level.
The -v option may be specified multiple times
to further increase the verbosity level.
See the -D option for more information about
log verbosity.</p>
</blockquote>
<p align="left">-V</p>
<blockquote>
<p align="left">This option sets the maximum verbosity level and
forces log flushing.<br>
The -V option is equivalent to '-D 0xFF -d 2'.
See the -D option for more information about
log verbosity.</p>
</blockquote>
<p align="left">-D <flags></p>
<blockquote>
<p align="left">This option sets the log verbosity level. A flags
field must follow the -D option.<br>
A bit set/clear in the flags enables/disables a specific log level as
follows:<br>
BIT LOG LEVEL ENABLED<br>
---- -----------------<br>
0x01 - ERROR (error messages)<br>
0x02 - INFO (basic messages, low volume)<br>
0x04 - VERBOSE (interesting stuff, moderate volume)<br>
0x08 - DEBUG (diagnostic, high volume)<br>
0x10 - FUNCS (function entry/exit, very high volume)<br>
0x20 - FRAMES (dumps all SMP and GMP frames)<br>
0x40 - ROUTING (dump FDB routing information)<br>
0x80 - currently unused.<br>
Without -D, OpenSM defaults to ERROR + INFO (0x3).<br>
Specifying -D 0 disables all messages.<br>
Specifying -D 0xFF enables all messages (see -V).<br>
High verbosity levels may require increasing the transaction timeout
with the -t option.</p>
</blockquote>
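<p align="left">Because the -D flags field is a bit mask, a value is simply the bitwise OR
of the levels to enable. The Python sketch below encodes and decodes such masks using the table
above. The helper names are hypothetical and exist only for this illustration; they are not part
of OpenSM.</p>

```python
# Bit values copied from the -D table above (0x80 is currently unused).
LOG_LEVELS = {
    0x01: "ERROR",
    0x02: "INFO",
    0x04: "VERBOSE",
    0x08: "DEBUG",
    0x10: "FUNCS",
    0x20: "FRAMES",
    0x40: "ROUTING",
}


def decode_log_flags(flags):
    """List the log level names enabled by a flags value, lowest bit first."""
    return [name for bit, name in sorted(LOG_LEVELS.items()) if flags & bit]


def encode_log_flags(*names):
    """Combine log level names into a flags value suitable for -D."""
    by_name = {name: bit for bit, name in LOG_LEVELS.items()}
    value = 0
    for name in names:
        value |= by_name[name]
    return value


print(decode_log_flags(0x03))  # ['ERROR', 'INFO']
print(hex(encode_log_flags("ERROR", "INFO", "DEBUG")))  # 0xb
```

<p align="left">For example, enabling ERROR, INFO and DEBUG together corresponds to
'-D 0x0B'.</p>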
<p align="left">-d <number><br>
--debug <number></p>
<blockquote>
<p align="left">This option specifies a debug option. These options are
not normally needed. The number following -d selects the debug option to
enable as follows:<br>
OPT Description<br>
--- -----------------<br>
-d0 - Ignore other SM nodes<br>
-d1 - Force single threaded dispatching<br>
-d2 - Force log flushing after each log message<br>
-d3 - Disable multicast support<br>
-d4 - Put OpenSM in memory tracking mode<br>
-d10 - Put OpenSM in testability mode<br>
Without -d, no debug options are enabled</p>
</blockquote>
<p align="left">-h<br>
--help</p>
<blockquote>
<p align="left">Display this usage info then exit.</p>
</blockquote>
<p align="left">-?</p>
<blockquote>
<p align="left">Display this usage info then exit.</p>
</blockquote>
</blockquote>
<h4 align="left"><a href="#TOP"><font color="#000000"><return-to-top></font></a></h4>
<h3 align="left"> </h3>
<h3 align="left"><a name="osmtest"></a>Osmtest - Subnet Management Tests</h3>
<p align="left">Invoke open subnet management tests.</p>
<blockquote>
<p align="left"> Usage: osmtest [options]</p>
<p align="left">Options:</p>
<blockquote>
<p align="left"> -f <c|a|v|s|e|f|m|q|t><br>
--flow <c|a|v|s|e|f|m|q|t></p>
<p align="left">This option directs osmtest to run a specific flow:</p>
<p align="left">FLOW DESCRIPTIONS<br>
c = create an inventory file with all nodes, ports & paths.<br>
a = run all validation tests (expecting an input inventory)<br>
v = only validate the given inventory file.<br>
s = run service registration, un-registration and lease.<br>
e = run event forwarding test.<br>
f = flood the SA with queries according to the stress mode.<br>
m = multicast flow.<br>
q = QoS info - VLArb and SLtoVL tables.<br>
t = run trap 64/65 flow; requires running an external tool.<br>
(default is all but QoS).</p>
<p align="left">-w <trap_wait_time><br>
--wait <trap_wait_time></p>
<blockquote>
<p align="left">This option specifies the wait time for trap 64/65
in seconds.<br>
It is used only when running -f t - the trap 64/65 flow<br>
(default to 10 sec).</p>
</blockquote>
<p align="left">-d <number><br>
--debug <number></p>
<blockquote>
<p align="left">This option specifies a debug option.
These options are not normally needed.<br>
The number following -d selects the debug
option to enable as follows:<br>
OPT Description<br>
--- -----------------<br>
-d0 - Unused.<br>
-d1 - Do not scan/compare path records.<br>
-d2 - Force log flushing after each log message.<br>
-d3 - Use mem tracking.<br>
Without -d, no debug options are enabled.</p>
</blockquote>
<p align="left">-m <LID in hex><br>
--max_lid <LID in hex></p>
<blockquote>
<p align="left">This option specifies the maximal LID number to be
searched
for during inventory file build (default to 100).</p>
</blockquote>
<p align="left">-g <GUID in hex><br>
--guid <GUID in hex></p>
<blockquote>
<p align="left">This option specifies the local port GUID value
with which osmtest should bind. osmtest may be
bound to 1 port at a time.
Without -g, osmtest displays a menu of possible
port GUIDs and waits for user input.</p>
</blockquote>
<p align="left">-h<br>
--help</p>
<blockquote>
<p align="left">Display this usage info then exit.</p>
</blockquote>
<p align="left">-i <filename><br>
--inventory <filename></p>
<blockquote>
<p align="left">This option specifies the name of the inventory
file.
Normally, osmtest expects to find an inventory file,
which osmtest uses to validate real-time information
received from the SA during testing.
If -i is not specified, osmtest defaults to the file
'osmtest.dat'.<br>
See the "c" flow of the -f option for related information.</p>
</blockquote>
<p align="left">-s<br>
--stress</p>
<blockquote>
<p align="left">This option runs the specified stress test instead
of the normal test suite.<br>
Stress test options are as follows:<br>
OPT Description<br>
--- -----------------<br>
-s1 - Single-MAD response SA queries.<br>
-s2 - Multi-MAD (RMPP) response SA queries.<br>
-s3 - Multi-MAD (RMPP) Path Record SA queries.<br>
Without -s, stress testing is not performed.</p>
</blockquote>
<p align="left">-M<br>
--Multicast_Mode</p>
<blockquote>
<p align="left">This option specifies the length of the multicast test:<br>
OPT Description<br>
--- -----------------<br>
-M1 - Short Multicast Flow (default) - single mode.<br>
-M2 - Short Multicast Flow - multiple mode.<br>
-M3 - Long Multicast Flow - single mode.<br>
-M4 - Long Multicast Flow - multiple mode.<br>
Single mode - osmtest runs alone, with no other<br>
applications interacting with OpenSM multicast.<br>
Multiple mode - osmtest may run alongside other applications that use multicast with<br>
OpenSM. Without -M, default flow testing is performed.</p>
</blockquote>
<p align="left">-t <milliseconds></p>
<blockquote>
<p align="left">This option specifies the time in milliseconds used
for transaction timeouts.<br>
Specifying -t 0 disables timeouts.<br>
Without -t, osmtest defaults to a timeout value of 1 second.</p>
</blockquote>
<p align="left">-l<br>
--log_file</p>
<blockquote>
<p align="left">This option defines the log to be the given file.<br>
By default the log goes to stdout.</p>
</blockquote>
<p align="left">-v</p>
<blockquote>
<p align="left">This option increases the log verbosity level. The
-v option may be specified multiple times<br>
to further increase the verbosity level. See the -vf option for more
information about log verbosity.</p>
</blockquote>
<p align="left">-V</p>
<blockquote>
<p align="left">This option sets the maximum verbosity level and
forces log flushing.<br>
The -V is equivalent to '-vf 0xFF -d 2'.<br>
See the -vf option for more information about log verbosity.</p>
</blockquote>
<p align="left">-vf <flags></p>
<blockquote>
<p align="left">This option sets the log verbosity level. A flags
field must follow the -vf option.<br>
A bit set/clear in the flags enables/disables a specific log level
as follows:<br>
BIT LOG LEVEL ENABLED<br>
---- -----------------<br>
0x01 - ERROR (error messages)<br>
0x02 - INFO (basic messages, low volume)<br>
0x04 - VERBOSE (interesting stuff, moderate volume)<br>
0x08 - DEBUG (diagnostic, high volume)<br>
0x10 - FUNCS (function entry/exit, very high volume)<br>
0x20 - FRAMES (dumps all SMP and GMP frames)<br>
0x40 - currently unused.<br>
0x80 - currently unused.<br>
Without -vf, osmtest defaults to ERROR + INFO (0x3).<br>
Specifying -vf 0 disables all messages.<br>
Specifying -vf 0xFF enables all messages (see -V).<br>
High verbosity levels may require increasing<br>
the transaction timeout with the -t option.</p>
</blockquote>
</blockquote>
</blockquote>
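<p align="left">A typical use of the flows above is to create an inventory file once and
then validate the fabric against it. The Python sketch below only assembles the two command
lines from the documented -f and -i options. The helper name is hypothetical, and it assumes
that the create flow writes to the file named by -i, which the option text suggests but does not
state outright.</p>

```python
def osmtest_inventory_commands(inventory_file="osmtest.dat"):
    """Return the create and validate command lines as argument lists.

    inventory_file defaults to 'osmtest.dat', the default named in the -i text.
    """
    create = ["osmtest", "-f", "c", "-i", inventory_file]
    validate = ["osmtest", "-f", "v", "-i", inventory_file]
    return [create, validate]


for argv in osmtest_inventory_commands():
    print(" ".join(argv))
```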
<h4 align="left"><a href="#TOP"><font color="#000000"><return-to-top></font></a></h4>
<h3 align="left"> </h3>
<h3 align="left"><br>
<a name="ibtrapgen"></a>ibtrapgen - Generate Infiniband subnet management traps</h3>
<p align="left">Usage: ibtrapgen -t|--trap_num <TRAP_NUM> -n|--number <NUM_TRAP_CREATIONS><br>
-r|--rate <TRAP_RATE> -l|--lid <LIDADDR> <br>
-s|--src_port <SOURCE_PORT> -p|--port_num <PORT_NUM><br>
<br>
Options: one of the following optional flows:</p>
<blockquote>
<p align="left">-t <TRAP_NUM><br>
--trap_num <TRAP_NUM><br>
This option specifies the
number of the trap to generate. Valid values are 128-131.<br>
-n <NUM_TRAP_CREATIONS><br>
--number <NUM_TRAP_CREATIONS><br>
This option specifies the
number of times to generate this trap.<br>
If not specified -
default to 1.<br>
-r <TRAP_RATE><br>
--rate <TRAP_RATE><br>
This option specifies the
rate of the trap generation.<br>
That is, the time period
between one generation and the next.<br>
The value is given in
milliseconds. <br>
If the number of trap
creations is 1 - this value is ignored.<br>
-l <LIDADDR><br>
--lid <LIDADDR><br>
This option specifies the
lid address from where the trap should be generated.<br>
-s <SOURCE_PORT><br>
--src_port <SOURCE_PORT><br>
This option specifies the
port number from which the trap should<br>
be generated. If trap
number is 128 - this value is ignored (since<br>
trap 128 is not sent with
a specific port number)<br>
-p <port num><br>
--port_num <port num><br>
This is the port number
used for communicating with the SA.<br>
-h<br>
--help<br>
Display this usage info
then exit.<br>
-o<br>
--out_log_file<br>
This option defines the
log to be the given file.<br>
By default the log goes
to stdout.<br>
-v<br>
This option increases the
log verbosity level.<br>
The -v option may be
specified multiple times to further increase the verbosity level.<br>
See the -x option for
more information about log verbosity.<br>
-V<br>
This option sets the
maximum verbosity level and forces log flushing.<br>
The -V option is equivalent to
'-x 0xFF -d 2'.<br>
See the -x option for
more information about log verbosity.<br>
-x <flags><br>
This option sets the log
verbosity level.<br>
A flags field must follow
the -x option.<br>
A bit set/clear in the
flags enables/disables a<br>
specific log level as
follows:</p>
<blockquote>
<p align="left">BIT LOG LEVEL ENABLED<br>
---- -----------------<br>
0x01 - ERROR (error messages)<br>
0x02 - INFO (basic messages, low volume)<br>
0x04 - VERBOSE (interesting stuff, moderate volume)<br>
0x08 - DEBUG (diagnostic, high volume)<br>
0x10 - FUNCS (function entry/exit, very high volume)<br>
0x20 - FRAMES (dumps all SMP and GMP frames)<br>
0x40 - currently unused.<br>
0x80 - currently unused.<br>
Without -x, ibtrapgen defaults to ERROR + INFO (0x3).<br>
Specifying -x 0 disables all messages.<br>
Specifying -x 0xFF enables all messages (see -V).</p>
</blockquote>
</blockquote>
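<p align="left">The option rules above (valid trap numbers, the rate being ignored for a
single trap, and the source port being ignored for trap 128) can be captured in a small Python
helper that assembles an ibtrapgen command line. The helper and its parameter names are
hypothetical. The expected format of the LID value is not stated in this manual, so it is passed
through unchanged as a string.</p>

```python
def build_ibtrapgen_args(trap_num, lid, port_num, number=1, rate_ms=None, src_port=None):
    """Assemble an ibtrapgen argument list from the documented options."""
    if not 128 <= trap_num <= 131:
        raise ValueError("trap_num must be in the range 128-131")
    args = ["ibtrapgen", "-t", str(trap_num), "-n", str(number),
            "-l", str(lid), "-p", str(port_num)]
    # The rate is ignored when only one trap is generated, so omit it then.
    if number > 1 and rate_ms is not None:
        args += ["-r", str(rate_ms)]
    # Trap 128 is not sent with a specific source port, so omit -s for it.
    if trap_num != 128 and src_port is not None:
        args += ["-s", str(src_port)]
    return args


# Hypothetical values: five traps of type 129, 100 ms apart.
print(" ".join(build_ibtrapgen_args(129, "0x1", 1, number=5, rate_ms=100, src_port=2)))
```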
<h4 align="left"><a href="#TOP"><font color="#000000"><return-to-top></font></a></h4>
<p align="left"> </p>
<p align="left"> </p>
<h3 align="left"><a name="IPoIB"></a>IPoIB - Internet Protocols over InfiniBand</h3>
<p align="left">IPoIB enables Internet Protocol utilities (e.g., ftp,
telnet) to function over an InfiniBand fabric. IPoIB is implemented as
an NDIS Miniport driver with a WDM lower edge.</p>
<p align="left">The IPoIB Network adapters are
located via 'My Computer->Manage->Device Manager->Network adapters->IPoIB'.<br>
'My
Network Places->Properties' will display IPoIB Local Area Connection instances and should be used to
configure IP addresses for the IPoIB interfaces; one Local Area Connection
instance per HCA port. The IP
(Internet Protocol) address bound to the IPoIB adapter instance can be assigned
by DHCP or as a static IP address via<br>
'My Network Places->Properties->Local
Area Connection X->Properties->(General Tab)Internet Protocol(TCP/IP)->Properties'.</p>
<p align="left">When the subnet manager (opensm) configures/sweeps the local
InfiniBand HCA, the Local Area Connection will become enabled. If you find
the Local Area Connection disabled, your subnet manager
(opensm) is most likely not running or not functioning correctly.</p>
<h4 align="left"><a href="#TOP"><font color="#000000"><return-to-top></font></a></h4>
<p align="left"> </p>
<p align="left"> </p>
<h3 align="left"><a name="winsockdirect"></a>Winsock Direct Service Provider</h3>
<p align="left">Winsock Direct (WSD) is Microsoft's proprietary protocol that
predates SDP (Sockets Direct Protocol) for accelerating TCP/IP applications by
using RDMA hardware. Microsoft had a significant role in defining the SDP
protocol, hence SDP and WSD are remarkably similar, though unfortunately
incompatible.<br>
<br>
WSD is made up of two parts, the winsock direct switch and the winsock direct
provider. The WSD switch is in the winsock DLL that ships in all editions of
Windows Server 2003, and is responsible for routing socket traffic over the
regular TCP/IP stack or offloading it to a WSD provider. The WSD provider is a
hardware specific DLL that implements connection management and data transfers
over particular RDMA hardware.</p>
<p align="left">The WSD Protocol seamlessly transports TCP
data using Infiniband data packets in 'buffered' mode or Infiniband
RDMA in 'direct' mode. Either way the user mode socket application sees no
difference in the standard Inet socket which it created other than
reduced data transfer times and increased bandwidth.<br>
<br>
The OpenFabrics Windows release includes a WSD provider library that has been
extensively tested with Microsoft Windows Server 2003.</p>
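<p align="left">Because the WSD switch sits below the Winsock API, an application
needs no WSD-specific calls. As an illustration only, the following sketch is an
ordinary TCP echo round trip written against the standard socket interface. It
uses the loopback address so it can be tried on a single machine; on a cluster
the IPoIB addresses of two hosts would be used instead. Whether a given
connection is actually offloaded depends on the switch and provider
configuration, which this sketch does not inspect.</p>

```python
import socket
import threading


def echo_once(listener):
    """Accept one connection and echo back whatever arrives."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def round_trip(payload, host="127.0.0.1"):
    """Send payload over an ordinary TCP socket and return the echo."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((host, 0))          # port 0: let the OS pick a port
        listener.listen(1)
        port = listener.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(listener,))
        worker.start()
        with socket.create_connection((host, port)) as client:
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)   # signal end of data
            chunks = []
            while True:
                chunk = client.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
        worker.join()
    return b"".join(chunks)


if __name__ == "__main__":
    print(round_trip(b"hello over a standard socket"))
```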
<div id="tiki-main">
<div id="tiki-mid">
<table id="table1" cellSpacing="0" cellPadding="0" border="0">
<tr>
<td id="centercolumn">
<div id="tiki-center">
<div class="wikitext">
Environment variables can be used to change the behavior
of the WSD provider:<br>
<br>
IBWSD_NO_READ - Disables RDMA Read operations when set
to any value. Note that this variable must be used
consistently throughout the cluster or communication
will fail.<br>
<br>
IBWSD_POLL - Sets the number of times to poll the
completion queue after processing completions in
response to a CQ event. Reduces latency at the cost of
CPU utilization. Default is 500.<br>
<br>
IBWSD_SA_RETRY - Sets the number of times to retry SA
query requests. Default is 4, can be increased if
connection establishment fails.<br>
<br>
IBWSD_SA_TIMEOUT - Sets the number of milliseconds to
wait before retrying SA query requests. Default is 4,
can be increased if connection establishment fails.<br>
<br>
IBWSD_NO_IPOIB - SA query timeouts by default allow the
connection to be established over IPoIB. Setting this
environment variable to any value prevents fall back to
IPoIB if SA queries time out.<br>
<br>
IBWSD_DBG - Controls debug output when using a debug
version of the WSD provider. Takes a hex value with a
leading '0x'; the default value is '0x80000000' (errors only).<br>
<br>
<table class="wikitable" id="table2">
<tr>
<td class="wikicell">0x00000001</td>
<td class="wikicell">DLL</td>
</tr>
<tr>
<td class="wikicell">0x00000002</td>
<td class="wikicell">socket info</td>
</tr>
<tr>
<td class="wikicell">0x00000004</td>
<td class="wikicell">initialization code</td>
</tr>
<tr>
<td class="wikicell">0x00000008</td>
<td class="wikicell">WQ related functions</td>
</tr>
<tr>
<td class="wikicell">0x00000010</td>
<td class="wikicell">Endpoint-related functions</td>
</tr>
<tr>
<td class="wikicell">0x00000020</td>
<td class="wikicell">memory registration</td>
</tr>
<tr>
<td class="wikicell">0x00000040</td>
<td class="wikicell">CM</td>
</tr>
<tr>
<td class="wikicell">0x00000080</td>
<td class="wikicell">connections</td>
</tr>
<tr>
<td class="wikicell">0x00000200</td>
<td class="wikicell">socket options</td>
</tr>
<tr>
<td class="wikicell">0x00000400</td>
<td class="wikicell">network events</td>
</tr>
<tr>
<td class="wikicell">0x00000800</td>
<td class="wikicell">Hardware</td>
</tr>
<tr>
<td class="wikicell">0x00001000</td>
<td class="wikicell">Overlapped I/O request</td>
</tr>
<tr>
<td class="wikicell">0x00002000</td>
<td class="wikicell">Socket Duplication</td>
</tr>
<tr>
<td class="wikicell">0x00004000</td>
<td class="wikicell">Performance Monitoring</td>
</tr>
<tr>
<td class="wikicell">0x01000000</td>
<td class="wikicell">More verbose than
IBSP_DBG_LEVEL3</td>
</tr>
<tr>
<td class="wikicell">0x02000000</td>
<td class="wikicell">More verbose than
IBSP_DBG_LEVEL2</td>
</tr>
<tr>
<td class="wikicell">0x04000000</td>
<td class="wikicell">More verbose than
IBSP_DBG_LEVEL1</td>
</tr>
<tr>
<td class="wikicell">0x08000000</td>
<td class="wikicell">Verbose output</td>
</tr>
<tr>
<td class="wikicell">0x20000000</td>
<td class="wikicell">Function enter/exit</td>
</tr>
<tr>
<td class="wikicell">0x40000000</td>
<td class="wikicell">Warnings</td>
</tr>
<tr>
<td class="wikicell">0x80000000</td>
<td class="wikicell">Errors</td>
</tr>
</table>
</div>
</div>
</td>
</tr>
</table>
</div>
</div>
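<p align="left">The IBWSD_DBG table above is laid out as bit flags. A minimal
sketch of composing a value follows, assuming the categories combine with a
bitwise OR (the layout suggests this, but the manual does not state it). The
short category names used as dictionary keys are hypothetical labels chosen for
this example, and only a subset of the table is included.</p>

```python
import os

# Debug categories copied from the IBWSD_DBG table (a subset, for brevity).
# The key names are labels invented for this sketch.
IBWSD_DBG_FLAGS = {
    "dll": 0x00000001,
    "socket_info": 0x00000002,
    "init": 0x00000004,
    "cm": 0x00000040,
    "connections": 0x00000080,
    "warnings": 0x40000000,
    "errors": 0x80000000,
}


def ibwsd_dbg_value(names):
    """OR the named categories together and format as 0x-prefixed hex."""
    mask = 0
    for name in names:
        mask |= IBWSD_DBG_FLAGS[name]
    return "0x%08x" % mask


def wsd_environment(debug_names, poll=None, base=None):
    """Return a copy of the environment with WSD variables added."""
    env = dict(os.environ if base is None else base)
    env["IBWSD_DBG"] = ibwsd_dbg_value(debug_names)
    if poll is not None:
        env["IBWSD_POLL"] = str(poll)
    return env


if __name__ == "__main__":
    env = wsd_environment(["errors", "warnings", "cm"], poll=1000)
    print(env["IBWSD_DBG"], env["IBWSD_POLL"])
    # The dict could then be passed as env= to subprocess.run() when
    # launching the socket application (application name not shown here).
```

<p align="left">Building the environment as a copy leaves the parent process
untouched, so the debug settings apply only to the application that is
launched with it.</p>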
<p align="left"><br>
See <a href="https://wiki.openfabrics.org/tiki-index.php?page=Winsock+Direct">
https://wiki.openfabrics.org/tiki-index.php?page=Winsock+Direct</a> for the
latest WSD status.</p>
<p align="left"> </p>
<p align="left">installsp.exe - Installs the Winsock direct service provider
for Infiniband.</p>
<h4 align="left"><a href="#TOP"><font color="#000000">&lt;return-to-top&gt;</font></a></h4>
<p align="left"> </p>
<p align="left"> </p>
<h3 align="left"><a name="DAT"></a>DAT (Direct Access Transport) Library and uDAPL (user-mode Direct Access Programming
Library)</h3>
<p align="left">DAT and uDAPL are based on the 1.1 DAT specification. The DAPL
(Direct Access Provider Library) provider now fully supports Infiniband RDMA and
IPoIB.</p>
<div align="left">
<pre><font face="Courier New" size="2">EXECUTION ENVIRONMENT:</font></pre>
</div>
<blockquote>
<p align="left">In order for DAT/uDAPL programs to execute correctly, the
dat.dll file must be present in the current directory, windows\system32\ or
in the library search path.</p>
</blockquote>
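<p align="left">A quick pre-flight check for the requirement above can be
sketched as follows. It looks for dat.dll in the three places the manual names,
in the order they are listed. This is a simplification and not the exact search
order of the Windows loader; the function name and its parameters are
hypothetical.</p>

```python
import os


def find_dat_dll(cwd=None, system_root=None, path=None, name="dat.dll"):
    """Return the first directory containing dat.dll, or None.

    Checks the current directory, then the system32 directory under the
    Windows root, then each entry of the search path.
    """
    cwd = os.getcwd() if cwd is None else cwd
    if system_root is None:
        system_root = os.environ.get("SystemRoot", r"C:\Windows")
    if path is None:
        path = os.environ.get("PATH", "")
    candidates = [cwd, os.path.join(system_root, "system32")]
    candidates += [entry for entry in path.split(os.pathsep) if entry]
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, name)):
            return directory
    return None


if __name__ == "__main__":
    location = find_dat_dll()
    if location is None:
        print("dat.dll not found; DAT/uDAPL programs are unlikely to start")
    else:
        print("dat.dll found in", location)
```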
<pre>NAME
dapltest - test for the Direct Access Provider Library (DAPL)
DESCRIPTION
Dapltest is a set of tests developed to exercise, characterize,
and verify the DAPL interfaces during development and porting.
At least two instantiations of the test must be run. One acts
as the server, fielding requests and spawning server-side test
threads as needed. Other client invocations connect to the
server and issue test requests.
The server side of the test, once invoked, listens continuously
for client connection requests, until quit or killed. Upon
receipt of a connection request, the connection is established,
the server and client sides swap version numbers to verify that
they are able to communicate, and the client sends the test
request to the server. If the version numbers match, and the
test request is well-formed, the server spawns the threads
needed to run the test before awaiting further connections.
USAGE
dapltest [ -f script_file_name ]
[ -T S|Q|T|P|L ] [ -D device_name ] [ -d ] [ -R HT|LL|EC|PM|BE ]
With no arguments, dapltest runs as a server using default values,
and loops accepting requests from clients. The -f option allows
all arguments to be placed in a file, to ease test automation.
The following arguments are common to all tests:
[ -T S|Q|T|P|L ] Test function to be performed:
S - server loop
Q - quit, client requests that server
wait for any outstanding tests to
complete, then clean up and exit
T - transaction test, transfers data between
client and server
P - performance test, times DTO operations
L - limit test, exhausts various resources,
runs in client w/o server interaction
Default: S
[ -D device_name ] Specifies the name of the device (interface adapter).
Default: host-specific, look for DT_MdepDeviceName
in dapl_mdep.h
[ -d ] Enables extra debug verbosity, primarily tracing
of the various DAPL operations as they progress.
Repeating this parameter increases debug spew.
Errors encountered result in the test spewing some
explanatory text and stopping; this flag provides
more detail about what led up to the error.
Default: zero
[ -R HT|LL|EC|PM|BE ] Indicate the quality of service (QoS) desired.
Choices are:
HT - high throughput
LL - low latency
EC - economy (neither HT nor LL)
PM - premium
BE - best effort
Default: BE
USAGE - Quit test client
dapltest [Common_Args] [ -s server_name ]
Quit testing (-T Q) connects to the server to ask it to clean up and
exit (after it waits for any outstanding test runs to complete).
In addition to being more polite than simply killing the server,
this test exercises the DAPL object teardown code paths.
There is only one argument other than those supported by all tests:
-s server_name Specifies the name of the server interface.
No default.
USAGE - Transaction test client
dapltest [Common_Args] [ -s server_name ]
[ -t threads ] [ -w endpoints ] [ -i iterations ] [ -Q ]
[ -V ] [ -P ] OPclient OPserver [ op3,
Transaction testing (-T T) transfers a variable amount of data between
client and server. The data transfer can be described as a sequence of
individual operations; that entire sequence is transferred 'iterations'
times by each thread over all of its endpoint(s).
The following parameters determine the behavior of the transaction test:
-s server_name Specifies the hostname of the dapltest server.
No default.
[ -t threads ] Specify the number of threads to be used.
Default: 1
[ -w endpoints ] Specify the number of connected endpoints per thread.
Default: 1
[ -i iterations ] Specify the number of times the entire sequence
of data transfers will be made over each endpoint.
Default: 1000
[ -Q ] Funnel completion events into a CNO.
Default: use EVDs
[ -V ] Validate the data being transferred.
Default: ignore the data
[ -P ] Turn on DTO completion polling
Default: off
OP1 OP2 [ OP3, ... ]
A single transaction (OPx) consists of:
server|client Indicates who initiates the
data transfer.
SR|RR|RW Indicates the type of transfer:
SR send/recv
RR RDMA read
RW RDMA write
Defaults: none
[ seg_size [ num_segs ] ]
Indicates the amount and format
of the data to be transferred.
Default: 4096 1
(i.e., 1 4KB buffer)
[ -f ] For SR transfers only, indicates
that a client's send transfer
completion should be reaped when
the next recv completion is reaped.
Sends and receives must be paired
(one client, one server, and in that
order) for this option to be used.
Restrictions:
Due to the flow control algorithm used by the transaction test, there
must be at least one SR OP for both the client and the server.
Requesting data validation (-V) causes the test to automatically append
three OPs to those specified. These additional operations provide
synchronization points during each iteration, at which all user-specified
transaction buffers are checked. These three appended operations satisfy
the "one SR in each direction" requirement.
The transaction OP list is printed out if -d is supplied.
USAGE - Performance test client
dapltest [Common_Args] -s server_name [ -m p|b ]
[ -i iterations ] [ -p pipeline ] OP
Performance testing (-T P) times the transfer of an operation.
The operation is posted 'iterations' times.
The following parameters determine the behavior of the performance test:
-s server_name Specifies the hostname of the dapltest server.
No default.
-m b|p Used to choose either blocking (b) or polling (p)
Default: blocking (b)
[ -i iterations ] Specify the number of times the entire sequence
of data transfers will be made over each endpoint.
Default: 1000
[ -p pipeline ] Specify the pipeline length, valid arguments are in
the range [0,MAX_SEND_DTOS]. If a value greater than
MAX_SEND_DTOS is requested the value will be
adjusted down to MAX_SEND_DTOS.
Default: MAX_SEND_DTOS
OP
An operation consists of:
RR|RW Indicates the type of transfer:
RR RDMA read
RW RDMA write
Default: none
[ seg_size [ num_segs ] ]
Indicates the amount and format
of the data to be transferred.
Default: 4096 1
(i.e., 1 4KB buffer)
USAGE - Limit test client
Limit testing (-T L) neither requires nor connects to any server
instance. The client runs one or more tests which attempt to
exhaust various resources to determine DAPL limits and exercise
DAPL error paths. If no arguments are given, all tests are run.
Limit testing creates the sequence of DAT objects needed to
move data back and forth, attempting to find the limits supported
for the DAPL object requested. For example, if the LMR creation
limit is being examined, the test will create a set of
{IA, PZ, CNO, EVD, EP} before trying to run dat_lmr_create() to
failure using that set of DAPL objects. The 'width' parameter
can be used to control how many of these parallel DAPL object
sets are created before beating upon the requested constructor.
Use of -m limits the number of dat_*_create() calls that will
be attempted, which can be helpful if the DAPL in use supports
essentially unlimited numbers of some objects.
The limit test arguments are:
[ -m maximum ] Specify the maximum number of dat_*_create()
attempts.
Default: run to object creation failure
[ -w width ] Specify the number of DAPL object sets to
create while initializing.
Default: 1
[ limit_ia ] Attempt to exhaust dat_ia_open()
[ limit_pz ] Attempt to exhaust dat_pz_create()
[ limit_cno ] Attempt to exhaust dat_cno_create()
[ limit_evd ] Attempt to exhaust dat_evd_create()
[ limit_ep ] Attempt to exhaust dat_ep_create()
[ limit_rsp ] Attempt to exhaust dat_rsp_create()
[ limit_psp ] Attempt to exhaust dat_psp_create()
[ limit_lmr ] Attempt to exhaust dat_lmr_create(4KB)
[ limit_rpost ] Attempt to exhaust dat_ep_post_recv(4KB)
[ limit_size_lmr ] Probe maximum size dat_lmr_create()
Default: run all tests
EXAMPLES
dapltest -T S -d -D ibnic0
Starts a server process with debug verbosity.
dapltest -T T -d -s winIB -D ibnic0 -i 100 \
client SR 4096 2 server SR 4096 2
Runs a transaction test, with both sides
sending one buffer with two 4KB segments,
one hundred times; dapltest server is on host winIB.
dapltest -T P -d -s winIB -D JniIbdd0 -i 100 SR 4096 2
Runs a performance test, with the client
sending one buffer with two 4KB segments,
one hundred times.
dapltest -T Q -s winIB -D ibnic0
Asks the dapltest server at host 'winIB' to clean up and exit.
dapltest -T L -D ibnic0 -d -w 16 -m 1000
Runs all of the limit tests, setting up
16 complete sets of DAPL objects, and
creating at most a thousand instances
when trying to exhaust resources.
dapltest -T T -V -d -t 2 -w 4 -i 55555 -s winIB -D ibnic0 \
client RW 4096 1 server RW 2048 4 \
client SR 1024 4 server SR 4096 2 \
client SR 1024 3 -f server SR 2048 1 -f
Runs a more complicated transaction test,
with two threads using four EPs each,
sending a more complicated buffer pattern
for a larger number of iterations,
validating the data received.
BUGS (and To Do List)
Use of CNOs (-Q) is not yet supported.
Further limit tests could be added.</pre>
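<p align="left">For test automation, the transaction-test command line described
above can be assembled programmatically. The sketch below is a hypothetical
helper, not part of dapltest. It builds the argument words for each OP and
applies two rules stated in the usage text: -f is accepted for SR transfers
only, and without -V there must be at least one SR OP from the client and one
from the server. Only the options it names are emitted.</p>

```python
def transaction_op(initiator, kind, seg_size=4096, num_segs=1, reap=False):
    """Return the argument words for one transaction OP.

    initiator is "client" or "server"; kind is "SR", "RR" or "RW".
    reap adds -f, which the usage text allows for SR transfers only.
    """
    if initiator not in ("client", "server"):
        raise ValueError("initiator must be client or server")
    if kind not in ("SR", "RR", "RW"):
        raise ValueError("kind must be SR, RR or RW")
    if reap and kind != "SR":
        raise ValueError("-f applies to SR transfers only")
    words = [initiator, kind, str(seg_size), str(num_segs)]
    if reap:
        words.append("-f")
    return words


def transaction_command(server, device, ops, iterations=1000, validate=False):
    """Assemble a dapltest -T T command line as a list of words."""
    # Without -V, the flow control needs an SR OP from each side; with -V
    # dapltest appends OPs that satisfy this requirement on its own.
    if not validate:
        senders = {op[0] for op in ops if op[1] == "SR"}
        if senders != {"client", "server"}:
            raise ValueError("need one SR OP for client and for server")
    cmd = ["dapltest", "-T", "T", "-s", server, "-D", device,
           "-i", str(iterations)]
    if validate:
        cmd.append("-V")
    for op in ops:
        cmd.extend(op)
    return cmd


if __name__ == "__main__":
    ops = [transaction_op("client", "SR", 4096, 2),
           transaction_op("server", "SR", 4096, 2)]
    print(" ".join(transaction_command("winIB", "ibnic0", ops, iterations=100)))
```

<p align="left">The host name winIB and device name ibnic0 are taken from the
EXAMPLES section; substitute the values for the local fabric.</p>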
<h4 align="left"><a href="#TOP"><font color="#000000">&lt;return-to-top&gt;</font></a></h4>