[ofa-general] Manipulating Credits in Infiniband
Nifty Tom Mitchell
niftyompi at niftyegg.com
Tue Aug 11 19:37:59 PDT 2009
On Mon, Aug 10, 2009 at 12:11:22PM -0400, Ashwath Narasimhan wrote:
>
> I looked into the InfiniBand driver files. As I understand it, in order
> to limit the data rate we manipulate the credits on either end. Since
> the number of credits available depends on the size of the receiver's
> receive work queue, I decided to limit that queue size from 8192 to a
> small value, say 5 (by reducing IPOIB_MAX_QUEUE_SIZE in ipoib.h, since
> my higher-layer protocol is IPoIB). I just want to confirm that I am
> doing the right thing?
Data rate is not controlled by credits.
Credits and queue sizes are different mechanisms with different purposes.
Visit the InfiniBand Trade Association web site and grab the IB
specifications to understand some of the hardware-level parts.
http://www.infinibandta.org/
InfiniBand offers credit-based flow control, and given the speed of
modern IB switches and processors even a very small credit count can
still sustain full data rate. Having said that, flow control is the
lowest-level throttle in the system. Reducing the credit count forces
the higher levels of the protocol stack to source or sink data through
the hardware before any more can be delivered. Thus flow control can
simplify the implementation of higher-level protocols. It can also be
used to cut cost or simplify hardware design (smaller hardware buffers).
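To make the credit mechanics concrete, here is a toy single-threaded C
sketch of link-level credit flow. It is not IB verbs code, and the
buffer count and drain rate are made-up numbers; it just shows that the
sender stalls only while the receiver has not returned credits, so even
a small credit count sustains full rate as long as receive buffers are
reposted promptly:

    /* Toy model of credit-based flow control -- illustrative only. */
    #include <stdio.h>

    enum { RECV_BUFFERS = 4 };  /* posted receive buffers = credits granted */

    int main(void)
    {
        int credits = RECV_BUFFERS; /* one credit per posted buffer */
        int sent = 0, consumed = 0;

        for (int step = 0; step < 12; step++) {
            if (credits > 0) {
                credits--;          /* each packet on the wire costs a credit */
                sent++;
                printf("step %2d: sent packet %d (%d credits left)\n",
                       step, sent, credits);
            } else {
                printf("step %2d: sender stalls, waiting for credits\n",
                       step);
            }
            /* Every third step the receiver drains a buffer, reposts it,
             * and returns one credit to the sender. */
            if (step % 3 == 2 && consumed < sent) {
                consumed++;
                credits++;
            }
        }
        return 0;
    }

Shrink RECV_BUFFERS to 1 and the sender runs in lockstep with the
receiver's drain rate; that is the throttle, and it lands on every
consumer of the link, not just one protocol.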
The IB specifications are way too long. Start with this FAQ.
http://www.mellanox.com/pdf/whitepapers/InfiniBandFAQ_FQ_100.pdf
The IB specification is also way too full of optional features. A vendor
may have feature XYZ working fine and dandy on one card and, since it is
optional, not at all on another.
The various queue sizes for the protocols built on top of IB shape
transfer behavior in keeping with system interrupt rates, process time
slices, and kernel activity loads. It is counterintuitive, but in some
cases small queues result in more responsive and agile systems,
especially in the presence of errors.
Since multiple protocols often share the IB stack, all of them will be
impacted by credit tinkering. Most vendors know their hardware, so most
drivers ship with the credit-related code already near optimal.
In the case of TCP/IP, the interaction between IB bandwidth and MTU
(IPoIB), Ethernet bandwidth and MTU, and even localhost (127.0.0.1)
bandwidth and MTU can be "interesting" depending on host names, subnets,
routing, etc. TCP/IP has lots of tuning flags well above the IB driver;
I see 500+ net.* sysctl knobs on this system.
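You can count them yourself: on Linux each regular file under
/proc/sys/net is one net.* sysctl entry, so a short C walk (a sketch
that assumes the usual /proc layout) gives the number:

    /* Count net.* sysctl knobs by walking /proc/sys/net. */
    #define _XOPEN_SOURCE 500
    #include <ftw.h>
    #include <stdio.h>

    static int count = 0;

    static int visit(const char *path, const struct stat *sb,
                     int typeflag, struct FTW *ftwbuf)
    {
        if (typeflag == FTW_F)  /* each regular file is one knob */
            count++;
        return 0;
    }

    int main(void)
    {
        if (nftw("/proc/sys/net", visit, 16, FTW_PHYS) == -1) {
            perror("nftw");
            return 1;
        }
        printf("net.* sysctl knobs: %d\n", count);
        return 0;
    }

The exact count varies with kernel version and with which modules
(including IPoIB) are loaded.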
As you change things, do make the changes on all the moving parts,
benchmark, and keep a log. Since there are multiple IB hardware vendors,
it is important to track hardware specifics. "lspci" is a good tool for
gathering chip info, and with some cards you also need specifics about
the active firmware ("ibv_devinfo" will report the firmware version).
So go forth (RPN forever) and conquer.
--
T o m M i t c h e l l
Found me a new hat, now what?