[openib-general] ip over ib throughput
Michael Krause
krause at cup.hp.com
Thu Jan 6 08:55:12 PST 2005
Please keep in mind that:
(a) IEEE rejected jumbo frames as part of the standard. They may be
implemented by many vendors (ASIC, adapter, switch) but that is outside
the scope of the specification.
(b) The IETF has a new draft out that defines a connected mode of IP over
IB which should enable jumbo datagrams as well as regular datagrams to be
used. Connected mode provides the equivalent of TSO.
(c) The objective for IP over IB should be reasonable performance / quality
of implementation while the emphasis for applications should be to focus on
the capabilities inherent in any RDMA capable interconnect.
At 08:27 AM 1/6/2005, Grant Grundler wrote:
>On Thu, Jan 06, 2005 at 01:44:39AM -0700, Stephen Poole wrote:
> > Remember, even Ethernet finally decided to go to Jumbo
> > Frames. Why? System impact and more.
>
>I think jumbo frames were proposed because they were easier to implement
>than TCP segmentation offloading. The result is effectively the same
>by reducing the per message overhead.
It did not require any significant network stack changes so the software
impact was minimal.
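
To put rough numbers on the per-message overhead point, here is a small
sketch that counts how many frames a transfer needs at the standard and
jumbo MTU. The 40-byte header figure (IPv4 plus TCP without options), the
64 KiB transfer size, and the name frames_needed are assumptions chosen for
illustration, not something taken from this thread.

```python
def frames_needed(payload_bytes, mtu, header_bytes=40):
    """Frames required to carry payload_bytes at a given MTU.

    header_bytes is an assumed 40 bytes (IPv4 + TCP, no options);
    what is left of each MTU carries payload.
    """
    per_frame = mtu - header_bytes
    return -(-payload_bytes // per_frame)  # ceiling division


transfer = 64 * 1024  # hypothetical 64 KiB transfer
standard = frames_needed(transfer, 1500)
jumbo = frames_needed(transfer, 9000)
print(standard, jumbo)
```

Each frame carries a roughly fixed per-packet processing cost, so fewer
frames for the same transfer means less work per byte moved, which is also
what segmentation offload achieves without touching the wire format.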
>Jumbo frames also required that the switches support 9K frames and my
>understanding is few do.
It varies by vendor.
>And having a 2G upper limit on the message size seems far in excess of
>where system load would matter. Today, with mass storage, the "sweet spot"
>in transfer size is ~256KB. I.e. bigger sizes don't measurably reduce the
>system overhead. I expect IB to see similar results - possibly with even
>smaller message sizes.
Storage workloads vary by usage model, so there isn't a one-size-fits-all
objective. The larger message size was put in place to provide headroom
as link speeds and data set sizes increase. However, many storage
workloads make heavy use of random read / write operations as well as
access to metadata. So, depending upon the mix of streaming and random
access as well as the type of data being accessed, the message size may
not be relevant. What is relevant is to enable packet-based arbitration
within the associated software and hardware. This allows different
messages to be interleaved onto the wire without head-of-line (HOL)
blocking should a large message be ahead of a series of smaller messages.
VL arbitration was provided to enable simple segregation but isn't
sufficient, since there are not enough VLs to handle a diverse workload
(it is fine for fairly simple workloads such as HPC). In any case, while
it is not part of the IB specification, I described how this should be
done in hardware a few years back and was informed that some may have
implemented it according to my work at that time.
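
As a rough illustration of the HOL point, the sketch below compares
sending whole messages in order against interleaving one packet per
pending message per round. It is a toy model only, not the hardware scheme
referred to above: the round-robin policy, the 2048-byte MTU, the message
sizes, and all function names are assumptions made for the example.

```python
from collections import deque


def packets(size, mtu):
    """Number of packets a message of `size` bytes needs (ceiling)."""
    return -(-size // mtu)


def fifo_by_message(messages, mtu):
    """Send every packet of a message before starting the next one."""
    wire = []
    for name, size in messages:
        wire.extend([name] * packets(size, mtu))
    return wire


def packet_round_robin(messages, mtu):
    """Interleave: one packet from each pending message per round."""
    pending = deque((name, packets(size, mtu)) for name, size in messages)
    wire = []
    while pending:
        name, left = pending.popleft()
        wire.append(name)
        if left > 1:
            pending.append((name, left - 1))
    return wire


def completion_slot(wire, name):
    """1-based wire slot in which the last packet of `name` is sent."""
    return len(wire) - wire[::-1].index(name)


MTU = 2048  # assumed MTU for the example
# One large message queued ahead of two single-packet messages.
messages = [("bulk", 8 * MTU), ("small1", MTU), ("small2", MTU)]

fifo = fifo_by_message(messages, MTU)
rr = packet_round_robin(messages, MTU)
for name, _ in messages:
    print(name, completion_slot(fifo, name), completion_slot(rr, name))
```

In this model the small messages complete in slots 9 and 10 when sent
behind the bulk message, but in slots 2 and 3 with packet interleaving,
while the bulk message finishes only two slots later than it otherwise
would.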
Mike