[openib-general] ip over ib throughput

Thu Jan 6 08:55:12 PST 2005

Please keep in mind that:

(a) IEEE rejected jumbo frames as part of the standard.  They may be 
implemented by many (ASIC, adapter, switch) but that is outside the scope 
of the specification.

(b) The IETF has a new draft out that defines a connected mode of IP over 
IB which should enable jumbo datagrams as well as regular datagrams to be 
used.  Connected mode provides the equivalent of TSO.

(c) The objective for IP over IB should be reasonable performance / quality 
of implementation while the emphasis for applications should be to focus on 
the capabilities inherent in any RDMA capable interconnect.

At 08:27 AM 1/6/2005, Grant Grundler wrote:
>On Thu, Jan 06, 2005 at 01:44:39AM -0700, Stephen Poole wrote:
> > Remember, even Ethernet finally decided to go to Jumbo
> > Frames. Why? System impact and more.
>
>I think jumbo frames were proposed because they were easier to implement
>than TCP segmentation offloading. The result is effectively the same:
>both reduce the per-message overhead.

It did not require any significant network stack changes so the software 
impact was minimal.
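
To put a rough number on the per-message overhead point, here is a minimal sketch that counts how many frames a transfer needs at two payload sizes. The 1500-byte figure is the standard Ethernet payload and 9000 bytes stands in for the "9K" jumbo frame mentioned in this thread; header bytes are ignored, and the function name and the 256 KB example size are illustrative assumptions only.

```python
# Sketch only: counts frames per transfer, ignoring all header overhead.
STANDARD_PAYLOAD = 1500  # bytes, standard Ethernet payload
JUMBO_PAYLOAD = 9000     # bytes, assumed "9K" jumbo payload


def frames_needed(message_bytes: int, payload_bytes: int) -> int:
    """Return how many frames are needed to carry message_bytes."""
    if message_bytes < 0 or payload_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: a partial final frame still costs one frame.
    return (message_bytes + payload_bytes - 1) // payload_bytes


message = 256 * 1024  # hypothetical 256 KB transfer
standard_frames = frames_needed(message, STANDARD_PAYLOAD)
jumbo_frames = frames_needed(message, JUMBO_PAYLOAD)
print(standard_frames, jumbo_frames)
```

Fewer frames means fewer per-frame costs (interrupts, header processing) for the same data, which is the same effect segmentation offload aims for by other means.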

>Jumbo frames also required that the switches support 9K frames, and my 
>understanding is few do.

It varies by vendor.

>And having a 2G upper limit on the message size seems far in excess of 
>where system load would matter. Today, with mass storage, the "sweet spot" 
>in transfer size is ~256KB. I.e. bigger sizes don't measurably reduce the 
>system overhead. I expect IB to see similar results - possibly with even 
>smaller message sizes.

Storage workloads vary by usage model, so there isn't a one-size-fits-all 
objective.  The larger message size was put in place to provide headroom 
as link speeds and data set sizes increase.  However, many storage 
workloads make heavy use of random read / write operations as well as 
access to metadata.  So, depending upon the mix of streaming / random 
access as well as the type of data being accessed, the message size may 
not be relevant.  What is relevant is to enable packet-based arbitration 
within the associated software and hardware.  This allows different 
messages to be interleaved onto the wire without head-of-line (HOL) 
blocking when a large message is ahead of a series of smaller messages.  VL 
arbitration was provided to enable simple segregation, but it isn't 
sufficient since there are not enough VLs to handle a diverse workload (it 
is fine for fairly simple workloads such as HPC).  In any case, while it is 
not part of the IB specification, I described how this should be done in 
hardware a few years back and was informed that some may have implemented 
it according to my work at that time.
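
As a toy illustration of the HOL blocking point (not the hardware scheme referred to above, which is not described in this message), the sketch below compares sending whole messages in arrival order against interleaving one packet per message in turn. Plain round-robin is an assumed policy chosen for simplicity, and the message names and packet counts are made up.

```python
from collections import deque


def message_fifo(messages):
    """Send each message to completion in arrival order.

    Returns {name: slot in which its last packet goes out}.
    """
    finish, slot = {}, 0
    for name, packets in messages:
        if packets < 1:
            raise ValueError("each message needs at least one packet")
        slot += packets
        finish[name] = slot
    return finish


def packet_round_robin(messages):
    """Send one packet per pending message in turn (assumed policy)."""
    for _, packets in messages:
        if packets < 1:
            raise ValueError("each message needs at least one packet")
    queue = deque([name, packets] for name, packets in messages)
    finish, slot = {}, 0
    while queue:
        entry = queue.popleft()
        slot += 1          # one packet goes onto the wire per slot
        entry[1] -= 1
        if entry[1] == 0:
            finish[entry[0]] = slot
        else:
            queue.append(entry)  # message still has packets left
    return finish


# Hypothetical workload: one large message ahead of two small ones.
messages = [("large", 1000), ("small_a", 2), ("small_b", 2)]
fifo = message_fifo(messages)
rr = packet_round_robin(messages)
print(fifo)
print(rr)
```

With message-level ordering the small messages wait behind all 1000 packets of the large one; with packet-level interleaving they complete within the first few slots while the large message finishes at the same time as before.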

Mike 