[ofa-general] Expected RDMA performance

Mon Oct 22 12:57:59 PDT 2007

At 02:06 AM 10/22/2007, Koen Segers wrote:
>On Fri, 2007-10-19 at 09:09 -0700, Michael Krause wrote:
> > At 08:20 AM 10/19/2007, Peter Kjellstrom wrote:
> > > On Thursday 18 October 2007, Chuck Hartley wrote:
> > > ...
> > > > 8388608        5000            1342.12               1342.12
> > > > ------------------------------------------------------------------
> > > >
> > > > Is this typical RDMA performance?
> > >
> > > It's close to what I've seen on similar hw. ~1400 is what you can push
> > > through the 8x pci-e of the intel 5000 chipset (confirmed by trying 4x
> > > pci-e which has shown ~700).
> > >
> > > > What is the maximum theoretical BW for
> > > > DDR IB - 1525MB/sec?
> > >
> > > No, it's 20 Gbps on the wire and 8b/10b encoded, so 16 Gbps effective,
> > > which is 2000 MB/s (10-base) and 1907 MiB/s (2-base).
> >
> > There is also IB protocol overhead combined with driver / device
> > control traffic overhead (consumes device as well as PCI resources /
> > bandwidth), end-to-end control traffic  which is also a function of
> > how the application is constructed.   In general, hitting about 80-85%
> > of the theoretical maximum is possible.
>
>
>I'm very interested in this result. Can you elaborate this a bit more?

In what regard?

>Has anyone documented the ib traffic control mechanism?

Driver-to-device interactions consume resources and contend for local I/O
bandwidth and local device processing.

Application-to-device interactions have similar impacts.

ULP exchanges, such as SEND operations to communicate protection keys,
addresses, etc., add traffic of their own.

Host OS / application execution to generate work as well as schedule /
process work and deal with any interrupts / polling mechanisms. This can
lead to anything from zero to significant delays, resulting in bursty traffic
patterns. Also, many workloads may be dominated by small transactions, so
their efficiency will be significantly lower than that of a workload
dominated by large transactions.

And so forth.
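
As a rough illustration of the small versus large transaction point, here is
a minimal sketch that assumes every message pays one fixed cost (posting the
work request, completion handling, and so on) and that nothing is pipelined.
The 1 microsecond cost and the function name are hypothetical, chosen only to
show the shape of the curve; real hardware overlaps much of this work.

```python
def effective_bandwidth(msg_bytes, peak_mb_s, per_msg_overhead_us):
    """Return the achieved MB/s when every message pays a fixed cost.

    1 MB/s is 1 byte per microsecond, so msg_bytes / peak_mb_s is the
    time the payload spends on the wire, in microseconds.
    """
    wire_time_us = msg_bytes / peak_mb_s
    return msg_bytes / (wire_time_us + per_msg_overhead_us)


PEAK_MB_S = 2000.0   # 16 Gbps effective DDR rate from earlier in the thread
OVERHEAD_US = 1.0    # hypothetical per-message cost, not a measured value

for size in (64, 4096, 65536, 8388608):
    bw = effective_bandwidth(size, PEAK_MB_S, OVERHEAD_US)
    print(f"{size:>8} bytes: {bw:8.1f} MB/s ({bw / PEAK_MB_S:.0%} of peak)")
```

Under these assumptions the small messages see only a few percent of the peak
rate, while the 8 MB transfers are almost unaffected by the same fixed cost.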

There are many variables and mileage will vary as a result.  Some will do 
quite well while others will not.   This is why a good range of benchmarks 
is required to evaluate whether a given solution is reasonable for the 
targeted workloads or problem space. Anyone can contrive something that does
outstandingly at one thing while completely biting it in other areas.
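
For reference, the numbers quoted in this thread can be reproduced with a few
lines of arithmetic. This is only a sketch: it takes the 20 Gbps signaling
rate, the 8b/10b encoding and the 80-85% rule of thumb from the messages
above, and it assumes the benchmark reports decimal MB/s, which may not be
the case. The variable names are mine.

```python
SIGNALING_GBPS = 20.0          # DDR IB signaling rate quoted in the thread
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 data bits per 10 bits on the wire
MEASURED_MB_S = 1342.12        # benchmark result quoted at the top (units assumed)

# Data rate in bytes per second after removing the encoding overhead.
peak_bytes_s = SIGNALING_GBPS * 1e9 * ENCODING_EFFICIENCY / 8

peak_mb_s = peak_bytes_s / 1e6       # decimal megabytes per second
peak_mib_s = peak_bytes_s / 2**20    # binary mebibytes per second

# The 80-85% rule of thumb applied to the decimal figure.
low, high = 0.80 * peak_mb_s, 0.85 * peak_mb_s
fraction = MEASURED_MB_S / peak_mb_s

print(f"peak: {peak_mb_s:.0f} MB/s ({peak_mib_s:.0f} MiB/s)")
print(f"80-85% of peak: {low:.0f} to {high:.0f} MB/s")
print(f"measured: {MEASURED_MB_S} MB/s, {fraction:.0%} of peak")
```

The measured figure falls below the 80-85% band, which fits the earlier
observation that the 8x PCIe slot tops out around 1400 before the IB link
itself becomes the limit.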

Mike

>Regards,
>
>Koen Segers
> >
> > > On our system (with a different HCA) we see quite a difference with
> > > snoop-filter off (bios option). With snoop off (our) application
> > > performance goes up (not very surprising) but IB performance goes down
> > > (latency 0.4us worse and bw ~1400->1200).
> >
> > Mike