[ewg] disappointing IPoIB performance

Joseph Glanville joseph.glanville at orionvm.com.au
Wed Mar 21 09:20:56 PDT 2012


On 21 February 2012 07:20, richard Croucher <
Richard.croucher at informatix-sol.com> wrote:

> There is nothing special about this specific configuration.  Both
> measurements were made on the same physical server with the same OS
> config, so that can be ruled out as a factor in the difference.  I've
> seen similar results on many other systems as well.  They had been
> getting closer and closer for some time, but now 10G Ethernet is lower
> latency than InfiniBand.  I'm sure you are also aware that there are
> similar results comparing verb-level programs on RoCE and InfiniBand,
> which show that Ethernet has the edge.  I'm a big fan of InfiniBand, but
> it has to deliver the results.
>
>   --
>
> Richard Croucher
> www.informatix-sol.com
> +44-7802-213901
>
>   On Mon, 2012-02-20 at 17:03 +0000, Gilad Shainer wrote:
>
> Richard,
>
>
>
> Critical setup information is missing. What are the server, CPU, etc.?
> Can you please provide them?
>
>
>
> Gilad
>
>
>
>
>
>  From: ewg-bounces at lists.openfabrics.org [mailto:
> ewg-bounces at lists.openfabrics.org] On Behalf Of richard Croucher
> Sent: Monday, February 20, 2012 8:50 AM
> To: ewg at lists.openfabrics.org
> Subject: [ewg] disappointing IPoIB performance
>
>
>
>
> I've been undertaking some internal QA testing of the Mellanox CX3's.
>
> An observation I've made for some time, and one that is most likely down
> to the IPoIB implementation rather than the HCA, is that IPoIB latency is
> getting increasingly poor in comparison with the standard kernel TCP/IP
> stack over 10G Ethernet.
>
> If we look at the results for the CX3's running both 10G Ethernet and 40G
> InfiniBand, on the same server hardware, I get the following median
> latency with my test setup.  The results come from my own test program,
> so they are only meaningful as a comparison with the other configurations
> running the same test.
>
> Running OFED 1.5.3 and RH 6.0
>
> IPoIB (connected)   TCP  33.67 us    (switchless)
> IPoIB (datagram)    TCP  31.63 us    (switchless)
> IPoIB (connected)   UDP  24.78 us    (switchless)
> IPoIB (datagram)    UDP  24.28 us    (switchless)
> IPoIB (connected)   UDP  25.37 us    (1 hop) between ports on same switch
> IPoIB (connected)   TCP  34.48 us    (1 hop)
> 10G Ethernet        UDP  24.04 us    (2 hops) across a LAG-connected pair
> of Ethernet switches
> 10G Ethernet        TCP  34.59 us    (2 hops)
>
> The Mellanox Ethernet drivers are tuned for low latency rather than
> throughput, but I would have hoped that, given the 4x extra bandwidth
> available, the InfiniBand drivers would have come out ahead.
>
> I've seen similar results for the CX2.  10G Ethernet is increasingly
> looking like the better option for low latency, particularly with the
> current generation of low-latency Ethernet switches.  Switchless Ethernet
> has beaten switchless InfiniBand for some time, and it now looks to be
> the case in switched environments as well.  I think this reflects that a
> lot of effort has gone into tweaking and tuning TCP/IP over Ethernet and
> its low-level drivers, with very little activity on the IPoIB front.
> Unless we see improvements here it will get increasingly difficult to
> justify InfiniBand deployments.
>
>
>
>
>   --
>
> Richard Croucher
> www.informatix-sol.com
> +44-7802-213901
>
>
>
>
>
>
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>

IPoIB is not a native IB protocol (it simply layers IP on top of IB) and
has never had great performance. If performance is a concern, native IB
verbs will greatly outperform TCP/IP on any hardware platform, be it
Ethernet or InfiniBand.
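
As a rough sketch of how one might put numbers on that gap with standard
tools (assuming the perftest and netperf packages are installed on both
nodes; the spartan01/spartan02 names are reused here purely as
placeholders for any connected pair of hosts):

# verbs-level send/receive latency over an RC queue pair
spartan02 # ib_send_lat                 (responder, waits for a connection)
spartan01 # ib_send_lat spartan02       (requester, prints latency figures)

# kernel TCP path, 1-byte request/response over whichever interface
# the host name resolves to (the IPoIB address on these boxes)
spartan02 # netserver
spartan01 # netperf -H spartan02 -t TCP_RR

The first pair of commands gives a verbs-level figure comparable to the
rdma_lat numbers below; the second exercises the same kernel TCP path the
original IPoIB figures were taken on.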

See, for instance, some cheap test hardware under my desk:

spartan01 # ping spartan02
PING spartan02.prod.eq3.syd.au.ovs (10.1.33.2) 56(84) bytes of data.
64 bytes from spartan02.prod.eq3.syd.au.ovs (10.1.33.2): icmp_req=1 ttl=64
time=0.067 ms
64 bytes from spartan02.prod.eq3.syd.au.ovs (10.1.33.2): icmp_req=2 ttl=64
time=0.060 ms
64 bytes from spartan02.prod.eq3.syd.au.ovs (10.1.33.2): icmp_req=3 ttl=64
time=0.059 ms
64 bytes from spartan02.prod.eq3.syd.au.ovs (10.1.33.2): icmp_req=4 ttl=64
time=0.058 ms
64 bytes from spartan02.prod.eq3.syd.au.ovs (10.1.33.2): icmp_req=5 ttl=64
time=0.058 ms
--- spartan02 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.058/0.060/0.067/0.007 ms

spartan01 # rdma_lat
   local address: LID 0x06 QPN 0x7e006d PSN 0x7b0dd5 RKey 0xa042400 VAddr
0x00000001012001
  remote address: LID 0x01 QPN 0x3e0050 PSN 0x58a76e RKey 0xa002400 VAddr
0x00000001051001
Latency typical: 2.01 usec
Latency best   : 1.95 usec
Latency worst  : 2.8 usec

Notice that IPoIB is in the realm of 60 usec versus 2 usec for a native
RDMA ping. This is also quite old, mostly untuned DDR hardware, probably
costing a total of $2200 including the switch and two HCAs.

One needs to compare apples with apples. InfiniBand does TCP/IP entirely
in software, while Ethernet NICs have receive and transmit offload engines
for TCP/IP because it is, so to speak, Ethernet's native protocol. So if
you must compare, compare TCP/IP on Ethernet against RDMA on IB. TCP/IP
over IB is supported for compatibility reasons, but it is by no means a
good idea to use it for low latency or high throughput, because the design
of the TCP/IP protocol facilitates neither.
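
If you want one tool that drives both sides of that comparison on the same
pair of hosts, qperf (shipped with the OFED distribution) can run TCP, UDP
and verbs tests back to back. A minimal sketch, again reusing the spartan
host names as placeholders and assuming qperf is installed on both ends:

spartan02 # qperf                                    (listener, no arguments)
spartan01 # qperf spartan02 tcp_lat udp_lat          (kernel TCP/UDP latency)
spartan01 # qperf spartan02 rc_lat rc_rdma_write_lat (RC verbs latency)

Pointing the tcp_lat/udp_lat tests first at the Ethernet address and then
at the IPoIB address of the same server gives both software stacks on
identical hardware, while the rc_* tests give the native verbs figure to
compare them against.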

Joseph.

-- 
Founder | Director | VP Research
Orion Virtualisation Solutions | www.orionvm.com.au | Phone: 1300 56 99 52
| Mobile: 0428 754 84