[ewg] IPoIB to Ethernet routing performance

sebastien dugue sebastien.dugue at bull.net
Fri Dec 17 00:01:36 PST 2010


  Hi Matthieu,

On Thu, 16 Dec 2010 23:20:35 +0100
matthieu hautreux <matthieu.hautreux at gmail.com> wrote:

> > >    The router is fitted with one ConnectX2 QDR HCA and one dual port
> > > Myricom 10G Ethernet adapter.
> > >
> > > ...
> > >
> > >    Here are some numbers:
> > >
> > >    - 1 IPoIB stream between client and router: 20 Gbits/sec
> > >
> > >      Looks OK.
> > >
> > >    - 2 Ethernet streams between router and server: 19.5 Gbits/sec
> > >
> > >      Looks OK.
> > >
> >
> >
> > Actually I am amazed you can get such speed with IPoIB. Trying with
> > NPtcp on my DDR InfiniBand I can only obtain about 4.6 Gbit/s at the
> > best packet size (that is 1/4 of the InfiniBand bandwidth) with the
> > chip embedded in the mainboard: InfiniBand: Mellanox Technologies
> > MT25204 [InfiniHost III Lx HCA]; and dual E5430 Xeons (not Nehalem).
> > That's with a 2.6.37 kernel and the vanilla ib_ipoib module. What's
> > wrong with my setup?
> > I always assumed that such a slow speed was due to the lack of the
> > offloading capabilities you get with Ethernet cards, but maybe I was
> > wrong...?
> 
> Hi,
> 
> I ran the same kind of experiments as Sebastien and got results similar
> to yours, Jabe, with about ~4.6 Gbit/s.
> 
> I am using a QDR HCA and IPoIB in connected mode on the InfiniBand part of
> the testbed, and 2 * 10GbE Ethernet cards in bonding on the Ethernet side
> of the router.
> To get better results, I had to increase the MTU on the Ethernet side from
> 1500 to 9000. Indeed, because of TCP path MTU discovery, during routed
> exchanges the MTU used on the IPoIB link for TCP traffic was automatically
> clamped to the path minimum of 1500. This small yet very common MTU value
> does not seem to be handled well by the ipoib_cm layer.

  This may be due to the fact that the IB MTU is 2048. Every 1500-byte packet
is padded to 2048 bytes before being sent over the wire, so you're losing
roughly 25% of the bandwidth compared to an IPoIB MTU that is a multiple of
2048.
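
  To put rough numbers on that (ignoring IPoIB and IB headers): 1500 / 2048
is about 0.73, so roughly 27% of every 2048-byte frame on the wire would be
padding, which is where the ~25% figure above comes from.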

> 
> Is this issue already known and/or reported? It would be really
> interesting to understand why a small MTU is such a problem for
> ipoib_cm. After a quick look at the code, it seems that ipoib packet
> processing is single threaded and that each IP packet is
> transmitted/received and processed as a single unit. If that turns out to
> be the bottleneck, do you think that packet aggregation and/or processing
> parallelization could be feasible in a future ipoib module? A large share
> of Ethernet networks is configured with an MTU of 1500, and 10GbE cards
> currently employ parallelization strategies in their kernel modules to
> cope with this problem. It is clear that a bigger MTU is better, but it is
> not always achievable with existing equipment and machines. IMHO, that is
> a real problem for InfiniBand/Ethernet interoperability.
> 
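
  For what it's worth, one way to check which MTU a routed TCP connection has
actually settled on is to read the kernel's path MTU estimate from a
connected socket. A minimal sketch in Python, assuming Linux (the IP_MTU
constant is spelled out by hand since not every Python build exports it, and
the server address/port are only placeholders):

      import socket

      IP_MTU = 14  # from <linux/in.h>; not exported by every Python build

      # Placeholder address/port: point this at the TCP sink on the far
      # side of the IPoIB -> Ethernet router.
      s = socket.create_connection(("192.0.2.10", 5001))
      mtu = s.getsockopt(socket.IPPROTO_IP, IP_MTU)
      print("path MTU seen by the kernel:", mtu)
      s.close()

  Over the routed path discussed here, I would expect this to report 1500
(or 9000 once the Ethernet side uses jumbo frames) rather than the IPoIB MTU.
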
> Sebastien, concerning your poor performance of 9.3 Gbit/s when routing 2
> streams from your InfiniBand client to your Ethernet server, what is the
> bonding mode on the Ethernet side during the test? Are you using
> balance-rr or LACP?

  I did not use any Ethernet teaming; I only declared 2 aliases on the
clients' ib0 and set the routing tables accordingly.
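
  Concretely, that amounts to something like "ip addr add <address>/24 dev
ib0 label ib0:0" for the second alias on each client, plus one route per
server address, so that the two streams use distinct source/destination
pairs (the address here is just a placeholder).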

  Sébastien.

> I got this kind of result with LACP, as only one link is really used during
> the transmissions, and that link depends on the layer 2 information of the
> peers involved in the communication (as long as you use the default
> xmit_hash_policy).
> 
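
  For reference, that matches the documented behaviour of the default layer2
xmit_hash_policy: the slave is chosen from the source and destination MAC
addresses only, so every flow between the same two neighbours lands on the
same link. A rough illustration in Python, simplified to the XOR of the last
MAC octet as in the 2.6-era bonding driver (the MAC addresses are just
placeholders):

      # Simplified layer2 bonding hash: XOR of the last MAC octets,
      # modulo the number of slaves.
      def l2_slave(src_mac, dst_mac, n_slaves=2):
          src = int(src_mac.split(":")[-1], 16)
          dst = int(dst_mac.split(":")[-1], 16)
          return (src ^ dst) % n_slaves

      # Two parallel TCP streams between the same router and next hop share
      # the same MAC pair, hence the same slave, hence a single 10G link.
      print(l2_slave("00:25:90:aa:bb:01", "00:60:dd:44:55:66"))
      print(l2_slave("00:25:90:aa:bb:01", "00:60:dd:44:55:66"))

  Switching the policy to layer3+4 (or using balance-rr) mixes IP addresses
and ports into the slave selection, so two streams have a chance of being
spread across both links.
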
> HTH
> Regards,
> Matthieu
> 
> > Also what application did you use for the benchmark?
> > Thank you


