[ewg] IPoIB to Ethernet routing performance

matthieu hautreux matthieu.hautreux at gmail.com
Thu Dec 16 14:20:35 PST 2010


> >    The router is fitted with one ConnectX2 QDR HCA and one dual port
> >    Myricom 10G Ethernet adapter.
> >
> > ...
> >
> >    Here are some numbers:
> >
> >    - 1 IPoIB stream between client and router: 20 Gbits/sec
> >
> >      Looks OK.
> >
> >    - 2 Ethernet streams between router and server: 19.5 Gbits/sec
> >
> >      Looks OK.
> >
>
>
> Actually I am amazed you can get such a speed with IPoIB. Trying with
> NPtcp on my DDR InfiniBand I can only obtain about 4.6 Gbit/sec at the
> best packet size (that is 1/4 of the InfiniBand bandwidth) with this
> chip embedded in the mainboard: InfiniBand: Mellanox Technologies
> MT25204 [InfiniHost III Lx HCA]; and dual E5430 Xeon (not Nehalem).
> That's with a 2.6.37 kernel and the vanilla ib_ipoib module. What's
> wrong with my setup?
> I always assumed that such a slow speed was due to the lack of
> offloading capabilities you get with Ethernet cards, but maybe I was
> wrong...?

Hi,

I made the same kind of experiments as Sebastien and got results similar
to yours, Jabe, with about 4.6 Gbit/s.

I am using a QDR HCA with IPoIB in connected mode on the InfiniBand side
of the testbed, and two 10GbE cards in bonding on the Ethernet side of
the router.
To get better results, I had to increase the MTU on the Ethernet side
from 1500 to 9000. Indeed, because of TCP path MTU discovery, during
routed exchanges the MTU used on the IPoIB link for TCP traffic was
automatically reduced to the path minimum of 1500. This small but very
common MTU value does not seem to be handled well by the ipoib_cm layer.
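
For reference, the MTU changes can be applied along these lines (just a
sketch: the interface names ib0, eth2, eth3 and bond0 are placeholders
for my testbed and will differ on other setups):

    # IPoIB side: connected mode allows a large MTU (up to 65520)
    echo connected > /sys/class/net/ib0/mode
    ip link set dev ib0 mtu 65520

    # Ethernet side: raise the MTU on the bonding slaves and the bond
    ip link set dev eth2 mtu 9000
    ip link set dev eth3 mtu 9000
    ip link set dev bond0 mtu 9000

Of course the Ethernet switch ports and the server have to accept jumbo
frames as well, otherwise path MTU discovery brings the effective MTU
back down to 1500.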

Is this issue already known and/or reported? It would be really
interesting to understand why such a small MTU value is a problem for
ipoib_cm. After a quick look at the code, it seems that IPoIB packet
processing is single threaded and that each IP packet is
transmitted/received and processed as a single unit. If that turns out
to be the bottleneck, do you think that packet aggregation and/or
parallelized processing could be feasible in a future ipoib module? A
large share of Ethernet networks are configured with an MTU of 1500,
and 10GbE drivers already use parallelization strategies in their
kernel modules to cope with this problem. A bigger MTU is clearly
better, but it is not always achievable with existing equipment and
machines. IMHO, this is a real problem for InfiniBand/Ethernet
interoperability.
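
As a possible illustration of such parallelization (a workaround, not a
fix for ipoib_cm itself), recent kernels ship Receive Packet Steering
(RPS, merged in 2.6.35), which spreads the receive processing of an
interface across several CPUs. Something like the following might be
worth trying on the IPoIB interface (a sketch: ib0 and the CPU mask are
assumptions, and I have not verified how RPS behaves with ipoib):

    # allow CPUs 0-7 (mask ff) to process packets received on ib0
    echo ff > /sys/class/net/ib0/queues/rx-0/rps_cpus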

Sebastien, concerning the poor performance of 9.3 Gbit/s when routing 2
streams from your InfiniBand client to your Ethernet server: which
bonding mode do you use on the Ethernet side during the test? Are you
using balance-rr or LACP? I got this kind of result with LACP, as only
one link is really used during the transmissions, and the chosen link
depends on the layer-2 information of the peers involved in the
communication (as long as you use the default xmit_hash_policy).
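
If you want to compare, the bonding mode and transmit hash policy can be
checked and tuned through sysfs, for instance (a sketch: bond0 is a
placeholder, and depending on the kernel version the bond may have to be
taken down before the mode or policy can be changed):

    cat /sys/class/net/bond0/bonding/mode
    cat /sys/class/net/bond0/bonding/xmit_hash_policy

    # with 802.3ad (LACP), hashing on layer 3+4 instead of the default
    # layer 2 at least lets different TCP streams land on different links
    echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy

Note that even with layer3+4 a single TCP stream still stays on one
link; only balance-rr can spread one stream over several links, at the
cost of possible packet reordering.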

HTH
Regards,
Matthieu

> Also what application did you use for the benchmark?
> Thank you