[openib-general] IPoIB CM

Bernard King-Smith wombat2 at us.ibm.com
Wed Nov 29 07:23:40 PST 2006


"Michael S. Tsirkin"<mst at mellanox.co.il> wrote on on Wed, 29 Nov 2006 
16:00:16 +0200 -----
> 
> To:
> 
> openib-general at openib.org
> 
> Subject:
> 
> [openib-general] IPoIB CM
> 
> Hi!
> Wanted to show you guys the IPoIB connected mode code I've written
> in the last couple of weeks. I put it at ~mst/linux-2.6/.git
> ipoib_cm_branch.
> With this code, I'm able to get 800MByte/sec or more with netperf
> without options on a Mellanox 4x back-to-back DDR system.

These are very good results, close to what I expected. However, see some 
tuning suggestions below.
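For context, a back-of-the-envelope check of where that number sits (my own arithmetic, not from the mail above; it assumes standard 4x DDR signaling of 5 Gbit/s per lane with 8b/10b encoding, and MBytes as 2**20 bytes, which is what netperf's -f M reports):

```python
# Rough sketch: compare the 891 MB/s netperf result against the 4x DDR
# data rate. Assumptions (not stated in the mail): 4 lanes, 5 Gbit/s
# signaling per lane, 8b/10b encoding.
lanes = 4
signal_gbps = 5.0                    # DDR signaling rate per lane
data_gbps = signal_gbps * 8 / 10     # 8b/10b encoding overhead
wire_mbytes = lanes * data_gbps * 1e9 / 8 / 2**20  # MBytes/sec of payload

measured = 891.21                    # netperf result quoted below
print(f"wire rate ~{wire_mbytes:.0f} MB/s, achieved {measured / wire_mbytes:.0%}")
```

So roughly 1907 MB/s of raw data rate, of which the quoted run achieves a bit under half, before any of the socket tuning suggested below.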

> 
> This is still "work in progress", but comments are welcome.
> 
> Here's a short description of what I have so far:
> 
> a. The code's here:
> git://staging.openfabrics.org/~mst/linux-2.6/.git ipoib_cm_branch
> This is based on 2.6.19-rc6, so
> ~>git diff v2.6.19-rc6..ipoib_cm_branch
> will show what I have done so far.
> Note this currently includes the patch 
> 073ae841d6a5098f7c6e17fc1f329350d950d1ce
> which will be cleaned out when next I rebase against Linus.
> 
> b. How to activate:
> Server:
> #modprobe ib_ipoib
> #/sbin/ifconfig ib0 mtu 65520
> #./netperf-2.4.2/src/netserver
> 
> Client:
> #modprobe ib_ipoib
> #/sbin/ifconfig ib0 mtu 65520
> #./netperf-2.4.2/src/netperf -H 11.4.3.68 -f M
>    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68)
>    port 0 AF_INET : demo
>    Recv   Send    Send
>    Socket Socket  Message  Elapsed
>    Size   Size    Size     Time     Throughput
>    bytes  bytes   bytes    secs.    MBytes/sec
> 
>    87380  16384  16384    10.01     891.21

With an MTU of 64K, why are you using such small send and receive socket 
sizes and message size? Can you try setting the send and receive socket 
sizes to 512K and the send message size to 128K? That way you send 2 
packets per socket write and can receive up to 8 packets in the socket 
buffers. These are typical sizes I have used on other network adapters 
with an MTU of 64K.
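As a concrete sketch of that suggestion, using netperf 2.4.x test-specific options (-s/-S set the local/remote socket buffer sizes and -m the send message size; the binary path and host address are taken from the run above):

```shell
# Hedged example: same run as above, with the larger socket buffers and
# message size suggested here. 128K messages / 64K MTU = 2 packets per
# write; 512K buffers / 64K MTU = up to 8 packets queued in the socket.
NETPERF=./netperf-2.4.2/src/netperf     # path used earlier in the mail
if [ -x "$NETPERF" ]; then
    "$NETPERF" -H 11.4.3.68 -f M -- -s 512K -S 512K -m 128K
fi
```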

> 
> c. TODO list
> 1. Clean up stale connections
> 2. Clean up ipoib_neigh (move all new fields to ipoib_cm_tx)
> 3. Add IPOIB_CM config option, make it depend on EXPERIMENTAL
> 4. S/G support
> 5. Make CM use same CQ IPoIB uses for UD
> 
> d. Limitations
> UDP multicast and UDP connections to IPoIB UD mode
> currently don't work since we get packets that are too large to
> send over a UD QP.
> As a workaround, one can now create separate interfaces
> for use with CM and UD mode.
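One hedged sketch of that workaround, assuming it is done with the usual IPoIB P_Key child-interface mechanism (the P_Key value 0x8001 is purely illustrative, and whether this is the exact mechanism meant above is my assumption):

```shell
# Illustrative only: create a child interface and give each interface its
# own MTU. The parent keeps the large connected-mode MTU; the child keeps
# a UD-safe MTU (2044 = 2048-byte IB MTU minus the 4-byte IPoIB header).
echo 0x8001 > /sys/class/net/ib0/create_child
/sbin/ifconfig ib0 mtu 65520        # connected-mode traffic
/sbin/ifconfig ib0.8001 mtu 2044    # UD-safe: UDP and multicast
```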
> 
> e. Some notes on code
> 1. SRQ is used for scalability to large cluster sizes
> 2. Only RC connections are used (UC does not support SRQ now)
> 3. Retry count is set to 0 since spec draft warns against retries
> 4. Each connection is used for data transfers in only 1 direction,
>    so each connection is either active(TX) or passive (RX).
>    2 sides that want to communicate create 2 connections.
> 5. Each active (TX) connection has a separate CQ for send completions -
>    this keeps the code simple without CQ resize and other tricks
> 
> I'm looking at ways to limit the path MTU
> for these connections, to make it work.
> 
> -- 
> MST
> 


Bernie King-Smith 
IBM Corporation
Server Group
Cluster System Performance 
wombat2 at us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES 

"We are not responsible for the world we are born into, only for the world 
we leave when we die.
So we have to accept what has gone before us and work to change the only 
thing we can,
-- The Future." William Shatner

