[openib-general] IPoIB CM

Michael S. Tsirkin mst at mellanox.co.il
Wed Nov 29 06:00:16 PST 2006


Hi!
I wanted to show you guys the IPoIB connected mode (CM) code I've written
over the last couple of weeks. It is at ~mst/linux-2.6/.git, branch
ipoib_cm_branch. With this code, I'm able to get 800 MByte/sec or more with
netperf without options on a back-to-back Mellanox 4x DDR system.

This is still "work in progress", but comments are welcome.

Here's a short description of what I have so far:

a. The code's here:
git://staging.openfabrics.org/~mst/linux-2.6/.git ipoib_cm_branch
This is based on 2.6.19-rc6, so
~>git diff v2.6.19-rc6..ipoib_cm_branch
will show what I have done so far.
Note that this currently includes patch 073ae841d6a5098f7c6e17fc1f329350d950d1ce,
which will drop out the next time I rebase against Linus's tree.

b. How to activate:
Server:
#modprobe ib_ipoib
#/sbin/ifconfig ib0 mtu 65520
#./netperf-2.4.2/src/netserver

Client:
#modprobe ib_ipoib
#/sbin/ifconfig ib0 mtu 65520
#./netperf-2.4.2/src/netperf -H 11.4.3.68 -f M
	TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68)
	port 0 AF_INET : demo
	Recv   Send    Send
	Socket Socket  Message  Elapsed
	Size   Size    Size     Time     Throughput
	bytes  bytes   bytes    secs.    MBytes/sec

	87380  16384  16384    10.01     891.21
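
For readers who think in link rates, the figure above can be converted to Gbit/s. This is only a sketch: it assumes that netperf's "-f M" unit means 2^20 bytes per second, and the helper name mbytes_to_gbits is made up for illustration.

```shell
# Hypothetical helper: convert a netperf "-f M" throughput figure
# (assumed here to be in units of 2^20 bytes/sec) to Gbit/s (10^9 bits/sec).
mbytes_to_gbits() {
    awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb * 1048576 * 8 / 1e9 }'
}

mbytes_to_gbits 891.21    # prints 7.48
```

For comparison, a 4x DDR link signals at 20 Gbit/s, which leaves 16 Gbit/s for data after 8b/10b encoding.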

c. TODO list
1. Clean up stale connections
2. Clean up ipoib_neigh (move all new fields to ipoib_cm_tx)
3. Add IPOIB_CM config option, make it depend on EXPERIMENTAL
4. S/G support
5. Make CM use same CQ IPoIB uses for UD

d. Limitations
UDP multicast, and UDP traffic to peers running IPoIB in UD mode,
currently don't work, since we end up with packets that are too large to
send over a UD QP.
As a workaround, one can create separate interfaces
for use with CM and UD mode.
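
To illustrate the size mismatch, here is a rough sketch. It assumes a 2048-byte IB path MTU and the 4-byte IPoIB encapsulation header, so a UD QP can carry an IP packet of at most 2044 bytes, while the CM interface above is configured with mtu 65520. The helper name fits_ud is hypothetical and not part of the patch.

```shell
# Hypothetical helper: would an IP packet of $1 bytes fit in a single
# UD message? Assumes an IB path MTU of $2 bytes (default 2048) and the
# 4-byte IPoIB encapsulation header.
fits_ud() {
    pkt=$1
    ib_mtu=${2:-2048}
    ud_mtu=$((ib_mtu - 4))
    if [ "$pkt" -le "$ud_mtu" ]; then
        echo yes
    else
        echo no
    fi
}

fits_ud 1500     # prints yes
fits_ud 65520    # prints no
```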

e. Some notes on code
1. SRQ is used for scalability to large cluster sizes
2. Only RC connections are used (UC does not currently support SRQ)
3. Retry count is set to 0 since the spec draft warns against retries
4. Each connection is used for data transfers in only 1 direction,
   so each connection is either active (TX) or passive (RX).
   2 sides that want to communicate in both directions create 2 connections.
5. Each active (TX) connection has a separate CQ for send completions -
   this keeps the code simple without CQ resize and other tricks

I'm looking at ways to limit the path MTU for the UD cases listed
under "d. Limitations", to make them work.

-- 
MST



