[ofa-general] minutes from socket over RDMA discussion at workshop

Wed May 2 09:25:59 PDT 2007

Scott Weitzenkamp (sweitzen) wrote:
> No, this is not right.  SDP has better latency and better throughput 
> than IPoIB CM, but also uses more CPU.

I can confirm that with some netperf numbers I've been gathering:

SD = Service Demand - usec of CPU consumed per KB (bulk transfer) or per
     transaction ("latency"); smaller is better
SDx = Service Demand Xmit; SDr = Service Demand Recv
* back to back - no switch.
9k - 9000 byte MTU.  As it turns out, ... this
      is the _default_ in the version of the myri10ge driver given to the
      author to use.

		      RedHat Enterprise Linux 5

                              Bulk Transfer                  "Latency"
                          Unidir            Bidir
     Card          Mbit/s SDx   SDr   Mbit/s SDx   SDr   Tran/s SDx   SDr
---------------------------------------------------------------------------
                    nnnn  nnnnn nnnnn  nnnn  nnnnn nnnnn  nnnnn nnnnn nnnnn
  AD313A  IPoIB     2970  4.418 4.544  3530  3.59  3.95   19290 n/a   n/a
  AD313A  SDP       7810  0.453 1.048 12820  0.69  0.68   38030 26.29 26.29
  AD313A  SDP p=0   7810  0.346 0.527 12670  0.42  0.043  19380 n/a   n/a
  AD144A  IP
  Myri10G IP 9k     9320  0.862 0.949 10950  1.00  0.86   19260 19.67 16.18 *
  Myri10G IP 9k msi 9320  0.449 0.672 10840  0.63  0.62   19430 11.68 11.56
  Myri10G IP    msi 7020  0.525 1.790  9820  1.22  1.22   not measured

Service demand for IPoIB TCP_RR and SDP p=0 not shown here because
netperf could never hit the confidence intervals for CPU utilization.
See the raw data for details.

* No switch - systems connected back-to-back

p=0 - recv_poll and send_poll module parameters of ib_sdp set to zero
       to address CPU util issue for SDP_RR test, where the equivalent
       of an entire core was consumed on each side

msi - MSI or Message Signaled Interrupts enabled

Myri10G results were gathered with version 1.2.0 of the myri10ge driver.
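
To make the SD columns easier to read, here is a rough sketch of how a
service demand figure relates to throughput and to the amount of CPU
consumed.  It assumes netperf's Mbit/s means 10^6 bits per second and
that a "KB" in the service demand is 1024 bytes; the function names are
made up for illustration and are not part of netperf.

```python
# Sketch only: relates service demand (usec of CPU per KB or per
# transaction) to throughput and CPU consumed.  Assumes Mbit/s means
# 10**6 bits/s and 1 KB = 1024 bytes.  All names here are hypothetical.

def kb_per_second(mbit_per_s, kb_bytes=1024):
    """Convert a throughput in Mbit/s to KB/s."""
    return mbit_per_s * 1e6 / 8 / kb_bytes

def service_demand_bulk(cpu_util_pct, num_cpus, mbit_per_s):
    """usec of CPU per KB, given system-wide CPU utilization in percent."""
    cpu_usec_per_s = cpu_util_pct / 100.0 * num_cpus * 1e6
    return cpu_usec_per_s / kb_per_second(mbit_per_s)

def cores_for_bulk(sd_usec_per_kb, mbit_per_s):
    """Cores' worth of CPU implied by a bulk-transfer service demand."""
    return sd_usec_per_kb * 1e-6 * kb_per_second(mbit_per_s)

def cores_for_rr(sd_usec_per_tran, tran_per_s):
    """Cores' worth of CPU implied by a request/response service demand."""
    return sd_usec_per_tran * 1e-6 * tran_per_s

if __name__ == "__main__":
    # AD313A SDP, unidirectional, transmit side: 7810 Mbit/s, SDx 0.453
    print("SDP bulk xmit cores:", round(cores_for_bulk(0.453, 7810), 3))
    # AD313A SDP, request/response: 38030 tran/s, SD 26.29
    print("SDP RR cores:", round(cores_for_rr(26.29, 38030), 3))
```

Plugging in the SDP rows from the table, the unidirectional transmit
side works out to roughly 0.43 of a core, and the SDP request/response
test to very nearly one full core, which lines up with the p=0 note
above.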

wrt zero copy and sockets and such: long long ago, in an OS far away (HP-UX
9.X), HP had a zero-copy implementation for TCP/IP over FDDI.  This was when
MTUs and page sizes were still well-aligned (both 4K, well 4K and change for
the FDDI MTU).  The zero copy there was copy-on-write, and it was enabled only
when an application made an explicit setsockopt() call telling the transport
that the application knew copy-on-write would be in effect.  It was then up to
the application to make sure it did not touch the address range of a previous
send() call until it knew the transport was no longer referencing it.

ftp://ftp.cup.hp.com/dist/networking/briefs/copyavoid.pdf

rick jones