[openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods

Thu Sep 1 06:10:43 PDT 2005

> -----Original Message-----
> From: Gleb Natapov [mailto:glebn at voltaire.com] 
> Sent: Wednesday, August 31, 2005 1:55 AM
> To: Caitlin Bestler
> Cc: Tom Tucker; openib-general at openib.org
> Subject: Re: [openib-general] [PATCH][iWARP] Added provider 
> CM verbs and query provider methods
> 
> On Tue, Aug 30, 2005 at 12:00:42PM -0700, Caitlin Bestler wrote:
> > > > TOE for the purposes of RDMA may have more legs within the 
> > > community, 
> > > > however, this has yet to be tested.
> > > > Is it possible to implement RDMA semantics using the native Linux
> > > > TCP stack (with hardware assistance, of course)? Just asking.
> > > 
> > > 
> > 
> > It is possible to implement RDMA on the host processor. But
> > it will not match the performance of hardware. The difference
> > > will be substantial at 10G. If somebody could build a
> > > software-only solution that performed at 10G, they would have done so.
> > Having zero manufacturing cost would give them quite a 
> > competitive edge over solutions that required hardware.
> > 
> I am not talking about a software-only solution. Hardware assistance
> is needed, but something less than TOE. Something stateless, like
> Dave wants.
> 
> > The need for offload has more to do with memory bandwidth
> > than raw processing power. The data bandwidth required to
> > support look-up of large data structures and for placement
> > of the raw payload nearly consumes the bus bandwidth when
> > operating at peak wire speeds. If you make that worse by
> > moving the raw packets over the wire, and *then* copying
> > them to a final location (a second memory move) *and*
> > additional memory touches for accessing control structures...
> > 
> Linux already has the infrastructure for zero-copy send; with
> some hardware help it is possible to implement zero-copy receive too.
> Moving data in memory is out of the question.
> 
> Anyway, I think these questions should be answered before moving this
> discussion to netdev.

I don't know how to do it and I've thought quite a bit about it. 

If you want the semantics specified for RDDP over
a reliable transport, then you need to do TCP in hardware. 
Consider RDMA_WRITE. The target buffer is specified in the RDDP 
header which is in turn carried as part of the TCP payload. 
Therefore in order for the hardware to get to the RDDP header, it
must crack the TCP header. The TCP processing can't be stateless
because the hardware must discard duplicate segments, for example, or
risk overwriting an application buffer that has already been
written, delivered, and reused for some other purpose. There are
many other examples, but the point is that TCP state information is
needed by the hardware.
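
To make the duplicate case concrete, here is a rough sketch in Python.
It is only an illustration under simplifying assumptions: the Segment
layout, the two placer classes, and the demo() helper are all
hypothetical, the "header" is not the real RDDP wire format, and
sequence-number wraparound is ignored. It compares a receiver that
places every segment it sees with one that tracks the next expected
TCP sequence number.

```python
# Illustrative sketch only: all names and the header layout are hypothetical,
# not the real RDDP/TCP wire format. Sequence wraparound is ignored.
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int      # TCP sequence number of the first payload byte
    offset: int   # target buffer offset from the (simplified) RDMA header
    data: bytes   # payload to place at that offset


class StatelessPlacer:
    """Places every segment it sees; keeps no per-connection TCP state."""

    def __init__(self, buf: bytearray):
        self.buf = buf

    def receive(self, seg: Segment) -> bool:
        self.buf[seg.offset:seg.offset + len(seg.data)] = seg.data
        return True


class StatefulPlacer:
    """Tracks the next expected sequence number, as a TCP receiver does."""

    def __init__(self, buf: bytearray, rcv_nxt: int):
        self.buf = buf
        self.rcv_nxt = rcv_nxt

    def receive(self, seg: Segment) -> bool:
        # Simplification: anything other than the next in-order segment is
        # dropped, which covers duplicates (and, crudely, reordering).
        if seg.seq != self.rcv_nxt:
            return False
        self.buf[seg.offset:seg.offset + len(seg.data)] = seg.data
        self.rcv_nxt += len(seg.data)
        return True


def demo(placer_cls, **kwargs) -> bytes:
    buf = bytearray(8)
    placer = placer_cls(buf, **kwargs)
    write = Segment(seq=1000, offset=0, data=b"RDMADATA")
    placer.receive(write)   # the original RDMA write lands
    buf[:] = b"APPREUSE"    # the application reuses the buffer
    placer.receive(write)   # a retransmitted duplicate arrives late
    return bytes(buf)


if __name__ == "__main__":
    print(demo(StatelessPlacer))               # buffer clobbered by duplicate
    print(demo(StatefulPlacer, rcv_nxt=1000))  # duplicate discarded
```

In this toy model the stateless receiver lets the late duplicate
overwrite the reused buffer, while the receiver that keeps rcv_nxt
discards it. That per-connection state is the piece that has to live
wherever placement is done.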

> 
> --
> 			Gleb.
>