[openib-general] [PATCH v2 04/13] Connection Manager

Evgeniy Polyakov johnpol at 2ka.mipt.ru
Tue Dec 5 07:19:06 PST 2006


On Tue, Dec 05, 2006 at 09:02:05AM -0600, Steve Wise (swise at opengridcomputing.com) wrote:
> > >  > This and a lot of other changes in this driver definitely say you
> > >  > implement your own stack of protocols on top of infiniband hardware.
> > > 
> > > ...but I do know this driver is for 10-gig ethernet HW.
> > 
> > It is for iwarp/rdma from description.
> > If it is 10ge, then why does it parse incoming packet headers and
> > implement an initial tcp state machine?
> > 
> 
> It's not implementing the TCP state machine at all. It's implementing the
> MPA state machine (see the iWARP internet drafts).  These packets are
> TCP payload.  MPA is used to negotiate RDMA mode on a TCP connection.
> This entails an exchange of 2 messages on the TCP connection.  Once these
> are exchanged and both sides agree, the connection is bound to an RDMA QP
> and the connection is moved into RDMA mode.  From that point on, all IO is
> done via post_send() and post_recv().
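
For reference, the two-message exchange described above can be sketched
roughly as follows. This is only an illustration, not the driver's code:
the key strings and the field layout (16-byte key, one flags byte, one
revision byte, 16-bit private data length) are as I recall them from the
later MPA specification (RFC 5044) and may differ in draft -03, and the
function names are hypothetical.

```python
import struct

# Assumed constants (recalled from RFC 5044; the -03 draft may differ).
MPA_REQ_KEY = b"MPA ID Req Frame"   # 16 bytes, sent by the initiator
MPA_REP_KEY = b"MPA ID Rep Frame"   # 16 bytes, sent by the responder
FLAG_MARKERS = 0x80                 # M bit
FLAG_CRC = 0x40                     # C bit
FLAG_REJECT = 0x20                  # R bit


def build_mpa_frame(key, flags, revision, private_data=b""):
    """Serialize one MPA request or reply frame (carried as TCP payload)."""
    if len(key) != 16:
        raise ValueError("MPA key must be exactly 16 bytes")
    header = struct.pack("!BBH", flags, revision, len(private_data))
    return key + header + private_data


def parse_mpa_frame(data):
    """Parse a frame produced by build_mpa_frame()."""
    if len(data) < 20:
        raise ValueError("short MPA frame")
    flags, revision, pd_len = struct.unpack("!BBH", data[16:20])
    private_data = data[20:20 + pd_len]
    if len(private_data) != pd_len:
        raise ValueError("truncated private data")
    return {
        "key": data[:16],
        "markers": bool(flags & FLAG_MARKERS),
        "crc": bool(flags & FLAG_CRC),
        "reject": bool(flags & FLAG_REJECT),
        "revision": revision,
        "private_data": private_data,
    }


def build_reply(request, accept, private_data=b""):
    """Responder side: answer a request, echoing its revision."""
    req = parse_mpa_frame(request)
    if req["key"] != MPA_REQ_KEY:
        raise ValueError("not an MPA request")
    flags = FLAG_CRC if req["crc"] else 0
    if not accept:
        flags |= FLAG_REJECT
    return build_mpa_frame(MPA_REP_KEY, flags, req["revision"], private_data)
```

Only after a reply without the reject bit would the connection be handed
over to RDMA mode; everything before that is ordinary TCP payload.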

And why does RDMA require window scaling, keepalive, Nagle and other
interesting options from the TCP spec?

This really looks like an initial implementation of TCP in hardware: you
set up flags much as you would with setsockopt(), and then the hardware
manages the flow the way a network stack manages TCP state machine changes.

According to draft-culley-iwarp-mpa-03.txt, this layer can do a lot of
things with a valid TCP flow, for example:

   5.  The TCP sender puts the FPDUs into the TCP stream.  If the TCP
       Sender is MPA-aware, it segments the TCP stream in such a way
       that a TCP Segment boundary is also the boundary of an FPDU.  
       TCP then passes each segment to the IP layer for transmission.
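
As I read it, step 5 amounts to something like the sketch below. It is
only an illustration under assumptions: the FPDU layout (16-bit ULPDU
length, payload, padding to a 4-byte boundary, CRC32c trailer) is as I
understand it from the MPA documents, the byte order of the CRC field is
a guess, optional markers are left out, and build_fpdu() and
segment_aligned() are hypothetical names.

```python
import struct


def crc32c(data):
    """Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def build_fpdu(ulpdu):
    """Frame one ULPDU: length header, payload, pad to 4 bytes, CRC."""
    header = struct.pack("!H", len(ulpdu))
    pad = b"\x00" * (-(len(header) + len(ulpdu)) % 4)
    body = header + ulpdu + pad
    # CRC field byte order is an assumption in this sketch.
    return body + struct.pack("!I", crc32c(body))


def segment_aligned(fpdus, mss):
    """Pack whole FPDUs into segments so every segment starts and ends
    on an FPDU boundary, which is what an "MPA-aware" sender does."""
    segments, current = [], b""
    for fpdu in fpdus:
        if len(fpdu) > mss:
            raise ValueError("FPDU does not fit into a single segment")
        if current and len(current) + len(fpdu) > mss:
            segments.append(current)
            current = b""
        current += fpdu
    if current:
        segments.append(current)
    return segments
```

A plain TCP sender is free to cut the byte stream anywhere, so a segmenter
like this one already changes how the flow looks on the wire.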

Phrases like "MPA-aware TCP" raise a lot of questions. Briefly: the
hardware (even if its driver is called an ethernet driver) can create and
work with its own TCP flows, potentially modified in whatever way it likes,
which is what can be seen in the driver. According to the hardware
descriptions, such flows will likely not be seen by upper layers such as
the OS network stack.

Is that correct?

> Steve. 

-- 
	Evgeniy Polyakov



