[openib-general] [PATCH v2 04/13] Connection Manager
Steve Wise
swise at opengridcomputing.com
Tue Dec 5 07:39:58 PST 2006
On Tue, 2006-12-05 at 18:19 +0300, Evgeniy Polyakov wrote:
> On Tue, Dec 05, 2006 at 09:02:05AM -0600, Steve Wise (swise at opengridcomputing.com) wrote:
> > > > > This and a lot of other changes in this driver definitely says you
> > > > > implement your own stack of protocols on top of infiniband hardware.
> > > >
> > > > ...but I do know this driver is for 10-gig ethernet HW.
> > >
> > > It is for iwarp/rdma from description.
> > > If it is 10ge, then why does it parse incomping packet headers and
> > > implements initial tcp state machine?
> > >
> >
> > Its not implementing the TCP state machine at all. Its implementing the
> > MPA state machine (see the iWARP internet drafts). These packets are
> > TCP payload. MPA is used to negotiate RDMA mode on a TCP connection.
> > This entails an exchange of 2 messages on the TCP connection. Once this
> > is exchanged and both side agree, the connection is bound to an RDMA QP
> > and the connection moved into RDMA mode. From that point on, all IO is
> > done via the post_send() and post_recv().
>
> And why does rdma require window scaling, keep alive, nagle and other
> interesting options from TCP spec?
>
The connection setup messages sent to the hardware need to have these
parameters so the TCP engine on the HW knows how to do connection
options, windows, etc.
> This really looks like initial implementation of TCP in hardware - you
> setup flags like doing the same using setsockopt() and then hardware
> manages the flow like network stack manages TCP state machine changes.
>
> According to draft-culley-iwarp-mpa-03.txt this layer can do a lot of
> things with valid TCP flow like
>
> 5. The TCP sender puts the FPDUs into the TCP stream. If the TCP
> Sender is MPA-aware, it segments the TCP stream in such a way
> that a TCP Segment boundary is also the boundary of an FPDU.
> TCP then passes each segment to the IP layer for transmission.
>
> Phrases like "MPA-aware TCP" rises a lot of questions - briefly saying
> that hardware (even if it is called ethernet driver) can create and work
> with own TCP flows potentially modified in the way it likes which is seen
> in driver. Likely such flows will not be seen by upper layers like OS
> network stack according to hardware descriptions.
>
> Is it correct?
>
I don't quite get your point about the driver aspect of this?
The HW manages the iWARP connection including data flow. It adheres to
the MPA, RDDP, and RDMAP protocol specification IDs from the IETF. The
HW manages how data gets pushed out in the RDMA stream. The RDMA
Driver just requests a TCP connection and does the MPA exchange. Then
tells the hardware to move the connection into RDMA mode. From that
point on, the driver simply suffles IO work requests from the consumer
application to the hardware and handles asynchronous events while the
connection is up and running.
Steve.
More information about the general
mailing list