[ofa-general] iWARP peer-to-peer CM proposal
Kanevsky, Arkady
Arkady.Kanevsky at netapp.com
Wed Nov 21 08:02:25 PST 2007
Group,
Below is a proposal for resolving the peer-to-peer iWARP CM issue
discovered at the interop event.
The main issue is that the MPA spec (the relevant portion of IETF RFC 5044
is included below) requires that the connection initiator send the first
message over the established connection.
Multiple MPI implementations and several other apps use a peer-to-peer
model. Rather than forcing each of them to work around this on its own,
which will not help interop between different implementations, the goal
is to extend the lower layers to provide it.
Our first idea was to leave the MPA protocol untouched and solve this
problem in iw_cm, but that has too many complications. First, in order to
adhere to RFC 5044, the initiator must send the first FPDU and the
responder must process it. But since the connection is already
established, processing an FPDU involves the ULP on whose behalf the
connection was created.
So either the initiator sends a message that generates a completion on
the responder's CQ, and is thus visible to the ULP, or it does not.
In the latter case, the only operation that can do this is an RDMA one,
which means that the responder has somehow provided the initiator with
an S-tag it can use, probably via private data, and that the responder
destroys this S-tag upon receiving the operation. In any case, this is
an extension of MPA.
In the former case, a Send is used, but this requires a buffer to be
posted to CQ. But since the same CQ (or SharedCQ) can be used by other
connections at the same time, the buffer posted by the responder CM can
be consumed by another connection. This is not acceptable.
So now we consider an extension to the MPA protocol.
The goal is to be completely backwards compatible with the existing
version 1.
In a nutshell, use a "flag" in the MPA request message which indicates
that a "ready to receive" message will be sent by the requestor upon
receiving the MPA response message with connection acceptance.
Here are the changes to IETF RFC 5044:
1. Frame format (adds the Rtr key and the S bit):
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0  |                                                               |
   +         Key (16 bytes containing "MPA ID Req Frame")          +
4  |      (4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65)        |
   +         Or  (16 bytes containing "MPA ID Rep Frame")          +
8  |      (4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65)        |
   +         Or  (16 bytes containing "MPA ID Rtr Frame")          +
12 |      (4D 50 41 20 49 44 20 52 74 72 20 46 72 61 6D 65)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
16 |M|C|R|S| Res   |     Rev       |          PD_Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                                                               ~
   ~                          Private Data                         ~
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2. S: indicates in the Req frame whether or not the Requestor will send
   an Rtr frame.
   In the Req frame, 1 means that an Rtr frame will be sent if the
   responder sends a Rep frame that accepts the connection; 0 indicates
   that an Rtr frame will not be sent.
   In the Rep frame, 0 means that the Responder cannot support the Rtr
   frame, while 1 means that it can and is waiting for it.
   (While my preference is to handle this through MPA protocol version
   matching rules, the proposed method provides complete backwards
   compatibility.)
   Unused in the Rtr frame: it is set to 0 in the Rtr frame and ignored
   by the responder.
   All other bits (M, C, R and the remainder of Res) are treated as in
   MPA ver 1.
   The Rtr frame adheres to the C bit as specified in the Rep frame.
3. No private data format is defined for Rtr in this version.
4. An example will be added to present the Rtr model.
   That is, if the S bit is not set, the current MPA ver 1 model is
   followed, and if the S bit is set, the "proposed" model with the Rtr
   message is followed.
5. The Requestor's use of the Rtr frame must adhere to the S bit setting
   of the Rep frame.
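
As a rough illustration of items 1 to 5, the following Python sketch packs the proposed header and decides whether an Rtr frame is expected. It is only a sketch under assumptions, not an implementation of iw_cm or of any vendor's MPA: the flag masks assume that bit 0 of the diagram is the most significant bit of the byte at offset 16 (so S would be 0x10), the Rtr frame is assumed to carry only the C bit copied from the Rep frame, and every function name is made up for the example.

```python
import struct

# Keys from the proposed frame layout (16 ASCII bytes each).
KEY_REQ = b"MPA ID Req Frame"
KEY_REP = b"MPA ID Rep Frame"
KEY_RTR = b"MPA ID Rtr Frame"  # proposed; not defined by RFC 5044

# Flag bits in the byte at offset 16, most significant bit first (assumed).
FLAG_M = 0x80
FLAG_C = 0x40
FLAG_R = 0x20
FLAG_S = 0x10  # proposed bit, taken from the first bit of Res

MPA_REV = 1
HEADER = struct.Struct("!16sBBH")  # Key, flags, Rev, PD_Length


def pack_frame(key, flags=0, private_data=b""):
    """Build a startup frame: 20-byte header followed by Private Data."""
    if len(private_data) > 512:
        raise ValueError("Private Data is limited to 512 octets")
    return HEADER.pack(key, flags, MPA_REV, len(private_data)) + private_data


def unpack_header(frame):
    """Return (key, flags, rev, pd_length) from the first 20 bytes."""
    return HEADER.unpack_from(frame)


def rtr_expected(req_flags, rep_flags):
    """Rtr is sent only if both sides set S and the Rep frame is an accept."""
    accepted = not (rep_flags & FLAG_R)
    return bool(req_flags & FLAG_S) and bool(rep_flags & FLAG_S) and accepted


def make_rtr(rep_flags):
    """Rtr frame: S is 0, C follows the Rep frame, no Private Data (assumed)."""
    return pack_frame(KEY_RTR, rep_flags & FLAG_C)
```

With these helpers, `rtr_expected(FLAG_S, 0)` is false, which is the backwards-compatible case: a version 1 responder leaves the reserved bits at zero, so the requestor falls back to the existing model.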
**************************************************
While the process of driving this proposal through the IETF is very
lengthy, we can still use it with the current version 1 of MPA in order
to solve this problem now. All existing implementations will still work,
and if both sides support this change then the peer-to-peer model is
also provided.
Comments, suggestions, and criticism are requested.
I especially want to know if we are missing some gotcha that was
discussed by the RDDP WG when they rejected the peer-to-peer model for
MPA.
iWARP vendors, please comment on the feasibility of implementing this
MPA extension.
*********************************************************************
7. Connection Semantics
7.1. Connection Setup
MPA requires that the Consumer MUST activate MPA, and any TCP
enhancements for MPA, on a TCP half connection at the same location
in the octet stream at both the sender and the receiver. This is
required in order for the Marker scheme to correctly locate the
Markers (if enabled) and to correctly locate the first FPDU.
MPA, and any TCP enhancements for MPA are enabled by the ULP in both
directions at once at an endpoint.
Culley, et al. Standards Track [Page 24]
<http://tools.ietf.org/html/rfc5044#page-25>
RFC 5044 <http://tools.ietf.org/html/rfc5044> MPA
Framing for TCP October 2007
This can be accomplished several ways, and is left up to DDP's ULP:
* DDP's ULP MAY require DDP on MPA startup immediately after TCP
connection setup. This has the advantage that no streaming mode
negotiation is needed. An example of such a protocol is shown in
Figure 10: Example Immediate Startup negotiation.
This may be accomplished by using a well-known port, or a service
locator protocol to locate an appropriate port on which DDP on
MPA is expected to operate.
* DDP's ULP MAY negotiate the start of DDP on MPA sometime after a
normal TCP startup, using TCP streaming data exchanges on the
same connection. The exchange establishes that DDP on MPA (as
well as other ULPs) will be used, and exactly locates the point
in the octet stream where MPA is to begin operation. Note that
such a negotiation protocol is outside the scope of this
specification. A simplified example of such a protocol is shown
in Figure 9: Example Delayed Startup negotiation on page 33.
An MPA endpoint operates in two distinct phases.
The Startup Phase is used to verify correct MPA setup, exchange CRC
and Marker configuration, and optionally pass Private Data between
endpoints prior to completing a DDP connection. During this phase,
specifically formatted frames are exchanged as TCP byte streams
without using CRCs or Markers. During this phase a DDP endpoint need
not be "bound" to the MPA connection. In fact, the choice of DDP
endpoint and its operating parameters may not be known until the
Consumer supplied Private Data (if any) has been examined by the
Consumer.
The second distinct phase is Full Operation during which FPDUs are
sent using all the rules that pertain (CRCs, Markers, MULPDU
restrictions, etc.). A DDP endpoint MUST be "bound" to the MPA
connection at entry to this phase.
When Private Data is passed between ULPs in the Startup Phase, the
ULP is responsible for interpreting that data, and then placing MPA
into Full Operation.
Note: The following text differentiates the two endpoints by calling
them Initiator and Responder. This is quite arbitrary and is NOT
related to the TCP startup (SYN, SYN/ACK sequence). The
Initiator is the side that sends first in the MPA startup
sequence (the MPA Request Frame).
Note: The possibility that both endpoints would be allowed to make a
connection at the same time, sometimes called an active/active
connection, was considered by the work group and rejected. There
were several motivations for this decision. One was that
applications needing this facility were few (none other than
theoretical at the time of this document). Another was that the
facility created some implementation difficulties, particularly
with the "dual stack" designs described later on. A last issue
was that dealing with rejected connections at startup would have
required at least an additional frame type, and more recovery
actions, complicating the protocol. While none of these issues
was overwhelming, the group and implementers were not motivated
to do the work to resolve these issues. The protocol includes a
method of detecting these active/active startup attempts so that
they can be rejected and an error reported.
The ULP is responsible for determining which side is Initiator or
Responder. For client/server type ULPs, this is easy. For peer-peer
ULPs (which might utilize a TCP style active/active startup), some
mechanism (not defined by this specification) must be established, or
some streaming mode data exchanged prior to MPA startup to determine
which side starts in Initiator and which starts in Responder MPA
mode.
7.1.1 MPA Request and Reply Frame Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0  |                                                               |
   +         Key (16 bytes containing "MPA ID Req Frame")          +
4  |      (4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65)        |
   +         Or  (16 bytes containing "MPA ID Rep Frame")          +
8  |      (4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65)        |
   +                                                               +
12 |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
16 |M|C|R| Res     |     Rev       |          PD_Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                                                               ~
   ~                          Private Data                         ~
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 8: MPA Request/Reply Frame
Key: This field contains the "key" used to validate that the sender
is an MPA sender. Initiator mode senders MUST set this field to
the fixed value "MPA ID Req Frame" or (in byte order) 4D 50 41 20
49 44 20 52 65 71 20 46 72 61 6D 65 (in hexadecimal). Responder
mode receivers MUST check this field for the same value, and
close the connection and report an error locally if any other
value is detected. Responder mode senders MUST set this field to
the fixed value "MPA ID Rep Frame" or (in byte order) 4D 50 41 20
49 44 20 52 65 70 20 46 72 61 6D 65 (in hexadecimal). Initiator
mode receivers MUST check this field for the same value, and
close the connection and report an error locally if any other
value is detected.
M: This bit declares an endpoint's REQUIRED Marker usage. When this
bit is '1' in an MPA Request Frame, the Initiator declares that
Markers are REQUIRED in FPDUs sent from the Responder. When set
to '1' in an MPA Reply Frame, this bit declares that Markers are
REQUIRED in FPDUs sent from the Initiator. When in a received
MPA Request Frame or MPA Reply Frame and the value is '0',
Markers MUST NOT be added to the data stream by that endpoint.
When '1' Markers MUST be added as described in Section 4.3
<http://tools.ietf.org/html/rfc5044#section-4.3>, MPA Markers.
C: This bit declares an endpoint's preferred CRC usage. When this
field is '0' in the MPA Request Frame and the MPA Reply Frame,
CRCs MUST not be checked and need not be generated by either
endpoint. When this bit is '1' in either the MPA Request Frame
or MPA Reply Frame, CRCs MUST be generated and checked by both
endpoints. Note that even when not in use, the CRC field remains
present in the FPDU. When CRCs are not in use, the CRC field
MUST be considered valid for FPDU checking regardless of its
contents.
R: This bit is set to zero, and not checked on reception in the MPA
Request Frame. In the MPA Reply Frame, this bit is the Rejected
Connection bit, set by the Responders ULP to indicate acceptance
'0', or rejection '1', of the connection parameters provided in
the Private Data.
Res: This field is reserved for future use. It MUST be set to zero
when sending, and not checked on reception.
Rev: This field contains the revision of MPA. For this version of
the specification, senders MUST set this field to one. MPA
receivers compliant with this version of the specification MUST
check this field. If the MPA receiver cannot interoperate with
the received version, then it MUST close the connection and
report an error locally. Otherwise, the MPA receiver should
report the received version to the ULP.
PD_Length: This field MUST contain the length in octets of the
Private Data field. A value of zero indicates that there is no
Private Data field present at all. If the receiver detects that
the PD_Length field does not match the length of the Private Data
field, or if the length of the Private Data field exceeds 512
octets, the receiver MUST close the connection and report an
error locally. Otherwise, the MPA receiver should pass the
PD_Length value and Private Data to the ULP.
Private Data: This field may contain any value defined by ULPs or may
not be present. The Private Data field MUST be between 0 and 512
octets in length. ULPs define how to size, set, and validate
this field within these limits. Private Data usage is further
discussed in Section 7.1.4
<http://tools.ietf.org/html/rfc5044#section-7.1.4>.
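
The field rules above can be sketched in Python as follows. This is a minimal illustration, not code from any MPA implementation: it assumes the whole startup frame is already in one buffer (a real receiver would read the 20-byte header from the TCP stream first and then PD_Length more octets), it assumes bit 0 of the diagram is the most significant bit of the flags byte, and the names `parse_startup_frame`, `negotiate`, and `MpaFormatError` are invented for the example.

```python
import struct

KEY_REQ = b"MPA ID Req Frame"
KEY_REP = b"MPA ID Rep Frame"
FLAG_M, FLAG_C, FLAG_R = 0x80, 0x40, 0x20
MAX_PRIVATE_DATA = 512
HEADER = struct.Struct("!16sBBH")  # Key, M/C/R/Res byte, Rev, PD_Length


class MpaFormatError(Exception):
    """Frame is improperly formatted; the caller should close the connection."""


def parse_startup_frame(data, expected_key):
    """Validate one complete startup frame and return (flags, private_data)."""
    if len(data) < HEADER.size:
        raise MpaFormatError("short frame")
    key, flags, rev, pd_length = HEADER.unpack_from(data)
    if key != expected_key:
        raise MpaFormatError("unexpected Key")
    if rev != 1:
        raise MpaFormatError("unsupported Rev")
    private_data = data[HEADER.size:]
    if pd_length > MAX_PRIVATE_DATA or pd_length != len(private_data):
        raise MpaFormatError("bad PD_Length")
    # Res bits are deliberately not checked on reception.
    return flags, private_data


def negotiate(req_flags, rep_flags):
    """Combine Request and Reply flags into per-connection settings."""
    return {
        # C: CRCs are used if either side asked for them.
        "crc": bool((req_flags | rep_flags) & FLAG_C),
        # M in the Request: Markers required in FPDUs sent by the Responder.
        "responder_sends_markers": bool(req_flags & FLAG_M),
        # M in the Reply: Markers required in FPDUs sent by the Initiator.
        "initiator_sends_markers": bool(rep_flags & FLAG_M),
        # R is only meaningful in the Reply.
        "rejected": bool(rep_flags & FLAG_R),
    }
```

Keeping the reserved bits unchecked on reception is what leaves room for later extensions to reuse them without breaking version 1 receivers.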
7.1.2. Connection Startup Rules
The following rules apply to MPA connection Startup Phase:
1. When MPA is started in the Initiator mode, the MPA implementation
MUST send a valid MPA Request Frame. The MPA Request Frame MAY
include ULP-supplied Private Data.
2. When MPA is started in the Responder mode, the MPA implementation
MUST wait until an MPA Request Frame is received and validated
before entering Full MPA/DDP Operation.
If the MPA Request Frame is improperly formatted, the
implementation MUST close the TCP connection and exit MPA.
If the MPA Request Frame is properly formatted but the Private
Data is not acceptable, the implementation SHOULD return an MPA
Reply Frame with the Rejected Connection bit set to '1'; the MPA
Reply Frame MAY include ULP-supplied Private Data; the
implementation MUST exit MPA, leaving the TCP connection open.
The ULP may close TCP or use the connection for other purposes.
If the MPA Request Frame is properly formatted and the Private
Data is acceptable, the implementation SHOULD return an MPA Reply
Frame with the Rejected Connection bit set to '0'; the MPA Reply
Frame MAY include ULP-supplied Private Data; and the Responder
SHOULD prepare to interpret any data received as FPDUs and pass
any received ULPDUs to DDP.
Note: Since the receiver's ability to deal with Markers is
unknown until the Request and Reply Frames have been
received, sending FPDUs before this occurs is not possible.
Note: The requirement to wait on a Request Frame before sending a
Reply Frame is a design choice. It makes for a well-ordered
sequence of events at each end, and avoids having to specify
how to deal with situations where both ends start at the same
time.
3. MPA Initiator mode implementations MUST receive and validate an
MPA Reply Frame.
If the MPA Reply Frame is improperly formatted, the
implementation MUST close the TCP connection and exit MPA.
If the MPA Reply Frame is properly formatted but is the Private
Data is not acceptable, or if the Rejected Connection bit is set
to '1', the implementation MUST exit MPA, leaving the TCP
connection open. The ULP may close TCP or use the connection for
other purposes.
If the MPA Reply Frame is properly formatted and the Private Data
is acceptable, and the Reject Connection bit is set to '0', the
implementation SHOULD enter Full MPA/DDP Operation Phase;
interpreting any received data as FPDUs and sending DDP ULPDUs as
FPDUs.
4. MPA Responder mode implementations MUST receive and validate at
least one FPDU before sending any FPDUs or Markers.
Note: This requirement is present to allow the Initiator time to
get its receiver into Full Operation before an FPDU arrives,
avoiding potential race conditions at the Initiator. This
was also subject to some debate in the work group before
rough consensus was reached. Eliminating this requirement
would allow faster startup in some types of applications.
However, that would also make certain implementations
(particularly "dual stack") much harder.
5. If a received "Key" does not match the expected value (see
Section 7.1.1 <http://tools.ietf.org/html/rfc5044#section-7.1.1>,
MPA Request and Reply Frame Format) the TCP/DDP
connection MUST be closed, and an error returned to the ULP.
6. The received Private Data fields may be used by Consumers at
either end to further validate the connection and set up DDP or
other ULP parameters. The Initiator ULP MAY close the
TCP/MPA/DDP connection as a result of validating the Private Data
fields. The Responder SHOULD return an MPA Reply Frame with the
"Reject Connection" bit set to '1' if the validation of the
Private Data is not acceptable to the ULP.
7. When the first FPDU is to be sent, then if Markers are enabled,
the first octets sent are the special Marker 0x00000000, followed
by the start of the FPDU (the FPDU's ULPDU Length field). If
Markers are not enabled, the first octets sent are the start of
the FPDU (the FPDU's ULPDU Length field).
8. MPA implementations MUST use the difference between the MPA
Request Frame and the MPA Reply Frame to check for incorrect
"Initiator/Initiator" startups. Implementations SHOULD put a
timeout on waiting for the MPA Request Frame when started in
Responder mode, to detect incorrect "Responder/Responder"
startups.
9. MPA implementations MUST validate the PD_Length field. The
buffer that receives the Private Data field MUST be large enough
to receive that data; the amount of Private Data MUST not exceed
the PD_Length or the application buffer. If any of the above
fails, the startup frame MUST be considered improperly formatted.
10. MPA implementations SHOULD implement a reasonable timeout while
waiting for the entire set of startup frames; this prevents
certain denial-of-service attacks. ULPs SHOULD implement a
reasonable timeout while waiting for FPDUs, ULPDUs, and
application level messages to guard against application failures
and certain denial-of-service attacks.
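
Three of the rules above (4, 7, and 8) lend themselves to a short Python sketch. It is illustrative only and rests on assumptions: the class and function names are invented, the gate queues ULPDUs rather than modelling real FPDU framing, and timeouts (rules 8 and 10) are left out.

```python
import struct

KEY_REQ = b"MPA ID Req Frame"
KEY_REP = b"MPA ID Rep Frame"


def check_role(received_key, local_role):
    """Rule 8: classify the first startup frame seen by this endpoint."""
    expected = KEY_REQ if local_role == "responder" else KEY_REP
    if received_key == expected:
        return "ok"
    if local_role == "initiator" and received_key == KEY_REQ:
        return "initiator/initiator"  # both ends sent a Request
    return "bad key"  # rule 5: close the connection, report an error


class ResponderGate:
    """Rule 4: a Responder holds its FPDUs until one valid FPDU arrives."""

    def __init__(self):
        self.fpdu_seen = False
        self.pending = []  # ULPDUs queued by the ULP while gated

    def send(self, ulpdu):
        """Return the ULPDUs that may go on the wire now."""
        if not self.fpdu_seen:
            self.pending.append(ulpdu)
            return []
        return [ulpdu]

    def on_valid_fpdu(self):
        """Open the gate and release everything queued so far."""
        self.fpdu_seen = True
        released, self.pending = self.pending, []
        return released


def first_octets(ulpdu_length, markers_enabled):
    """Rule 7: the first octets of the first FPDU in Full Operation."""
    length_field = struct.pack("!H", ulpdu_length)  # 16-bit ULPDU Length
    marker = b"\x00\x00\x00\x00" if markers_enabled else b""
    return marker + length_field
```

The gate is the part a peer-to-peer ULP runs into: anything the Responder side wants to send first sits in `pending` until the Initiator has sent an FPDU.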
7.1.3. Example Delayed Startup Sequence
A variety of startup sequences are possible when using MPA on TCP.
Following is an example of an MPA/DDP startup that occurs after TCP
has been running for a while and has exchanged some amount of
streaming data. This example does not use any Private Data (an
example that does is shown later in Section 7.1.4.2
<http://tools.ietf.org/html/rfc5044#section-7.1.4.2>, Example
Immediate Startup Using Private Data), although it is perfectly legal
to include the Private Data. Note that since the example does not
use any Private Data, there are no ULP interactions shown between
receiving "startup frames" and putting MPA into Full Operation.
             Initiator                               Responder
   +---------------------------+
   |ULP streaming mode         |
   |  <Hello> request to       |
   |  transition to DDP/MPA    |           +---------------------------+
   |  mode (optional).         | --------> |ULP gets request;          |
   +---------------------------+           |  enables MPA Responder    |
                                           |  mode with last (optional)|
                                           |  streaming mode           |
                                           |  <Hello Ack> for MPA to   |
                                           |  send.                    |
   +---------------------------+           |MPA waits for incoming     |
   |ULP receives streaming     | <-------- |  <MPA Request Frame>.     |
   |  <Hello Ack>;             |           +---------------------------+
   |Enters MPA Initiator mode; |
   |MPA sends                  |
   |  <MPA Request Frame>;     |
   |MPA waits for incoming     |           +---------------------------+
   |  <MPA Reply Frame>.       | - - - - > |MPA receives               |
   +---------------------------+           |  <MPA Request Frame>.     |
                                           |Consumer binds DDP to MPA; |
                                           |MPA sends the              |
                                           |  <MPA Reply Frame>.       |
                                           |DDP/MPA enables FPDU       |
   +---------------------------+           |  decoding, but does not   |
   |MPA receives the           | < - - - - |  send any FPDUs.          |
   |  <MPA Reply Frame>        |           +---------------------------+
   |Consumer binds DDP to MPA; |
   |DDP/MPA begins Full        |
   |  Operation.               |
   |MPA sends first FPDU (as   |           +---------------------------+
   |  DDP ULPDUs become        | ========> |MPA receives first FPDU.   |
   |  available).              |           |MPA sends first FPDU (as   |
   +---------------------------+           |  DDP ULPDUs become        |
                                   <====== |  available).              |
                                           +---------------------------+
Figure 9: Example Delayed Startup Negotiation
An example Delayed Startup sequence is described below:
* Active and passive sides start up a TCP connection in the
usual fashion, probably using sockets APIs. They exchange
some amount of streaming mode data. At some point, one side
(the MPA Initiator) sends streaming mode data that
effectively says "Hello, let's go into MPA/DDP mode".
* When the remote side (the MPA Responder) gets this streaming mode
message, the Consumer would send a last streaming mode message
that effectively says "I acknowledge your Hello, and am now in
MPA Responder mode". The exchange of these messages establishes
the exact point in the TCP stream where MPA is enabled. The
Responding Consumer enables MPA in the Responder mode and waits
for the initial MPA startup message.
* The Initiating Consumer would enable MPA startup in the
Initiator mode which then sends the MPA Request Frame. It is
assumed that no Private Data messages are needed for this
example, although it is possible to do so. The Initiating
MPA (and Consumer) would also wait for the MPA connection to
be accepted.
* The Responding MPA would receive the initial MPA Request Frame
and would inform the Consumer that this message arrived. The
Consumer can then accept the MPA/DDP connection or close the TCP
connection.
* To accept the connection request, the Responding Consumer would
use an appropriate API to bind the TCP/MPA connections to a DDP
endpoint, thus enabling MPA/DDP into Full Operation. In the
process of going to Full Operation, MPA sends the MPA Reply
Frame. MPA/DDP waits for the first incoming FPDU before sending
any FPDUs.
* If the initial TCP data was not a properly formatted MPA Request
Frame, MPA will close or reset the TCP connection immediately.
* The Initiating MPA would receive the MPA Reply Frame and
would report this message to the Consumer. The Consumer can
then accept the MPA/DDP connection, or close or reset the TCP
connection to abort the process.
* On determining that the connection is acceptable, the
Initiating Consumer would use an appropriate API to bind the
TCP/MPA connections to a DDP endpoint thus enabling MPA/DDP
into Full Operation. MPA/DDP would begin sending DDP
messages as MPA FPDUs.
7.1.4. Use of Private Data
This section is advisory in nature, in that it suggests a method by
which a ULP can deal with pre-DDP connection information exchange.
7.1.4.1. Motivation
Prior RDMA protocols have been developed that provide Private Data
via out-of-band mechanisms. As a result, many applications now
expect some form of Private Data to be available for application use
prior to setting up the DDP/RDMA connection. Following are some
examples of the use of Private Data.
An RDMA endpoint (referred to as a Queue Pair, or QP, in InfiniBand
and the [VERBS-RDMA
<http://tools.ietf.org/html/rfc5044#ref-VERBS-RDMA>]) must be
associated with a Protection Domain.
No receive operations may be posted to the endpoint before it is
associated with a Protection Domain. Indeed under both the
InfiniBand and proposed RDMA/DDP verbs [VERBS-RDMA
<http://tools.ietf.org/html/rfc5044#ref-VERBS-RDMA>] an endpoint/QP is
created within a Protection Domain.
There are some applications where the choice of Protection Domain is
dependent upon the identity of the remote ULP client. For example,
if a user session requires multiple connections, it is highly
desirable for all of those connections to use a single Protection
Domain. Note: Use of Protection Domains is further discussed in
[RDMASEC <http://tools.ietf.org/html/rfc5044#ref-RDMASEC>].
InfiniBand, the DAT APIs [DAT-API
<http://tools.ietf.org/html/rfc5044#ref-DAT-API>], and the IT-API
[IT-API <http://tools.ietf.org/html/rfc5044#ref-IT-API>] all
provide for the active-side ULP to provide Private Data when
requesting a connection. This data is passed to the ULP to allow it
to determine whether to accept the connection, and if so with which
endpoint (and implicitly which Protection Domain).
The Private Data can also be used to ensure that both ends of the
connection have configured their RDMA endpoints compatibly on such
matters as the RDMA Read capacity (see [RDMAP
<http://tools.ietf.org/html/rfc5044#ref-RDMAP>]). Further
ULP-specific uses are also presumed, such as establishing the identity of
the client.
Private Data is also allowed for when accepting the connection, to
allow completion of any negotiation on RDMA resources and for other
ULP reasons.
There are several potential ways to exchange this Private Data. For
example, the InfiniBand specification includes a connection
management protocol that allows a small amount of Private Data to be
exchanged using datagrams before actually starting the RDMA
connection.
This document allows for small amounts of Private Data to be
exchanged as part of the MPA startup sequence. The actual Private
Data fields are carried in the MPA Request Frame and the MPA Reply
Frame.
If larger amounts of Private Data or more negotiation is necessary,
TCP streaming mode messages may be exchanged prior to enabling MPA.
7.1.4.2. Example Immediate Startup Using Private Data
             Initiator                               Responder
   +---------------------------+
   |TCP SYN sent.              |           +--------------------------+
   +---------------------------+ --------> |TCP gets SYN packet;      |
   +---------------------------+           |  sends SYN-Ack.          |
   |TCP gets SYN-Ack           | <-------- +--------------------------+
   |  sends Ack.               |
   +---------------------------+ --------> +--------------------------+
   +---------------------------+           |Consumer enables MPA      |
   |Consumer enables MPA       |           |Responder mode, waits for |
   |Initiator mode with        |           |  <MPA Request frame>.    |
   |Private Data; MPA sends    |           +--------------------------+
   |  <MPA Request Frame>;     |
   |MPA waits for incoming     |           +--------------------------+
   |  <MPA Reply Frame>.       | - - - - > |MPA receives              |
   +---------------------------+           |  <MPA Request Frame>.    |
                                           |Consumer examines Private |
                                           |Data, provides MPA with   |
                                           |return Private Data,      |
                                           |binds DDP to MPA, and     |
                                           |enables MPA to send an    |
                                           |  <MPA Reply Frame>.      |
                                           |DDP/MPA enables FPDU      |
   +---------------------------+           |decoding, but does not    |
   |MPA receives the           | < - - - - |send any FPDUs.           |
   |  <MPA Reply Frame>.       |           +--------------------------+
   |Consumer examines Private  |
   |Data, binds DDP to MPA,    |
   |and enables DDP/MPA to     |
   |begin Full Operation.      |
   |MPA sends first FPDU (as   |           +--------------------------+
   |DDP ULPDUs become          | ========> |MPA receives first FPDU.  |
   |available).                |           |MPA sends first FPDU (as  |
   +---------------------------+           |DDP ULPDUs become         |
                                   <====== |available).               |
                                           +--------------------------+
Figure 10: Example Immediate Startup Negotiation
Note: The exact order of when MPA is started in the TCP connection
sequence is implementation dependent; the above diagram shows one
possible sequence. Also, the Initiator "Ack" to the Responder's
"SYN-Ack" may be combined into the same TCP segment containing
the MPA Request Frame (as is allowed by TCP RFCs).
The example immediate startup sequence is described below:
* The passive side (Responding Consumer) would listen on the TCP
destination port, to indicate its readiness to accept a
connection.
* The active side (Initiating Consumer) would request a
connection from a TCP endpoint (that expected to upgrade to
MPA/DDP/RDMA and expected the Private Data) to a destination
address and port.
* The Initiating Consumer would initiate a TCP connection to
the destination port. Acceptance/rejection of the connection
would proceed as per normal TCP connection establishment.
* The passive side (Responding Consumer) would receive the TCP
connection request as usual allowing normal TCP gatekeepers, such
as INETD and TCPserver, to exercise their normal
safeguard/logging functions. On acceptance of the TCP
connection, the Responding Consumer would enable MPA in the
Responder mode and wait for the initial MPA startup message.
* The Initiating Consumer would enable MPA startup in the
Initiator mode to send an initial MPA Request Frame with its
included Private Data message to send. The Initiating MPA
(and Consumer) would also wait for the MPA connection to be
accepted, and any returned Private Data.
* The Responding MPA would receive the initial MPA Request Frame
with the Private Data message and would pass the Private Data
through to the Consumer. The Consumer can then accept the
MPA/DDP connection, close the TCP connection, or reject the MPA
connection with a return message.
* To accept the connection request, the Responding Consumer would
use an appropriate API to bind the TCP/MPA connections to a DDP
endpoint, thus enabling MPA/DDP into Full Operation. In the
process of going to Full Operation, MPA sends the MPA Reply
Frame, which includes the Consumer-supplied Private Data
containing any appropriate Consumer response. MPA/DDP waits for
the first incoming FPDU before sending any FPDUs.
* If the initial TCP data was not a properly formatted MPA Request
Frame, MPA will close or reset the TCP connection immediately.
* To reject the MPA connection request, the Responding Consumer
would send an MPA Reply Frame with any ULP-supplied Private Data
(with reason for rejection), with the "Rejected Connection" bit
set to '1', and may close the TCP connection.
* The Initiating MPA would receive the MPA Reply Frame with the
Private Data message and would report this message to the
Consumer, including the supplied Private Data.
If the "Rejected Connection" bit is set to a '1', MPA will
close the TCP connection and exit.
If the "Rejected Connection" bit is set to a '0', and on
determining from the MPA Reply Frame Private Data that the
connection is acceptable, the Initiating Consumer would use
an appropriate API to bind the TCP/MPA connections to a DDP
endpoint thus enabling MPA/DDP into Full Operation. MPA/DDP
would begin sending DDP messages as MPA FPDUs.
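
The accept and reject paths described above can be walked through with a small Python simulation over a local socket pair. It is a sketch under stated assumptions rather than a real MPA stack: both ends run in one thread (the frames are small enough to sit in the socket buffers), the `accept` callback stands in for the Responding Consumer's policy, the Private Data strings are placeholders, and all function names are invented.

```python
import socket
import struct

KEY_REQ = b"MPA ID Req Frame"
KEY_REP = b"MPA ID Rep Frame"
FLAG_R = 0x20  # Rejected Connection bit, assuming bit 0 is the MSB
HEADER = struct.Struct("!16sBBH")  # Key, flags, Rev, PD_Length


def send_frame(sock, key, flags, private_data=b""):
    sock.sendall(HEADER.pack(key, flags, 1, len(private_data)) + private_data)


def recv_exact(sock, n):
    """Read exactly n octets from the stream."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed during MPA startup")
        buf += chunk
    return buf


def recv_frame(sock, expected_key):
    """Read the header, validate it, then read PD_Length octets."""
    key, flags, rev, pd_length = HEADER.unpack(recv_exact(sock, HEADER.size))
    if key != expected_key or rev != 1 or pd_length > 512:
        raise ValueError("improperly formatted startup frame")
    return flags, recv_exact(sock, pd_length)


def responder_step(sock, accept):
    """Read the Request, let the stand-in ULP policy decide, send the Reply."""
    _, request_pd = recv_frame(sock, KEY_REQ)
    ok = accept(request_pd)
    send_frame(sock, KEY_REP, 0 if ok else FLAG_R,
               b"welcome" if ok else b"denied")  # placeholder Private Data
    return ok


def run_startup(request_pd, accept):
    """Drive both ends of the exchange and report the Initiator's view."""
    initiator, responder = socket.socketpair()
    with initiator, responder:
        send_frame(initiator, KEY_REQ, 0, request_pd)
        responder_step(responder, accept)
        flags, reply_pd = recv_frame(initiator, KEY_REP)
        return ("rejected" if flags & FLAG_R else "full operation", reply_pd)
```

For example, `run_startup(b"session-1", lambda pd: pd.startswith(b"session"))` follows the accept path, while a request whose Private Data the callback refuses comes back with the Rejected Connection bit set and the responder's reason in the returned Private Data.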
7.1.5. "Dual Stack" Implementations
MPA/DDP implementations are commonly expected to be implemented as
part of a "dual stack" architecture. One stack is the traditional
TCP stack, usually with a sockets interface API (Application
Programming Interface). The second stack is the MPA/DDP stack with
its own API, and potentially separate code or hardware to deal with
the MPA/DDP data. Of course, implementations may vary, so the
following comments are of an advisory nature only.
The use of the two stacks offers advantages:
TCP connection setup is usually done with the TCP stack. This
allows use of the usual naming and addressing mechanisms. It
also means that any mechanisms used to "harden" the connection
setup against security threats are also used when starting
MPA/DDP.
Some applications may have been originally designed for TCP, but
are "enhanced" to utilize MPA/DDP after a negotiation reveals the
capability to do so. The negotiation process takes place in
TCP's streaming mode, using the usual TCP APIs.
Some new applications, designed for RDMA or DDP, still need to
exchange some data prior to starting MPA/DDP. This exchange can
be of arbitrary length or complexity, but often consists of only
a small amount of Private Data, perhaps only a single message.
Using the TCP streaming mode for this exchange allows this to be
done using well-understood methods.
The main disadvantage of using two stacks is the conversion of an
active TCP connection between them. This process must be done with
care to prevent loss of data.
To avoid some of the problems when using a "dual stack" architecture,
the following additional restrictions may be required by the
implementation:
1. Enabling the DDP/MPA stack SHOULD be done only when no incoming
stream data is expected. This is typically managed by the ULP
protocol. When following the recommended startup sequence, the
Responder side enters DDP/MPA mode, sends the last streaming mode
data, and then waits for the MPA Request Frame. No additional
streaming mode data is expected. The Initiator side ULP receives
the last streaming mode data, and then enters DDP/MPA mode.
Again, no additional streaming mode data is expected.
2. The DDP/MPA MAY provide the ability to send a "last streaming
message" as part of its Responder DDP/MPA enable function. This
allows the DDP/MPA stack to more easily manage the conversion to
DDP/MPA mode (and avoid problems with a very fast return of the
MPA Request Frame from the Initiator side).
Note: Regardless of the "stack" architecture used, TCP's rules MUST
be followed. For example, if network data is lost, re-segmented,
or re-ordered, TCP MUST recover appropriately even when this
occurs while switching stacks.
Arkady Kanevsky email: arkady at netapp.com
Network Appliance Inc. phone: 781-768-5395
1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195
Waltham, MA 02451 central phone: 781-768-5300