[openib-general] RE: [RFC] support transports whose native endpoint is not a socket

Tue Mar 7 10:40:06 PST 2006

Or Gerlitz wrote:
> Rewinding... and starting from scratch, the idea was to
> conform to the following scheme, which is now (and always will be)
> used by open iscsi:
> 
> The current user space code assumes that the transport is
> using a socket and acts as follows:
> 
> +1 target discovery over TCP/IP with the discovery server
> +2 socket create/bind/setopt/connect/poll to the target
> +3 iscsi session create
> +4 iscsi connection create
> +5 bind iscsi connection to the socket
> +6 login request/response negotiation
> +7 iscsi connection start
> +8 the SCSI midlayer starts its inquiry and so on
> 
> Note that no bits are sent/received directly from/to user space
> over the socket. Netlink is used for the kernel/user IPC, so
> to send, the kernel code uses the send_pdu API, and to
> receive, the kernel sends a RECV_PDU event up to the user.
> 
> The problem I was trying to address is support for transports
> whose native endpoint is not a socket and which cannot move their
> endpoint from user to kernel (e.g. an IB QP). The approach
> suggested by the RFC I posted yesterday and today follows this.
> 
> From your email I was not sure whether you understand the current
> open iscsi implementation and its assumptions, or what you find
> wrong in the changes which were discussed last week and in the
> RFC I sent. Please elaborate on that.

The problem is that creating the iSCSI connection as a
user-mode socket, and then handing it off to the kernel
via netlink, only works if a socket really is what
provides the raw transport.

There are three basic approaches to this:
1) Create a truly honest socket that integrates the
   actual capabilities into the host stack fully.
   That is actually a lot more work than it sounds
   like because the QP model is really not all that
   compatible with the socket model. The more elements
   they have in common, the more their differences stand
   out. Hence a QP that supports iSER/iWARP fits into a
   socket model even more awkwardly than a QP that
   supports iSER/IB. The iSER/iWARP "QP" actually 
   does have the very TCP connection that the pseudo-
   socket claims to have, but the pseudo-socket is 
   not really a TCP connection socket.

2) Create a pseudo-socket, and try to document all
   of the exceptions. You are fighting a losing battle
   though, because as soon as you call it a socket you
   invoke decades of accumulated expectations as to what
   a socket does. Those expectations will be even greater
   when the "socket" is on an IP network.

   For example, the user-mode code does not just connect
   with the socket, it sets some options. iSER/IB might
   be justified in ignoring those options on an IB network,
   but an iSCSI offload or iSER/iWARP driver would reasonably
   be expected to inherit the entire state of the TCP 
   connection.

   That means integration with the stack, or implementing
   lots of the socket interface. Before long, an iSCSI offload
   or iSER/iWARP driver will be exporting what amounts to a full
   TOE interface to the kernel.

   Now for the record, I think there should be a TOE interface
   in the kernel, but, to engage in some mild understatement,
   there are some who are less enthusiastic about that.
   In all fairness the TOE issue should be dealt with on
   its own; support for iSCSI and iSER should not get 
   tangled up with it.

3) That leaves the approach taken in openib with the CMA;
   basically the socket is not explicitly exposed, but the
   interfaces are defined in a manner that can be easily
   implemented directly over a socket -- or a QP for that
   matter.
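
To make option 3 a little more concrete, here is a rough sketch of
what such an interface could look like. Every name in it (ep_ops,
ep_open, EP_OPT_NODELAY, the qp_ep stub and its made-up QP number)
is hypothetical and is not taken from open-iscsi or the CMA; the TCP
backend uses ordinary POSIX socket calls, and the QP backend is only
a stand-in that records state instead of talking to hardware.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical option set: only what the user-mode code needs. */
enum ep_opt { EP_OPT_NODELAY, EP_OPT_KEEPALIVE };

/* Hypothetical transport-neutral endpoint operations. */
struct ep_ops {
    const char *name;
    int  (*create)(void *ep);
    int  (*set_option)(void *ep, enum ep_opt opt, int val);
    int  (*connect)(void *ep, const struct sockaddr *dst, socklen_t len);
    long (*handoff)(void *ep);   /* opaque handle to pass to the kernel */
    void (*destroy)(void *ep);
};

/* Backend 1: a real TCP socket. */
struct tcp_ep { int fd; };

static int tcp_create(void *p)
{
    struct tcp_ep *ep = p;
    ep->fd = socket(AF_INET, SOCK_STREAM, 0);
    return ep->fd < 0 ? -1 : 0;
}

static int tcp_set_option(void *p, enum ep_opt opt, int val)
{
    struct tcp_ep *ep = p;
    if (opt == EP_OPT_NODELAY)
        return setsockopt(ep->fd, IPPROTO_TCP, TCP_NODELAY, &val, sizeof val);
    if (opt == EP_OPT_KEEPALIVE)
        return setsockopt(ep->fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof val);
    return -1;
}

static int tcp_connect(void *p, const struct sockaddr *dst, socklen_t len)
{
    return connect(((struct tcp_ep *)p)->fd, dst, len);
}

static long tcp_handoff(void *p) { return ((struct tcp_ep *)p)->fd; }
static void tcp_destroy(void *p) { close(((struct tcp_ep *)p)->fd); }

const struct ep_ops tcp_ops = {
    "tcp", tcp_create, tcp_set_option, tcp_connect, tcp_handoff, tcp_destroy
};

/* Backend 2: stand-in for a QP; a real one would do a CM exchange. */
struct qp_ep { unsigned qp_num; int connected; };

static int qp_create(void *p)
{
    struct qp_ep *ep = p;
    ep->qp_num = 42;             /* made-up QP number for illustration */
    ep->connected = 0;
    return 0;
}

static int qp_set_option(void *p, enum ep_opt opt, int val)
{
    (void)p; (void)opt; (void)val;
    return 0;                    /* TCP options are accepted and ignored */
}

static int qp_connect(void *p, const struct sockaddr *dst, socklen_t len)
{
    (void)dst; (void)len;
    ((struct qp_ep *)p)->connected = 1;
    return 0;
}

static long qp_handoff(void *p)
{
    struct qp_ep *ep = p;
    return ep->connected ? (long)ep->qp_num : -1;
}

static void qp_destroy(void *p) { ((struct qp_ep *)p)->connected = 0; }

const struct ep_ops qp_ops = {
    "qp", qp_create, qp_set_option, qp_connect, qp_handoff, qp_destroy
};

/* Generic user-mode path: create, set options, connect, hand off. */
long ep_open(const struct ep_ops *ops, void *ep,
             const struct sockaddr *dst, socklen_t len)
{
    assert(ops && ep);
    if (ops->create(ep))
        return -1;
    if (ops->set_option(ep, EP_OPT_NODELAY, 1) || ops->connect(ep, dst, len)) {
        ops->destroy(ep);
        return -1;
    }
    return ops->handoff(ep);
}
```

The caller in ep_open() never sees a file descriptor, so nothing
invites it to treat the endpoint as a general-purpose socket; the
handle it gets back is whatever the backend wants the kernel side
to receive.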

This would involve making some changes to the user-mode code.
But I believe those are simpler changes than expecting every
iSCSI solution to provide a pseudo-socket just to support
doing connect() and setting a few options.

The user-mode code does not require the full power of a socket
to meet its requirements (bind, set a few options, connect
and hand-off). In fact I would argue that using a socket
as an interface actually creates a maintenance liability
because future developers are likely to take the 'socket'
concept seriously and might try to do something that is
not supported by each of the phony sockets. The need to
create a phony socket when connecting to an iSCSI target
via TCP/IP is also horrendously counter-intuitive.
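
As a small illustration of the expectation problem, the sketch below
does what any caller holding a "socket" feels entitled to do: set a
couple of options and read them back. The helper name
set_and_verify_opts is made up for this example; the calls themselves
are standard POSIX. A phony socket would have to either store and
return each of these values or reject the call explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: set TCP_NODELAY and SO_KEEPALIVE on fd and
 * confirm that the endpoint really remembers them.
 * Returns 0 on success, or a negative errno value on failure.
 */
int set_and_verify_opts(int fd)
{
    int on = 1;
    int got = 0;
    socklen_t len = sizeof got;

    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) < 0)
        return -errno;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &got, &len) < 0)
        return -errno;
    if (!got)
        return -EIO;             /* option was silently dropped */

    got = 0;
    len = sizeof got;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -errno;
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &got, &len) < 0)
        return -errno;
    if (!got)
        return -EIO;

    return 0;
}
```

On a genuine TCP socket this succeeds. On a descriptor that is not a
socket at all, the first setsockopt() fails with ENOTSOCK, which is at
least honest; the awkward case is the pseudo-socket in between, which
accepts the calls but may not mean anything by them.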