[openib-general] RE: [RFC] support transports whose native endpoint is not a socket
caitlinb at broadcom.com
Tue Mar 7 10:40:06 PST 2006
Or Gerlitz wrote:
> Rewinding... and starting from scratch, the idea was to
> conform to the following scheme which is now (and ever to be) used by
> open iscsi:
> The current user space code assumes that the transport is
> using a socket and acts as follows:
> +1 target discovery over TCP/IP with the discovery server
> +2 socket create/bind/setopt/connect/poll to the target
> +3 iscsi session create
> +4 iscsi connection create
> +5 bind iscsi connection to the socket
> +6 login request/response negotiation
> +7 iscsi connection start
> +8 the SCSI midlayer starts its inquiry and so on
> Note that no bits are sent/recv directly from/to user space
> over the socket, netlink is used to move for the kernel/user
> IPC, so to send the kernel code uses the send_pdu api and to
> recv the kernel sends up RECV_PDU event to the user.
> The problem i was trying to address is support of transports
> whose native endpoint is not a socket nor can they move their
> endpoint from user to kernel (eg an IB QP). The approach
> suggested by the RFC i posted yesterday and today follows this.
> From your email i was not sure if you understand the current
> open iscsi implementation and assumptions and what you find
> wrong in the changes which were discussed last week and the
> RFC i sent, please elaborate on that.
The problem is that creating the iSCSI connection as a
user-mode socket, and then handing it off to the kernel
via netlink, doesn't really work unless you are actually
working with a socket to provide the raw transport.
There are three basic approaches to this:
1) create a truly honest socket that integrates the
actual capabilities into the host stack fully.
That is actually a lot more work than it sounds
like because the QP model is really not all that
compatible with the socket model. The more elements
they have in common, the more their differences stand
out. Hence a QP that supports iSER/iWARP fits into a
socket model even more awkwardly than a QP that
supports iSER/IB. The iSER/iWARP "QP" actually
does have the very TCP connection that the pseudo-
socket claims to have, but the pseudo-socket is
not really a TCP connection socket.
2) Create a pseudo-socket, and try to document all
of the exceptions. You are fighting a losing battle
though, because as soon as you call it a socket you
invoke decades of accumulated expectations as to what
a socket does. Those expectations will be even greater
when the "socket" is on an IP network.
For example, the user-mode code does not just connect
with the socket, it sets some options. iSER/IB might
be justified in ignoring those options on an IB network,
but an iSCSI offload or iSER/iWARP driver would reasonably
be expected to inherit the entire state of the TCP socket.
That means integration with the stack, or implementing
lots of the socket interface. Before long, an iSCSI offload
or iSER/iWARP driver will be exporting what amounts to a full
TOE interface to the kernel.
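
To make the option problem concrete, here is a minimal sketch in Python
(chosen only for brevity). The two options shown are assumptions picked
for illustration; they are not taken from the open-iscsi source. The
point is that once user mode sets options on a real socket, that state
lives in the host stack, and whoever receives the "socket" is expected
to honor it.

```python
import socket


def make_configured_socket():
    """Create a TCP socket and set options the way a user-mode
    initiator might before connecting (illustrative options only)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return s


def option_state(s):
    """Read back the option state that a consumer of this socket
    would be expected to inherit."""
    return {
        "TCP_NODELAY": s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0,
        "SO_KEEPALIVE": s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0,
    }
```

A pseudo-socket has no host-stack state behind it, so it would have to
record each such option itself and either apply it or document that it
is ignored.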
Now for the record, I think there should be a TOE interface
in the kernel, but to engage in some mild understatement
there are some that are less enthusiastic about that.
In all fairness the TOE issue should be dealt with on
its own; support for iSCSI and iSER should not get
tangled up with it.
3) That leaves the approach taken in openib with the CMA;
basically the socket is not explicitly exposed, but the
interfaces are defined in a manner that can be easily
implemented directly over a socket -- or a QP for that matter.
This would involve making some changes to the user-mode code.
But I believe those are simpler changes than expecting every
iSCSI solution to provide a pseudo-socket just to support
doing connect() and setting a few options.
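
As a rough sketch of what such an interface could look like from the
user-mode side, consider the following Python outline. Every name here
(Endpoint, TcpEndpoint, FakeQpEndpoint, open_connection, handoff) is
hypothetical and invented for illustration; none of it is the CMA API or
the open-iscsi code, and the QP variant is a stand-in that makes no real
RDMA calls.

```python
import itertools
import socket
from abc import ABC, abstractmethod


class Endpoint(ABC):
    """Hypothetical transport-neutral endpoint exposing only what the
    user-mode code needs: connect, poll, and hand-off."""

    @abstractmethod
    def connect(self, host, port): ...

    @abstractmethod
    def poll(self): ...

    @abstractmethod
    def handoff(self): ...


class TcpEndpoint(Endpoint):
    """Endpoint backed by an ordinary TCP socket."""

    def __init__(self):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._connected = False

    def connect(self, host, port):
        self._sock.connect((host, port))
        self._connected = True

    def poll(self):
        return self._connected

    def handoff(self):
        # The handle a kernel-side bind step would receive.
        return ("tcp", self._sock.fileno())

    def close(self):
        self._sock.close()


class FakeQpEndpoint(Endpoint):
    """Stand-in for a QP-backed endpoint; it only allocates an id."""

    _ids = itertools.count(1)

    def __init__(self):
        self._handle = None

    def connect(self, host, port):
        self._handle = next(self._ids)

    def poll(self):
        return self._handle is not None

    def handoff(self):
        return ("qp", self._handle)


def open_connection(ep, host, port):
    """Caller-side flow; identical for every transport."""
    ep.connect(host, port)
    if not ep.poll():
        raise ConnectionError("endpoint did not become ready")
    return ep.handoff()
```

The caller never sees a socket, so nothing invites it to use socket
features that a given transport cannot support.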
The user-mode code does not require the full power of a socket
to meet its requirements (bind, set a few options, connect
and hand-off). In fact I would argue that using a socket
as an interface actually creates a maintenance liability
because future developers are likely to take the 'socket'
concept seriously and might try to do something that is
not supported by each of the phony sockets. The need to
create a phony socket when connecting to an iSCSI target
via TCP/IP is also horrendously counter-intuitive.