[openib-general] SDP RDMA support for synchronous socket operations
Michael S. Tsirkin
mst at mellanox.co.il
Wed Aug 3 06:55:46 PDT 2005
Libor, all,
Tomorrow I plan to start working on SDP zcopy support for synchronous
send/recv socket operations. Both kernel-level and user-level initiators
should continue to be supported. I don't plan to work on sendfile
support yet.
I hope to finish the implementation and some basic testing within the next
two weeks. The development will be done on a branch, to be opened tomorrow.
My plan is to merge updates from trunk to stay in sync as much as possible.
What follows is a raw design draft. Comments are welcome.
MST
------------------------------------------------------------------------
What needs to be done to support zcopy for synchronous send/recv socket
operations.
Draft rev 2
Currently zcopy is supported only for AIO.
We reuse the zcopy infrastructure for send/recv socket operations.
Main differences between AIO and send/recv operations:
- send/recv have more flags: MSG_DONTWAIT, MSG_OOB, MSG_WAITALL, MSG_PEEK.
- After a send/recv call returns, it's illegal for the HCA to touch the
  application's buffer.
- send/recv must support data sizes too big to be transferred in
  one InfiniBand operation (SOCK_STREAM applications don't seem
  to expect to get EMSGSIZE).
- send/recv have the ability to block until an operation completes.
  This has to be implemented by SDP.
- With send/recv, an operation is revoked by a signal, unlike AIO, which
  is canceled explicitly by the application.
- Typically, there is only one outstanding send/recv operation on a
  specific socket.
- send/recv must be supported for kernel and user-space consumers.
  The current AIO code seems to support only user-level consumers.
The design draft covers: Send side, Receive side, Send/Send deadlock prevention.
-----------------------
Send side:
Operation:
- Attempt zcopy if the message is bigger than the send bcopy threshold.
- If the operation is too big to fit in a single FMR,
  split it into multiple buffers (iocbs) and queue them for processing.
  Q: Limit the number of FMRs used by a single socket?
  Block till the last iocb completes.
- If no FMRs are available:
  Force bcopy transfer.
- On signal, locate and cancel all queued iocbs.
  This may need to block, in which case we block in
  uninterruptible state (with a timeout).
  If the iocbs can't be canceled within a predefined time,
  treat this as a transport error and trigger an abortive close.
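The splitting step in the list above can be pictured with a small user-space sketch. It is not the SDP kernel code: the function name `split_into_iocbs` and the parameter `fmr_max_len` (the largest region a single FMR can cover) are hypothetical, and each chunk is shown as a plain (offset, length) pair rather than a real iocb.

```python
def split_into_iocbs(total_len, fmr_max_len):
    """Return (offset, length) pairs that cover total_len bytes.

    Each pair stands in for one queued iocb and is at most fmr_max_len
    bytes long. Both names are illustrative assumptions, not SDP APIs.
    """
    if fmr_max_len <= 0:
        raise ValueError("fmr_max_len must be positive")
    chunks = []
    offset = 0
    while offset < total_len:
        # The last chunk may be shorter than fmr_max_len.
        length = min(fmr_max_len, total_len - offset)
        chunks.append((offset, length))
        offset += length
    return chunks
```

The caller would queue every returned chunk and block until the last one completes, as described above.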
Options/Socket flags:
- MSG_OOB (out-of-band data):
  Force bcopy transfer.
  Q: What to do if SrcAvail messages are outstanding?
  A:
- MSG_DONTWAIT/O_NONBLOCK (non-blocking operation):
  Force bcopy transfer.
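Taken together, the send-side rules amount to a short decision function. The sketch below is only an illustration of those rules in user-space Python: `choose_send_mode` and all of its parameters are hypothetical names, the MSG_* constants come from Python's socket module (MSG_DONTWAIT is assumed to be available, as it is on Linux), and the open question about outstanding SrcAvail messages is not modeled.

```python
import socket


def choose_send_mode(msg_len, flags, nonblocking, bcopy_threshold, fmrs_available):
    """Return "zcopy" or "bcopy" for one send call.

    nonblocking models O_NONBLOCK on the socket; fmrs_available is the
    number of free FMRs. All names are illustrative assumptions.
    """
    if flags & socket.MSG_OOB:
        return "bcopy"  # out-of-band data forces bcopy
    if nonblocking or (flags & socket.MSG_DONTWAIT):
        return "bcopy"  # non-blocking operation forces bcopy
    if msg_len <= bcopy_threshold:
        return "bcopy"  # zcopy only above the send bcopy threshold
    if fmrs_available == 0:
        return "bcopy"  # no FMRs left to register the buffer
    return "zcopy"
```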
-----------------------
Receive side:
Operation:
- Attempt zcopy if the message is bigger than the rcv bcopy threshold.
- If the operation is too big to fit in a single FMR,
  split it into multiple buffers (iocbs) and queue them for processing.
  Block till the last iocb completes.
- With MSG_WAITALL:
  Don't post sink available.
- Without MSG_WAITALL:
  Post exactly one sink available at a time.
  Registration can still be pipelined with RDMA.
  Note that recv without MSG_WAITALL may return a shorter message
  than what was sent. This is OK:
  "For stream-based sockets, such as SOCK_STREAM, message boundaries shall
  be ignored. In this case, data shall be returned to the user as soon as it
  becomes available, and no data shall be discarded."
- If no FMRs are available:
  Force bcopy transfer.
- On signal, locate and cancel all queued iocbs.
  This may need to block, in which case we block in
  uninterruptible state (with a timeout).
  If the iocbs can't be canceled within a predefined time,
  treat this as a transport error and trigger an abortive close.
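The cancel-on-signal rule, which is the same for the send and receive sides, can be sketched as a bounded retry loop. This is a user-space illustration only: `cancel_queued_iocbs`, the `try_cancel` callback and the polling interval are hypothetical, and a real implementation would sleep on a wait queue in uninterruptible state rather than poll.

```python
import time


def cancel_queued_iocbs(iocbs, try_cancel, timeout_s, poll_s=0.001):
    """Try to cancel every queued iocb before a deadline.

    try_cancel(iocb) is a hypothetical callback that returns True once
    the iocb is canceled. Returns True if all iocbs were canceled, or
    False if the deadline passed first, in which case the caller would
    treat it as a transport error and trigger an abortive close.
    """
    deadline = time.monotonic() + timeout_s
    pending = list(iocbs)
    while pending:
        # Keep only the iocbs that could not be canceled on this pass.
        pending = [iocb for iocb in pending if not try_cancel(iocb)]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```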
Options/Socket flags:
- MSG_OOB (out-of-band data):
  Handle as it arrives.
  Q: Should we force bcopy transfer (SendSm)?
  A:
- MSG_DONTWAIT/O_NONBLOCK (non-blocking operation):
  Force bcopy transfer.
- MSG_PEEK (peek data in buffer):
  Force bcopy transfer.
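The receive-side rules can be summarized the same way. In this sketch, `choose_recv_mode` and `max_sink_avail_outstanding` are hypothetical names, the MSG_* constants come from Python's socket module (MSG_DONTWAIT is assumed to be available, as it is on Linux), and MSG_OOB is deliberately left out because the draft leaves that question unanswered.

```python
import socket


def choose_recv_mode(buf_len, flags, nonblocking, bcopy_threshold, fmrs_available):
    """Return "zcopy" or "bcopy" for one recv call (illustrative only)."""
    if nonblocking or (flags & socket.MSG_DONTWAIT):
        return "bcopy"  # non-blocking operation forces bcopy
    if flags & socket.MSG_PEEK:
        return "bcopy"  # peeking forces bcopy
    if buf_len <= bcopy_threshold:
        return "bcopy"  # zcopy only above the rcv bcopy threshold
    if fmrs_available == 0:
        return "bcopy"  # no FMRs left to register the buffer
    return "zcopy"


def max_sink_avail_outstanding(flags):
    """How many sink available messages may be posted at once.

    With MSG_WAITALL none are posted; otherwise exactly one at a time.
    """
    return 0 if flags & socket.MSG_WAITALL else 1
```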
-----------------------
Send/Send deadlock prevention (quoting the SDP spec):
Receive side detects deadlock if:
. A SrcAvail is received; and
. No ULP receive buffer is posted; and
. The local Data Source has a SrcAvail outstanding.
There are several ways to resolve this deadlock.
Resolve in the following way:
. The Data Sink could send a SendSm message to force the use of
the Bcopy data transfer mechanism.
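The detection rule quoted above is a conjunction of three conditions, which a short sketch makes explicit. The function names and the returned strings are hypothetical; only the three conditions and the SendSm resolution come from the quoted text.

```python
def send_send_deadlock(src_avail_received, ulp_recv_buffer_posted,
                       local_src_avail_outstanding):
    """True when all three quoted deadlock conditions hold."""
    return (src_avail_received
            and not ulp_recv_buffer_posted
            and local_src_avail_outstanding)


def on_src_avail(ulp_recv_buffer_posted, local_src_avail_outstanding):
    """Decide what the Data Sink does when a SrcAvail arrives.

    The return values are illustrative labels, not SDP message encodings.
    """
    if send_send_deadlock(True, ulp_recv_buffer_posted,
                          local_src_avail_outstanding):
        return "send SendSm"  # force the peer to fall back to bcopy
    return "no deadlock"
```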
--
MST