[ofw][ipoib] Connected Mode changes for review.
Alex Estrin
alex.estrin at qlogic.com
Fri Oct 17 06:28:40 PDT 2008
Hello,
Here are the changes introducing an IPoIB connected mode implementation
(as per RFC 4755).
Source code that is quite close to these changes can be pulled from
svn://openib.tc.cornell.edu/gen1/branches/ipoib_cm into the ULP folder of
your working copy, alongside the regular ipoib. Something like this will do:
    svn co svn://openib.tc.cornell.edu/gen1/branches/ipoib_cm <your-sandbox-trunk>\ulp\ipoib_cm
The build is straightforward. The code was tested on Windows Server 2003
(x86 and x64) against Linux OFED 1.3.1.
Some implementation points worth mentioning:
Connection:
    A listening CEP is associated with the local endpoint.
    A connection is established per endpoint. On the CM active side, QP
      creation and destruction are offloaded to a system thread
      (PASSIVE_LEVEL).
    A conn REQ is sent along with the unicast ARP reply to an endpoint if
      that endpoint reports its CM capabilities in the ARP request.
    The host will also accept a conn REQ from the same endpoint. As a
      result, one or two RC QPs are created per connection (to match Linux
      behavior).
    Both QPs are bidirectional, so the first established connection is
      used for transmission.
    The max payload MTU is set to 65520 (to match the Linux ipoib CM MTU
      size).
    Since there is no HW checksum offload support for RC QPs yet, the
      checksum offload flags are forced: send - disabled, receive - bypass.
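To illustrate the capability check mentioned above: RFC 4755 carries the connected-mode capability in the flags octet that precedes the 3-byte QPN in the 20-byte IPoIB link-layer address, which is the hardware address seen in ARP. The following is only a minimal sketch of such a check, not code from the patch; the names peer_supports_rc and peer_qpn are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IPoIB link-layer address: 1 flags octet + 3 QPN octets + 16 GID octets. */
#define IPOIB_HW_ADDR_LEN 20
#define IPOIB_FLAG_RC     0x80  /* RFC 4755: peer supports reliable connected mode */
#define IPOIB_FLAG_UC     0x40  /* RFC 4755: peer supports unreliable connected mode */

/* Hypothetical helper: true if the ARP sender advertises RC capability. */
static bool peer_supports_rc(const uint8_t hw_addr[IPOIB_HW_ADDR_LEN])
{
    assert(hw_addr != NULL);
    return (hw_addr[0] & IPOIB_FLAG_RC) != 0;
}

/* Hypothetical helper: extract the 24-bit QPN that follows the flags octet. */
static uint32_t peer_qpn(const uint8_t hw_addr[IPOIB_HW_ADDR_LEN])
{
    assert(hw_addr != NULL);
    return ((uint32_t)hw_addr[1] << 16) |
           ((uint32_t)hw_addr[2] << 8)  |
            (uint32_t)hw_addr[3];
}
```

Under this sketch, a conn REQ would only be attempted when peer_supports_rc() returns true for the sender address of the ARP request.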
Send path:
    So far it implements a simple rule: if the endpoint is connected, all
      unicasts go through the RC QP; the rest goes through the UD QP (please
      see the minor issues note below).
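The send rule can be sketched roughly as below. This is an illustration under assumptions, not the code from the patch: struct endpoint_state, select_tx_path and payload_fits are hypothetical names, and the UD payload MTU is passed in as a parameter because its value depends on the port configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IPOIB_CM_MAX_PAYLOAD 65520u  /* max CM payload MTU stated in this mail */

enum tx_path { TX_PATH_UD, TX_PATH_RC };

/* Hypothetical per-endpoint state; the real endpoint structure is larger. */
struct endpoint_state {
    bool rc_connected;  /* true once an RC connection is established */
};

/* Connected endpoint and unicast traffic: RC QP. Everything else: UD QP. */
static enum tx_path select_tx_path(const struct endpoint_state *ep,
                                   bool is_unicast)
{
    if (ep != NULL && is_unicast && ep->rc_connected)
        return TX_PATH_RC;
    return TX_PATH_UD;
}

/* Check whether a payload of 'len' bytes fits the MTU of the chosen path. */
static bool payload_fits(enum tx_path path, size_t len, size_t ud_payload_mtu)
{
    assert(ud_payload_mtu <= IPOIB_CM_MAX_PAYLOAD);
    size_t limit = (path == TX_PATH_RC) ? IPOIB_CM_MAX_PAYLOAD : ud_payload_mtu;
    return len <= limit;
}
```

With this rule, once an endpoint is no longer marked connected, a payload larger than the UD payload MTU is routed to the UD QP and does not fit.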
Recv path:
    The endpoint recv queue is attached to an SRQ.
    The SRQ is created per port; its queue size is calculated using data
      from the CA attributes query (might need to come up with a value
      that scales better).
    Introduces a new descriptor type that extends the layout of the UD
      receive descriptor.
    Implemented a simple ARP filter (probably should go away; please see
      the minor issues note).
    Reused the existing filter for UDP/DHCP packets.
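As a discussion aid for the SRQ sizing point above, here is one possible way to derive a queue depth from a CA limit. It is purely an assumption and not the formula used in the patch: struct ca_limits, its max_srq_wrs field, and both sizing parameters are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the values returned by the CA attributes query. */
struct ca_limits {
    uint32_t max_srq_wrs;  /* assumed: largest SRQ depth the CA accepts */
};

/*
 * Ask for per_endpoint_depth receive buffers for each expected endpoint,
 * then clamp the request to what the CA reports it can support.
 */
static uint32_t srq_depth(const struct ca_limits *ca,
                          uint32_t per_endpoint_depth,
                          uint32_t expected_endpoints)
{
    assert(ca != NULL);
    /* 64-bit product so the multiplication cannot overflow. */
    uint64_t wanted = (uint64_t)per_endpoint_depth * expected_endpoints;
    if (wanted > ca->max_srq_wrs)
        return ca->max_srq_wrs;
    return (uint32_t)wanted;
}
```

Because the SRQ is shared by all endpoints on the port, a fixed cap like this is what limits scalability as the number of connected endpoints grows.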
Common code changes:
    Added a new file, ipoib_cm.c (most IB CM related code was put there).
    Fixed and put to use the ipoib hw addr field manipulation routines
      (ipoib_xfr_mgr.h).
    CM recv buffer management is implemented in ipoib_endpoint.c.
    Recv statistics update was optimized a bit for the CM path.
    Most global definitions moved to ipoib_driver.h.
    The CA attrs query is packaged into a function and the result is saved
      for the life of the port.
    Added a MiniportCancelSendPackets routine.
    Added ErrorLog messages for successful/failed CM initialization.
    Reduced some debug print noise by moving the statistics OIDs and a few
      others to a higher debug level.
    Some minor code formatting, while trying to keep the project coding
      style consistent.
Known major issues:
    The code has not yet been tested to work properly along with LSO; so
      far CM is forced to stay disabled if LSO is enabled.
    The code has not been tested with the "avoid the CM" patch yet.
    Just figured out that it is possible to receive a DREQ from Linux (on
      its ARP entry expiration) while the Windows host ARP entry is still
      there. So if an application then sends Linux a message larger than
      the UD payload MTU, Windows won't reconnect (the first packet is not
      an ARP) and will try to send the large message through the UD QP,
      which obviously fails. This is not the case for Windows hosts,
      though, where the connection stays alive until the endpoint goes
      away.
Minor issues:
    The SID is misformatted (IETF bit) to match the Linux implementation.
      A Linux PR is open.
    ARP reply (as well as neighbor discovery) goes through the RC QP.
      (The RFC says it must go through UD. I'm not sure, though, what the
      reason behind this requirement was.)
Please review.
Thanks,
Alex.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib_cm_trunk_head.diff
Type: application/octet-stream
Size: 102425 bytes
Desc: ipoib_cm_trunk_head.diff
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20081017/e465fa0c/attachment.obj>