[ofw] ipoib connected mode implementation details for consideration.
Alex Estrin
alex.estrin at qlogic.com
Fri Jan 16 08:12:04 PST 2009
Hello,
The IPoIB Connected Mode implementation is RFC 4755 compliant.
Some limitations and behaviors were imposed by the Linux OFED IPoIB CM implementation.
Here are the major changes I've made to the existing code:
Connection:
- A connection is established per endpoint.
Connect REQ handling and RC QP creation/destruction are offloaded to a system thread.
- A Connect REQ is sent along with the unicast ARP reply to the endpoint
if that endpoint reported its C/M capabilities in the ARP request.
- The host can also accept a Connect REQ from the same endpoint.
As a result, one or two RC queue pairs will be created per connection
(this matches Linux IPoIB CM behavior).
- The listening CEP is tied to the local endpoint.
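To illustrate why one or two RC QPs can exist per connection, here is a minimal sketch of per-endpoint bookkeeping. The types and functions (cm_endpoint_t, cm_on_arp_reply(), cm_on_connect_req()) are hypothetical, not the driver's code; QP creation is reduced to recording a QPN, and the hand-off to a system thread is only noted in a comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RC QP handle: only the QPN is tracked in this sketch. */
typedef struct rc_qp {
    bool     valid;
    unsigned qpn;
} rc_qp_t;

/* Hypothetical per-endpoint CM state (not the driver's real structure). */
typedef struct cm_endpoint {
    bool    peer_cm_capable; /* C/M flag seen in the peer's ARP request */
    rc_qp_t active;          /* QP created when we send our Connect REQ */
    rc_qp_t passive;         /* QP created when we accept the peer's REQ */
} cm_endpoint_t;

/* Called while sending the unicast ARP reply. Returns true when a Connect
 * REQ should go out with it. In the driver, REQ and QP creation would be
 * handed to a system thread rather than done inline. */
static bool cm_on_arp_reply(cm_endpoint_t *ep, unsigned new_qpn)
{
    if (!ep->peer_cm_capable || ep->active.valid)
        return false;
    ep->active.valid = true;
    ep->active.qpn = new_qpn;
    return true;
}

/* Called when the same endpoint sends us its own Connect REQ. */
static void cm_on_connect_req(cm_endpoint_t *ep, unsigned new_qpn)
{
    if (ep->passive.valid)
        return;                 /* already accepted one from this peer */
    ep->passive.valid = true;
    ep->passive.qpn = new_qpn;
}

/* Returns 0, 1 or 2: both sides may initiate, giving two RC QPs. */
static size_t cm_rc_qp_count(const cm_endpoint_t *ep)
{
    return (size_t)ep->active.valid + (size_t)ep->passive.valid;
}
```

If both hosts answer each other's ARP requests, each sends a REQ and also accepts the peer's, so each side ends up with an active and a passive QP.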
Receive path:
- The endpoint receive queue is attached to the SRQ.
- One SRQ is created per port; the SRQ queue size is calculated from the CA port attributes
(the CA is queried during port initialization and the data is saved for the life of the port).
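The post does not give the SRQ sizing formula, so the following is only a minimal sketch of the simplest approach: take a requested depth (or a default), apply a floor, and cap it at the HCA's per-SRQ limit from the cached CA attributes. The structure, field name and constants are hypothetical.

```c
#include <stdint.h>

/* Hypothetical subset of CA attributes cached at port initialization. */
typedef struct ca_port_cache {
    uint32_t max_srq_wr;   /* max outstanding WRs per SRQ reported by the HCA */
} ca_port_cache_t;

/* Illustrative constants, not the driver's actual values. */
#define SRQ_DEFAULT_DEPTH 1024u
#define SRQ_MIN_DEPTH       64u

/* Pick an SRQ depth: use the request (or a default), apply a floor, then
 * cap it at the HCA limit so the result never exceeds what the CA allows. */
static uint32_t srq_depth_from_ca(const ca_port_cache_t *ca, uint32_t requested)
{
    uint32_t depth = requested ? requested : SRQ_DEFAULT_DEPTH;
    if (depth < SRQ_MIN_DEPTH)
        depth = SRQ_MIN_DEPTH;
    if (depth > ca->max_srq_wr)
        depth = ca->max_srq_wr;
    return depth;
}
```

Applying the cap last means a small HCA limit always wins over the floor.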
Send path:
- Unicast IP packets go through RC QP.
- ARP, DHCP, IP multicast go through UD QP.
- Added fragmentation of large IP packets (for IP multicast packets larger than the UD QP MTU).
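The send-path split can be sketched as a classifier that looks at the EtherType and the IPv4 header. This is an assumption-laden illustration, not the driver's __send_mgr_filter_ip(): it handles IPv4 only, treats any non-IPv4 frame as UD, identifies DHCP by UDP destination port 67/68, and assumes the buffer holds a full IPv4 header (plus UDP ports for UDP).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { PATH_UD, PATH_RC } send_path_t;

#define ETH_TYPE_IPV4    0x0800u
#define ETH_TYPE_ARP     0x0806u
#define IP_PROTO_UDP     17u
#define DHCP_SERVER_PORT 67u
#define DHCP_CLIENT_PORT 68u

/* Choose the QP for an outgoing frame. `ip` points at the IPv4 header and is
 * only read for IPv4 frames. `ep_connected` is true when an RC connection to
 * the destination endpoint is up. */
static send_path_t select_send_path(uint16_t eth_type, const uint8_t *ip,
                                    bool ep_connected)
{
    if (eth_type == ETH_TYPE_ARP)
        return PATH_UD;
    if (eth_type != ETH_TYPE_IPV4)
        return PATH_UD;                  /* sketch: non-IPv4 stays on UD */

    const uint8_t *dst = ip + 16;        /* IPv4 destination address */
    if (dst[0] >= 224 && dst[0] <= 239)
        return PATH_UD;                  /* IP multicast (224.0.0.0/4) */
    if (dst[0] == 255 && dst[1] == 255 && dst[2] == 255 && dst[3] == 255)
        return PATH_UD;                  /* limited broadcast */

    if (ip[9] == IP_PROTO_UDP) {
        unsigned ihl = (ip[0] & 0x0Fu) * 4u;   /* IP header length in bytes */
        uint16_t dport = (uint16_t)((ip[ihl + 2] << 8) | ip[ihl + 3]);
        if (dport == DHCP_SERVER_PORT || dport == DHCP_CLIENT_PORT)
            return PATH_UD;              /* DHCP */
    }
    return ep_connected ? PATH_RC : PATH_UD;  /* unicast IP */
}
```

Unicast IP falls back to UD while no RC connection exists, so traffic can flow before the CM handshake completes.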
Current Configuration details:
-Introduced a new user parameter 'Connected Mode' (enabled/disabled).
-The MTU size is configurable through the existing 'Payload MTU size' parameter.
If Connected Mode is 'enabled', the MTU range is extended up to 65520 bytes
(to match the Linux IPoIB CM default MTU size).
If CM is disabled, the MTU is limited to the UD range of values
(see also the notes below).
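A minimal sketch of how the 'Payload MTU size' value might be validated against the mode. Assumptions: the UD upper limit is taken as the IB port MTU minus the 4-byte IPoIB encapsulation header, the 576-byte lower bound is illustrative, and out-of-range values are clamped rather than rejected; the real driver's UD range may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPOIB_ENCAP_HDR_LEN 4u      /* IPoIB encapsulation header */
#define IPOIB_CM_MAX_MTU    65520u  /* CM upper limit, as in the post */
#define IPOIB_MIN_MTU       576u    /* illustrative lower bound */

/* Returns the effective payload MTU for a requested value.
 * ib_port_mtu is the IB link MTU (e.g. 2048 or 4096). */
static uint32_t effective_payload_mtu(uint32_t requested, bool cm_enabled,
                                      uint32_t ib_port_mtu)
{
    uint32_t limit = cm_enabled ? IPOIB_CM_MAX_MTU
                                : ib_port_mtu - IPOIB_ENCAP_HDR_LEN;
    if (requested > limit)
        return limit;
    if (requested < IPOIB_MIN_MTU)
        return IPOIB_MIN_MTU;
    return requested;
}
```

For example, with a 2048-byte IB MTU, a request of 65520 stays 65520 in CM but is clamped to 2044 on UD.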
Major file changes:
ipoib_cm.c
-New file. Most of the IB CM related code was put there.
ipoib_endpoint.c
-Most receive buffer management functions are implemented there.
ipoib_port.c
-Functions __build_send_desc() and __send_mgr_filter_ip()
were reworked to handle queue pair redirection and IP fragmentation logic.
-LSO WR formatting was repackaged as a separate function, __build_lso_desc().
-Receive statistics updates were slightly optimized for the RC path.
ipoib_port.h
- Introduced a new CM receive descriptor type for the RC QP that extends the layout
of the UD receive descriptor.
- The send descriptor was extended to hold multiple work requests per send
(to handle fragmented IP packets).
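The two descriptor changes can be pictured with a simplified layout sketch. All type names, fields and the MAX_WRS_PER_SEND bound are hypothetical: the RC descriptor embeds the UD one as its first member so shared code can treat it as a UD descriptor, and the send descriptor carries an array of work requests. send_desc_split() is a simplified model that cuts a buffer into pieces, not real IP fragmentation (which adds a header per fragment).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified layouts for illustration only. */
typedef struct recv_desc {          /* UD receive descriptor */
    uint64_t wr_id;
    uint32_t len;
    uint8_t *buf;
} recv_desc_t;

typedef struct cm_recv_desc {       /* RC receive descriptor */
    recv_desc_t base;               /* first member: shared UD layout */
    uint32_t    num_bufs;           /* RC-only fields follow */
    uint8_t    *bufs[16];
} cm_recv_desc_t;

#define MAX_WRS_PER_SEND 32u        /* illustrative bound */

typedef struct send_wr {
    uint32_t offset;                /* byte offset of this piece */
    uint32_t length;
} send_wr_t;

typedef struct send_desc {
    uint32_t  num_wrs;              /* 1 unless the packet was split */
    send_wr_t wr[MAX_WRS_PER_SEND];
} send_desc_t;

/* Shared code works on recv_desc_t *; the RC path recovers its
 * extended descriptor from the embedded base. */
static cm_recv_desc_t *cm_desc_from_base(recv_desc_t *base)
{
    return (cm_recv_desc_t *)((char *)base - offsetof(cm_recv_desc_t, base));
}

/* Split `total` bytes into pieces of at most `chunk` bytes, one WR each.
 * Returns the number of WRs, or 0 on invalid input or if they don't fit. */
static uint32_t send_desc_split(send_desc_t *d, uint32_t total, uint32_t chunk)
{
    uint32_t n = 0;
    if (chunk == 0 || total == 0)
        return 0;
    for (uint32_t off = 0; off < total; off += chunk) {
        if (n == MAX_WRS_PER_SEND)
            return 0;
        d->wr[n].offset = off;
        d->wr[n].length = (total - off < chunk) ? total - off : chunk;
        n++;
    }
    d->num_wrs = n;
    return n;
}
```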
inc\kernel\ip_packet.h
- Added a few macros for handling IP header fragment flags.
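For reference, a sketch of IPv4 fragment-field handling of the kind such macros cover. The bit values come from the IPv4 header definition (DF 0x4000, MF 0x2000, 13-bit offset in 8-byte units); the macro and function names are hypothetical, not the ones added to ip_packet.h, and values are in host byte order.

```c
#include <stdint.h>

/* Flags/fragment-offset field of the IPv4 header (bytes 6-7), host order. */
#define IP_FRAG_DF          0x4000u  /* Don't Fragment */
#define IP_FRAG_MF          0x2000u  /* More Fragments */
#define IP_FRAG_OFFSET_MASK 0x1FFFu  /* offset in 8-byte units */

#define IP_FRAG_OFFSET_BYTES(f) (((f) & IP_FRAG_OFFSET_MASK) * 8u)
#define IP_FRAG_MAKE(off_bytes, more) \
    ((uint16_t)(((off_bytes) / 8u) | ((more) ? IP_FRAG_MF : 0u)))

/* Fragments needed for `payload` bytes of IP payload when each fragment,
 * including an `ihl`-byte IP header, must fit in `mtu`. Non-final fragment
 * payloads are rounded down to a multiple of 8. Returns 0 if mtu is too
 * small to carry even 8 payload bytes. */
static uint32_t ip_frag_count(uint32_t payload, uint32_t mtu, uint32_t ihl)
{
    if (mtu < ihl + 8u)
        return 0;
    uint32_t per_frag = (mtu - ihl) & ~7u;
    return payload ? (payload + per_frag - 1u) / per_frag : 1u;
}
```

With a 2044-byte UD payload MTU and a 20-byte header, each fragment carries 2024 bytes, so a 4000-byte multicast payload needs two fragments; the second has offset field 253 (2024 / 8) and no MF bit.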
ipoib_xfr_mgr.h
-Fixed, and put to use, the routines that handle IPoIB hardware address fields.
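The 20-byte IPoIB hardware address is laid out as a 1-byte flags/reserved octet, a 3-byte QPN and a 16-byte GID. Linux IPoIB advertises RC (connected mode) support with the 0x80 bit of the first octet (IPOIB_FLAGS_RC), which is how C/M capability travels in an ARP request. The sketch below assumes that bit; the type and helper names are hypothetical, not the ipoib_xfr_mgr.h routines.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IPOIB_HW_ADDR_LEN 20u
#define IPOIB_FLAG_RC     0x80u   /* connected-mode capable (Linux convention) */

typedef struct ipoib_hw_addr {
    uint8_t bytes[IPOIB_HW_ADDR_LEN]; /* flags(1) | QPN(3, big-endian) | GID(16) */
} ipoib_hw_addr_t;

static uint32_t hw_addr_get_qpn(const ipoib_hw_addr_t *a)
{
    return ((uint32_t)a->bytes[1] << 16) | ((uint32_t)a->bytes[2] << 8) |
           a->bytes[3];
}

/* QPNs are 24-bit; higher bits are dropped. */
static void hw_addr_set_qpn(ipoib_hw_addr_t *a, uint32_t qpn)
{
    a->bytes[1] = (uint8_t)(qpn >> 16);
    a->bytes[2] = (uint8_t)(qpn >> 8);
    a->bytes[3] = (uint8_t)qpn;
}

static bool hw_addr_cm_capable(const ipoib_hw_addr_t *a)
{
    return (a->bytes[0] & IPOIB_FLAG_RC) != 0;
}

/* Touches only the flags octet, so the QPN is left intact. */
static void hw_addr_set_cm_capable(ipoib_hw_addr_t *a, bool on)
{
    if (on)
        a->bytes[0] |= IPOIB_FLAG_RC;
    else
        a->bytes[0] &= (uint8_t)~IPOIB_FLAG_RC;
}

static void hw_addr_get_gid(const ipoib_hw_addr_t *a, uint8_t gid[16])
{
    memcpy(gid, &a->bytes[4], 16);
}
```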
ipoib_driver.h
-Most global definitions were moved to one place.
Minor changes:
-Added the MiniportCancelSendPackets routine.
-Added Error Log messages for successful/failed C/M initialization.
-Reduced some debug print noise by moving statistics OIDs and a few others to a higher debug level.
-Some minor code formatting, while trying to maintain a consistent project coding style.
Notes for known issues and limitations:
-Connected Mode is forced to stay disabled if LSO is enabled
(I'm not sure how to make them work together, since LSO is tied to UD).
-The SID is misformatted (IETF bit) to match the Linux implementation (a Linux PR was opened).
-IP fragmentation of multicast packets larger than 30k will fail
(a patch to fix this issue is done and is being tested now).
-The SRQ queue size calculation may need a better algorithm.
-With Connected Mode 'enabled', the checksum offload flags
are forced to: Send - disabled, Receive - bypass
(the HCA doesn't support IP checksum offload through the RC QP).
-The code was tested on Windows Server 2003 x86 and x64, and with Linux OFED 1.4.
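The forced settings in the notes above (LSO keeps Connected Mode disabled; Connected Mode forces send checksum offload off and receive to bypass) can be summarized as one parameter-resolution step. The structure, enum and function names are hypothetical; the three checksum values mirror the options named in this post.

```c
#include <stdbool.h>

/* Hypothetical checksum offload settings. */
typedef enum { CSUM_DISABLED, CSUM_ENABLED, CSUM_BYPASS } csum_mode_t;

/* Hypothetical view of the relevant user parameters. */
typedef struct ipoib_params {
    bool        lso;
    bool        connected_mode;
    csum_mode_t send_csum;
    csum_mode_t recv_csum;
} ipoib_params_t;

/* Apply the forced overrides and return the effective settings. */
static ipoib_params_t resolve_params(ipoib_params_t p)
{
    if (p.lso)
        p.connected_mode = false;     /* LSO is tied to the UD QP */
    if (p.connected_mode) {
        p.send_csum = CSUM_DISABLED;  /* no IP checksum offload over RC */
        p.recv_csum = CSUM_BYPASS;
    }
    return p;
}
```

Checking LSO first matters: once it disables Connected Mode, the checksum settings are left as the user configured them.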
Any comments and suggestions are very welcome.
Thanks,
Alex.