[ofa-general] Verbs: IB vs. iWARP

Thu May 8 14:28:46 PDT 2008

Over the past 24 hours, we assembled a list of differences between IB  
and iWARP usage of verbs.  I got a few comments on the text we  
assembled, and figured it was time to turn this text over to  
OpenFabrics to make it fully correct/complete/whatever, and then  
publish it however you see fit.

I hope this starter text is helpful to you; enjoy.

-----
  * struct ibv_device.transport_type will be IBV_TRANSPORT_IWARP for
iWARP devices and IBV_TRANSPORT_IB for IB devices.

  * ibv_query_gid():
    * When invoked on an IB HCA, returns the IB subnet prefix in
subnet_prefix and the GUID of the port in interface_id.
    * When invoked on an iWARP NIC, returns the NIC's MAC address
in subnet_prefix and 0 in interface_id.
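
As a rough illustration of the iWARP case, the sketch below inspects a
16-byte GID laid out as 8 bytes of subnet_prefix followed by 8 bytes of
interface_id. The struct gid_bytes type and both helper names are
hypothetical stand-ins (not libibverbs API) so the example compiles on
its own. It also assumes the MAC sits in the first 6 bytes of
subnet_prefix, which the text above does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 16-byte GID: bytes 0..7 are
 * subnet_prefix, bytes 8..15 are interface_id. */
struct gid_bytes {
    uint8_t raw[16];
};

/* True when interface_id is all zero, as described for iWARP NICs. */
static bool gid_interface_id_is_zero(const struct gid_bytes *gid)
{
    static const uint8_t zero[8] = {0};

    assert(gid != NULL);
    return memcmp(gid->raw + 8, zero, sizeof zero) == 0;
}

/* ASSUMPTION: the MAC occupies the first 6 bytes of subnet_prefix.
 * Check the driver in use before relying on this layout. */
static void gid_to_mac(const struct gid_bytes *gid, uint8_t mac[6])
{
    assert(gid != NULL && mac != NULL);
    memcpy(mac, gid->raw, 6);
}
```

In a real application the bytes would come from the GID filled in by
ibv_query_gid(), and the check would only be meaningful after
confirming that the device is an iWARP device.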

  * iWARP QPs ''must'' be created with the RDMA CM; IB QPs can be
created using the IB CM, the RDMA CM, or some other (presumably
out-of-band) mechanism.

  * When making QPs, some versions of iWARP drivers require the
initiator of the connection to send the first message (having the
non-initiator send the first message will terminate the connection).
Newer versions of iWARP firmware/drivers hide this requirement down
in the driver, so the ULP doesn't have to ensure that the initiator
sends the first message.

  * When terminating connections via the RDMA CM (via the
rdma_disconnect() call or by simply destroying the QP without
disconnecting first), iWARP transports automatically generate a CQE
with status IBV_WC_WR_FLUSH_ERR for each pending send or receive WR.
IB HCAs do the same thing, but the timing differs: an iWARP RDMA CM
disconnect progresses independently of the ULP, so when one side
issues the disconnect, the other side is disconnected automatically
(even if its ULP doesn't realize it yet).  IB HCAs may not process the
disconnect until later (via RDMA CM or otherwise), perhaps not until
the ULP realizes that the disconnect has occurred.  In short:
device-independent verbs-based applications need to handle flushed WRs
(completions with status IBV_WC_WR_FLUSH_ERR) during disconnection and
not treat them as errors.

  * LIDs are always 0 in iWARP.

  * LMC is always 0 for iWARP.

  * Memory regions used to receive RDMA read responses must have  
"remote write" permission (since in the iWARP protocol, RDMA read  
responses are basically the same as incoming RDMA write requests).
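
A device-independent application could pick the access flags for an
RDMA read sink buffer roughly as follows. The enum values and the
function name are hypothetical stand-ins so the sketch compiles alone;
in real code the flags would be IBV_ACCESS_LOCAL_WRITE and
IBV_ACCESS_REMOTE_WRITE, passed to ibv_reg_mr().

```c
#include <assert.h>

/* Hypothetical stand-ins for the transport type and access flags. */
enum xport_sketch { XPORT_IB, XPORT_IWARP };
enum {
    ACC_LOCAL_WRITE  = 1 << 0,
    ACC_REMOTE_WRITE = 1 << 1
};

/* Access flags for a buffer that will receive RDMA read responses. */
static int read_sink_access(enum xport_sketch t)
{
    int flags = ACC_LOCAL_WRITE;   /* the adapter writes into the sink */

    assert(t == XPORT_IB || t == XPORT_IWARP);
    if (t == XPORT_IWARP)
        flags |= ACC_REMOTE_WRITE; /* read responses look like writes */
    return flags;
}
```

An application that prefers a single code path could instead request
remote write on both transports, at the cost of exposing the buffer to
remote writes on IB where that is not strictly needed.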

  * Atomics and immediate data are not available in iWARP.

  * The sink scatter-gather list for an RDMA read can only have one
element for iWARP (which is reported accurately in struct
ibv_device_attr.max_sge_rd).
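
In practice this means a read into a scattered local buffer has to be
split into several RDMA read WRs on iWARP. The helper below is a
minimal sketch of that arithmetic; the function and parameter names are
hypothetical, and max_read_sge stands for whatever per-WR limit the
device reports for RDMA reads.

```c
#include <assert.h>
#include <stddef.h>

/* Number of RDMA read WRs needed to fill num_segments local segments
 * when each WR may carry at most max_read_sge sink elements. */
static size_t rdma_read_wr_count(size_t num_segments, size_t max_read_sge)
{
    assert(max_read_sge > 0);
    /* Ceiling division without floating point. */
    return (num_segments + max_read_sge - 1) / max_read_sge;
}
```

With a limit of 1, a read into four local segments needs four WRs,
each with its own remote address and length.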

  * Send completions provide a slightly different guarantee:
    * iWARP: indicates that the resources in the corresponding WR can
be reused; it does ''not'' indicate that the data is in the peer's
memory, or even that it has been transmitted yet.
    * IB: indicates that the data has been transmitted and has arrived
at the remote HCA (but is not necessarily in the remote target buffer
yet).

  * No currently available RNIC (as of May 2008) supports RNR
retry.  Specifically: current RNICs will terminate a QP connection if
a SEND arrives with no corresponding pre-posted receive.

-- 
Jeff Squyres
Cisco Systems