[openib-general] [PATCH] added comments to ib_mad.h - minor update

Sat Aug 7 08:55:49 PDT 2004

On Sat, 07 Aug 2004 11:06:16 -0400
Hal Rosenstock <halr at voltaire.com> wrote:

> I'm unconvinced about so called "zero copy" RMPP. Someone has to do the
> fragmentation/reassembly. Seems to me that should be hidden by the
> access layer rather than exposed to the consumer. I think this is the
> fundamental issue to resolve for RMPP. 

I think that we can separate segmentation from reassembly.

For segmentation, we can definitely do zero-copy.  And zero-copy shouldn't be an issue for non-RMPP MADs.

For reassembly, this is harder because of how the MAD headers are defined.  The standard and RMPP MAD headers are duplicated in every segment.  The result is that in order to do zero-copy reassembly, the user needs to get back a chain of buffers.  (I'm referring to kernel-only clients here.  For user-space, there's no reason not to give the user a reassembled MAD in a single buffer.)  I think that the issue here is that the API becomes kludgy, hard to define, and difficult to work with.  Plus the data that the user cares about is now scattered across multiple buffers, and in each buffer it starts at an offset of sizeof(grh) + sizeof(mad header) + sizeof(rmpp header).
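To make the buffer-chain problem concrete, here is a minimal user-space sketch of what a consumer of a zero-copy chain would have to do, next to the single-copy alternative.  Everything in it is an assumption for illustration: the seg_buf structure, the function names, and the header sizes (40-byte GRH, 24-byte MAD header, 12-byte RMPP header, 256-byte MAD) are not taken from ib_mad.h or from the GSI proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sizes; assumptions for this sketch only. */
enum {
    GRH_LEN      = 40,
    MAD_HDR_LEN  = 24,
    RMPP_HDR_LEN = 12,
    MAD_LEN      = 256,
    /* Payload starts after the headers that repeat in every segment. */
    SEG_DATA_OFF = GRH_LEN + MAD_HDR_LEN + RMPP_HDR_LEN,
    SEG_DATA_MAX = GRH_LEN + MAD_LEN - SEG_DATA_OFF
};

/* Hypothetical receive buffer: one per segment, linked in arrival order. */
struct seg_buf {
    struct seg_buf *next;
    size_t          data_len;               /* valid payload bytes in this segment */
    unsigned char   raw[GRH_LEN + MAD_LEN]; /* GRH + MAD exactly as received */
};

/* Zero-copy access: the caller must skip the headers in every segment. */
static const unsigned char *seg_data(const struct seg_buf *seg)
{
    return seg->raw + SEG_DATA_OFF;
}

/* Total payload bytes held by the chain. */
static size_t seg_chain_len(const struct seg_buf *head)
{
    size_t total = 0;

    for (; head; head = head->next) {
        assert(head->data_len <= SEG_DATA_MAX);
        total += head->data_len;
    }
    return total;
}

/*
 * Single-copy alternative: gather the payload into one flat buffer.
 * Returns the number of bytes copied, or 0 if dst is too small.
 */
static size_t seg_chain_copy(const struct seg_buf *head,
                             unsigned char *dst, size_t dst_len)
{
    size_t off = 0;

    if (seg_chain_len(head) > dst_len)
        return 0;
    for (; head; head = head->next) {
        memcpy(dst + off, seg_data(head), head->data_len);
        off += head->data_len;
    }
    return off;
}
```

With these assumed sizes, each 296-byte receive buffer carries at most 220 bytes of payload, which is the per-segment overhead a chain-based API would expose to every client.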

Based on the API of the original GSI proposal, it appeared that it was trying to provide zero-copy reassembly.  I'm open to reassembly requiring a single data copy, however.

> Not sure the consumer should need to set all fields to 0 when RMPPActive
> is not set. The access layer might be better to do this to be sure. 

I think that the client could do this more efficiently.  The access layer would need to do this on every send, whereas the client could do it once for multiple transfers.
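As a rough sketch of the "zero it once" argument: a client that reuses one send buffer can clear the RMPP header when the buffer is set up and then only touch the payload on each send.  The structure layout and the demo_* names below are hypothetical stand-ins, not the definitions from ib_mad.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 12-byte RMPP header layout, for illustration only. */
struct demo_rmpp_hdr {
    uint8_t  version;
    uint8_t  type;
    uint8_t  rtime_flags;   /* the RMPPActive flag would live in here */
    uint8_t  status;
    uint32_t data1;
    uint32_t data2;
};

/* Hypothetical reusable send buffer owned by the client. */
struct demo_send_buf {
    struct demo_rmpp_hdr rmpp;
    uint8_t              data[220];
};

/* Done once by the client, when the buffer is allocated. */
static void demo_buf_init(struct demo_send_buf *buf)
{
    memset(&buf->rmpp, 0, sizeof(buf->rmpp));
}

/* Done per send: only the payload changes, the RMPP header stays zero. */
static void demo_buf_fill(struct demo_send_buf *buf,
                          const void *payload, size_t len)
{
    assert(len <= sizeof(buf->data));
    memcpy(buf->data, payload, len);
}

/* What an access layer could check cheaply instead of re-zeroing. */
static int demo_rmpp_hdr_is_clear(const struct demo_rmpp_hdr *hdr)
{
    static const struct demo_rmpp_hdr zero;

    return memcmp(hdr, &zero, sizeof(zero)) == 0;
}
```

The trade-off is the one discussed above: the access layer zeroing on every send is safer, while the client doing it once is cheaper when the same buffer carries many non-RMPP transfers.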

> I was thinking about the model where RMPP performs the coalescing on the
> receive side in which case I think this helps as the segments can be
> copied and reused sooner. 

Something to consider is that the spec permits sending an RMPP packet of unknown length (PayloadLength = 0 in the first segment).  This makes it difficult to coalesce into a single buffer when receiving a segment, because the size of the buffer isn't known until the last segment has been received.
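One way to cope with the unknown-length case, sketched in user-space C under the assumption that a growable buffer is acceptable: size the buffer up front when the first segment announces a length, and otherwise grow it as segments arrive.  The reasm structure and function names are invented for this example, and a kernel implementation would need a different allocation strategy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reassembly state for one incoming RMPP transfer. */
struct reasm {
    unsigned char *buf;
    size_t         len;   /* payload bytes collected so far */
    size_t         cap;   /* bytes allocated */
};

/*
 * Start a transfer.  payload_length is the value announced in the first
 * segment; 0 means the sender did not say how much data is coming.
 */
static int reasm_begin(struct reasm *r, size_t payload_length)
{
    r->buf = NULL;
    r->len = 0;
    r->cap = 0;
    if (payload_length == 0)
        return 0;               /* unknown size: allocate lazily */
    r->buf = malloc(payload_length);
    if (!r->buf)
        return -1;
    r->cap = payload_length;
    return 0;
}

/* Append one segment's payload, doubling the buffer when it runs out. */
static int reasm_append(struct reasm *r, const void *data, size_t n)
{
    assert(data != NULL);
    if (n == 0)
        return 0;
    if (n > r->cap - r->len) {
        size_t cap = r->cap ? r->cap : 256;
        unsigned char *p;

        while (cap - r->len < n)
            cap *= 2;
        p = realloc(r->buf, cap);
        if (!p)
            return -1;
        r->buf = p;
        r->cap = cap;
    }
    memcpy(r->buf + r->len, data, n);
    r->len += n;
    return 0;
}

static void reasm_free(struct reasm *r)
{
    free(r->buf);
    r->buf = NULL;
    r->len = 0;
    r->cap = 0;
}
```

The growth path costs extra copies inside realloc(), so the single-copy property only really holds when the first segment carries a non-zero PayloadLength.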

A benefit of coalescing the data into a single buffer is that it decreases memory use, since we can avoid carrying around the duplicated GRH and MAD/RMPP headers.

> Yes, that calculation is based on a set of assumptions which are documented in the spec. 
> While it is easier to use some hard coded value rather than a dynamically calculated one, 
> it also lends to longer timeouts when a RMPP packet is dropped somewhere.

Here's a problem that I see with the dynamic calculations.  The GSI is sitting around when it *receives* the first segment of an RMPP packet.  According to the spec, it now has to figure out the PayloadLength (which could be set to 0, in which case it just uses a default), figure out the packet lifetime from the sender to itself, get the packet lifetime from itself to the sender, and know what its own response time value is going to be (which should be set by the client, not the GSI).

By the time the GSI figures all these values out, the RMPP transfer is either going to be done, or have timed out on the sender side...

Anyway, I'd like to make the receive timeouts dynamic and client controlled.  We just need a good way to do it.

> What is used to indicate send only v. send and (RMPP) response expected ? 

The timeout_ms field in ib_mad_send_wr indicates if a response is expected.  When sending, if RMPPActive is set, the send will use RMPP.  After the send completes, if timeout_ms is set, then a response is expected.  On the received response, if the sender uses RMPP (set when calling ib_mad_reg()), the GSI will look in the RMPP header to see if RMPP is active for the receive. 
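The decision logic described above can be written down as three small predicates.  This is only a sketch: demo_send_wr is a hypothetical stand-in that models just the two inputs mentioned here (RMPPActive and timeout_ms), not the real ib_mad_send_wr, and the function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the send work request; two fields only. */
struct demo_send_wr {
    bool         rmpp_active;  /* RMPPActive set in the outgoing MAD */
    unsigned int timeout_ms;   /* 0 means no response is expected */
};

/* The send goes out through RMPP only if RMPPActive is set. */
static bool demo_send_uses_rmpp(const struct demo_send_wr *wr)
{
    assert(wr != NULL);
    return wr->rmpp_active;
}

/* After the send completes, a non-zero timeout means wait for a response. */
static bool demo_response_expected(const struct demo_send_wr *wr)
{
    assert(wr != NULL);
    return wr->timeout_ms != 0;
}

/*
 * On receive, the RMPP header is only consulted for clients that asked
 * for RMPP at registration time; then the header's flag decides.
 */
static bool demo_recv_uses_rmpp(bool client_registered_rmpp,
                                bool hdr_rmpp_active)
{
    return client_registered_rmpp && hdr_rmpp_active;
}
```

Note that the two send-side questions are independent: a plain (non-RMPP) request can still expect an RMPP response, which is the GetMulti-style case.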

> Did the SF RMPP use SA GetMulti which is where this is used ?

The SF RMPP did support GetMulti.  It was only class aware in a few cases, such as the CM and trap repress messages.  I mentioned this before, but looking at the proposed GSI implementation, it copied the SF RMPP code.