[openib-general] FW: [PATCH 1 of 3] mad: large RMPP support

Wed Feb 8 23:45:22 PST 2006

>I'm concerned that an allocation of a 4K buffer may fail in a situation
>where lots of small allocations of around 256 bytes would succeed.  Is
>your point that if we fail to allocate a 4K buffer, we're in deep
>trouble already?  Note that I've only considered a 1000 host cluster.

Yes - if we can't allocate a 4K buffer, it seems highly unlikely that we'd be
able to allocate 1,000 256-byte buffers.

>What about scalability (e.g., 10,000 nodes -- we then need a 40K buffer)
>-- the linked list has no scalability problem (no need to push RMPP
>handling to user space).

I did consider this, and I don't know at what size we'd start having trouble
allocating a single data buffer.  But the linked list means asking for 10,000
256-byte buffers, over 2.5 MB of kernel memory, to perform this single data
transfer.  Is it likely that we can allocate that much memory, but not the 40K
buffer?  I really don't know.  If the answer is yes, then I agree that using a
linked list would be better.

>Regarding the list-walk, if we track the "last-sent segment" in the
>list, there is no need to do the list walk (we simply get the next
>segment in the list).  We'll only have a short list walk when the "ack"
>pointer gets updated (need to walk forward only
><current-RMPP-ack-window-size> items in the linked list from the
>previously ack'ed item).

I thought of this as well.  For efficiency, you need to track both the last-sent
and the last-acked segment, meaning that the list will be walked at most twice.
You may be able to jump the ack pointer straight to the last-sent segment if
that is a common case.

>What is the reason you are thinking about 64-byte boundary support?

I was concerned about 64-byte values in the MADs aligned on a 32-byte boundary.
But then I think that some of the MADs have this issue anyway by architectural
design.

- Sean