[openib-general] FW: [PATCH 1 of 3] mad: large RMPP support

Jack Morgenstein jackm at mellanox.co.il
Wed Feb 8 23:23:34 PST 2006


My point was not the total storage used for the array (it ends up less
than for the linked list, as you noted).

I'm concerned that an allocation of a 4K buffer may fail in a situation
where lots of small allocations of around 256 bytes would succeed.  Is
your point that if we fail to allocate a 4K buffer, we're in deep
trouble already?  Note that I've only considered a 1000-host cluster.
What about scalability?  At 10,000 nodes we would need a 40K contiguous
buffer, while the linked list has no scalability problem (and no need
to push RMPP handling to user space).
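
To make the tradeoff concrete, here is a minimal sketch of the two
allocation patterns (illustrative only -- rmpp_seg, alloc_seg_array,
and alloc_seg_list are hypothetical names, not the actual ib_mad code,
and malloc() stands in for kmalloc() so the sketch compiles on its
own):

#include <stdlib.h>

#define SEG_DATA_SIZE 256	/* approx. data bytes per RMPP segment */

/* Array scheme: one contiguous allocation of nseg pointers -- ~4K for
 * nseg = 1000 on 32-bit x86, ~40K for nseg = 10000.  This is the
 * single allocation that may fail under memory pressure. */
static void **alloc_seg_array(unsigned int nseg)
{
	return malloc(nseg * sizeof(void *));
}

/* List scheme: nseg independent ~256-byte allocations, each small
 * enough that it is likely to succeed even when a contiguous 4K/40K
 * chunk is not available. */
struct rmpp_seg {
	struct rmpp_seg *next;
	char data[SEG_DATA_SIZE];
};

static struct rmpp_seg *alloc_seg_list(unsigned int nseg)
{
	struct rmpp_seg *head = NULL, *s;

	while (nseg--) {
		s = malloc(sizeof(*s));
		if (!s)
			return head;	/* caller frees the partial list */
		s->next = head;
		head = s;
	}
	return head;
}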

Regarding the list walk: if we track the last-sent segment in the list,
there is no need to walk it at all -- we simply take the next segment.
The only short walk happens when the "ack" pointer gets updated, and
even then we need to walk forward only
<current-RMPP-ack-window-size> items from the previously acked item.
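
The cursor idea can be sketched like this (again hypothetical, reusing
struct rmpp_seg from the sketch above rather than the actual ib_mad
structures):

struct rmpp_send_state {
	struct rmpp_seg *segs;		/* head of the segment list */
	struct rmpp_seg *last_sent;	/* cursor: most recently sent */
	struct rmpp_seg *last_acked;	/* cursor: most recently acked */
};

/* O(1): the next segment to send is the successor of the last-sent
 * one; there is never a walk from the head of the list. */
static struct rmpp_seg *next_seg_to_send(struct rmpp_send_state *st)
{
	return st->last_sent ? st->last_sent->next : st->segs;
}

/* When an ack covering 'nacked' more segments arrives, walk forward
 * from the previously acked item -- at most the current RMPP ack
 * window size, never the whole list. */
static void advance_ack(struct rmpp_send_state *st, unsigned int nacked)
{
	struct rmpp_seg *s = st->last_acked;

	while (nacked--) {
		struct rmpp_seg *next = s ? s->next : st->segs;

		if (!next)
			break;		/* already at the tail */
		s = next;
	}
	st->last_acked = s;
}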

--
What is the reason you are thinking about 64-byte boundary support?

Jack


-----Original Message-----
From: Sean Hefty [mailto:sean.hefty at intel.com] 
Sent: Wednesday, February 08, 2006 7:13 PM
To: Jack Morgenstein; openib-general at openib.org
Subject: RE: [openib-general] FW: [PATCH 1 of 3] mad: large RMPP support

>For example, a 1000 host cluster, with 2 ports per HCA will have at
>least 4000 records in a SubnAdmGetTableResp for all PortInfo records on
>the network (2000 for HCAs, and at least 2000 for the switch ports).
>Such a query response will generate an RMPP of size 256K -- 1000
>segments, or a 4K buffer on an X86 machine just for the array (assuming
>one allocation per RMPP segment -- N=1).

I think that this is a good reason to use an array.  Walking a
1000-entry list 1000 times is a substantial performance hit.  Lost MADs
and retries will make this worse.

A 4K buffer for the array is less than the 8K total needed for the 1000
list items.  We're already talking about allocating over 256K of memory
just for the data payload, so an additional contiguous 4K buffer seems
like a minor cost.  I'm not convinced that there's a real issue here.
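
Spelling out the arithmetic behind both messages (a sketch of the
numbers only; it assumes 32-bit x86 pointers, roughly 64 bytes per
PortInfo record, and ~256 data bytes per RMPP segment):

enum {
	HOSTS       = 1000,
	RECORDS     = 4 * HOSTS,             /* ~2000 HCA + ~2000 switch ports */
	RECORD_SIZE = 64,                    /* approx. bytes per record */
	PAYLOAD     = RECORDS * RECORD_SIZE, /* ~256K total RMPP payload */
	SEG_SIZE    = 256,                   /* data bytes per segment */
	NSEGS       = PAYLOAD / SEG_SIZE,    /* ~1000 segments */
	ARRAY_BYTES = NSEGS * 4,             /* ~4K pointer array (32-bit) */
	LIST_BYTES  = NSEGS * 8,             /* ~8K of next/prev links */
};

Scaling to 10,000 hosts multiplies NSEGS by ten: the contiguous pointer
array grows to ~40K, while each list item stays the same small size.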

To support ridiculously large transfers from userspace, we may need to
push the RMPP handling up into userspace.

- Sean


