[openib-general] FW: [PATCH 1 of 3] mad: large RMPP support
Jack Morgenstein
jackm at mellanox.co.il
Wed Feb 8 08:28:39 PST 2006
Sorry for breaking the thread (Outlook is problematic).
Jack
-----Original Message-----
From: Jack Morgenstein
Sent: Wednesday, February 08, 2006 6:23 PM
To: 'Sean Hefty'
Cc: Michael S. Tsirkin; 'rolandd at cisco.com'
Subject: RE: [PATCH 1 of 3] mad: large RMPP support
Sorry for not echoing to openib -- I'm having problems with mutt and our
server (replying to this from Outlook will not place the reply in the
thread).
I would much rather use the linked list.
We may need to allocate a rather large contiguous array (the ib_mad_segments
segment array) for queries involving a large cluster, and such an
allocation has a higher probability of failing.
For example, a 1000-host cluster with 2 ports per HCA will have at
least 4000 records in a SubnAdmGetTableResp for all PortInfo records on
the network (2000 for HCAs, and at least 2000 for the switch ports).
Such a query response will generate an RMPP of roughly 256K -- 1000
segments, which means a 4K buffer on an x86 machine just for the pointer
array (assuming one allocation per RMPP segment, i.e. N=1).
b. Regarding using buffers which contain N RMPP segments each, this
becomes a management nightmare:
If we choose N too large, we may fail to allocate segments for a
large RMPP, so that the entire RMPP fails (where it could have succeeded
with N=1). Having N=1 guarantees that if the allocation can succeed,
it will. I do not consider variable-size N within a single RMPP, since
this would be very complicated and error-prone.
We could re-allocate everything if some N does not work -- also very
complex.
Regarding the order-N-squared algorithm for finding the next RMPP
segment to send, MST and I agree that this is not acceptable. We are
considering an algorithm which stores the current segment pointer in
"struct ib_mad_send_wr_private", so that getting the next segment is
simply a matter of following the "next" link. We're still ironing out
proper handling of "last acknowledged" processing (maintaining a pointer
to the last-acked segment and updating it when a new ack arrives -- this
might still involve linear searches).
Regarding the payload pointer, I agree. It is also trivial to move it to
the ib_mad_send_wr_private structure, hiding it from the user.
Regarding the 64-byte boundary, why is this important?
Jack
-----Original Message-----
From: Sean Hefty [mailto:sean.hefty at intel.com]
Sent: Wednesday, February 08, 2006 3:01 AM
To: Jack Morgenstein
Cc: openib-general at openib.org
Subject: RE: [PATCH 1 of 3] mad: large RMPP support
Based on what you've done, I'd like to suggest changing the interface to
something similar to that shown below. I believe this could be done with
minor changes to the current patches. Detailed comments that led to this
suggestion are inline in my responses.
struct ib_mad_segments {
	u32	num_segments;
	u32	segment_size;
	void	*segment[0];
};

struct ib_mad_send_buf {
	...
	void			*mad;		/* first MAD segment */
	struct ib_mad_segments	*segments;	/* RMPP segments > 1 */
	...
};
This will avoid walking through a list to find segments, and allows for
efficient allocation of the segment data buffers. Multiple segments
could be allocated through a single kzalloc. (For example, every n-th
segment would start a new allocation, making deallocation easy as well.)
>+struct ib_mad_multipacket_seg {
>+	struct list_head list;
>+	u32 size;
>+	u8 data[0];
>+};
Should we ensure that the data alignment is on a 64-byte boundary?
> struct ib_mad_send_buf {
> 	struct ib_mad_send_buf	*next;
>-	void			*mad;
>+	void			*mad;		/* RMPP: first segment, including the MAD header */
>+	void			*mad_payload;	/* RMPP: changed per segment */
mad_payload doesn't appear to be directly accessible by the user. It
should be hidden.
- Sean