[openib-general] RE: [PATCH 3 of 3] mad: large RMPP support, Round 2

Jack Morgenstein jackm at mellanox.co.il
Sun Feb 19 02:42:59 PST 2006


Thanks for agreeing to check in the patch!

Feedback below (see my inserted comments bracketed by >>>>>>>>>>>>>> /
<<<<<<<<<<<< and prefixed by "[JPM]").

Jack
-----Original Message-----
From: Sean Hefty [mailto:sean.hefty at intel.com] 
Sent: Saturday, February 18, 2006 7:40 PM
To: Jack Morgenstein
Cc: Michael S. Tsirkin; openib-general at openib.org
Subject: RE: [PATCH 3 of 3] mad: large RMPP support, Round 2

>+static inline void adjust_last_ack(struct ib_mad_send_wr_private *wr)
>+{
>+	struct ib_mad_multipacket_seg *seg;
>+
>+	if (wr->last_ack < 2)
>+		return;
>+	else if (!wr->last_ack_seg)
>+		list_for_each_entry(seg, &wr->multipacket_list, list) {
>+			if (wr->last_ack == seg->num) {
>+				wr->last_ack_seg = seg;
>+				break;
>+			}
>+		}
>+	else
>+		list_for_each_entry(seg, &wr->last_ack_seg->list, list) {
>+			if (wr->last_ack == seg->num) {
>+				wr->last_ack_seg = seg;
>+				break;
>+			}
>+		}
>+}

If we initialize last_ack_seg to the start of the list, can we combine
the else-if and else checks together?

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
[JPM] This is a problem, since we would then always skip over segment 2
(list_for_each_entry starts by taking the entry after the one it is
given).  If we initialize so that the last_ack_seg scan starts with the
first RMPP "overflow" segment (seg 2), we would need to allocate a
dummy initial segment just so that we can reference
&wr->last_ack_seg->list -- that seems more complicated to me.
This way, last_ack_seg gets initialized painlessly.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
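To make the iteration pitfall concrete: list_for_each_entry() begins at
the entry *after* the node it is handed, so the starting node itself is
never visited.  A small userspace sketch (a plain singly linked list and
a hypothetical find_after() helper standing in for the kernel's list.h;
not the actual mad/rmpp code):

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the segment list.  A singly
 * linked list is enough to show the semantics under discussion. */
struct seg {
	struct seg *next;
	int num;
};

/* Mirrors list_for_each_entry(seg, &start->list, list): the scan
 * begins at the element *after* start, so start itself is skipped. */
static struct seg *find_after(struct seg *start, int num)
{
	struct seg *s;

	for (s = start->next; s; s = s->next)
		if (s->num == num)
			return s;
	return NULL;
}
```

If last_ack_seg were preset to the first overflow segment (seg 2), a
scan for seg 2 itself would come up empty, which is exactly the problem
Jack describes.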

>@@ -647,6 +672,7 @@ static void process_rmpp_ack(struct ib_m
>
> 	if (seg_num > mad_send_wr->last_ack) {
> 		mad_send_wr->last_ack = seg_num;
>+		adjust_last_ack(mad_send_wr);

If last_ack_seg references a segment that contains the seg_num, can we
eliminate
last_ack?


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
[JPM]:  No, because the seg_num ack'ed may be segment number 1 (which
has no entry in the multipacket list).  The last_ack_seg pointer's job
is only to avoid the O(n^2) search when scanning the segment list.  The
RMPP logic uses last_ack (see, for example, abort_send() and
ib_retry_rmpp() in the same file).
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

>+static inline int alloc_send_rmpp_segs(struct ib_mad_send_wr_private *send_wr,
>+				       int message_size, int hdr_len,
>+				       int data_len, u8 rmpp_version,
>+				       gfp_t gfp_mask)
>+{
>+	struct ib_mad_multipacket_seg *seg;
>+	struct ib_rmpp_mad *rmpp_mad = send_wr->send_buf.mad;
>+	int seg_size, i = 2;
>+
>+	rmpp_mad->rmpp_hdr.paylen_newwin =
>+			cpu_to_be32(hdr_len - IB_MGMT_RMPP_HDR + data_len);
>+	rmpp_mad->rmpp_hdr.rmpp_version = rmpp_version;
>+	rmpp_mad->rmpp_hdr.rmpp_type = IB_MGMT_RMPP_TYPE_DATA;
>+	ib_set_rmpp_flags(&rmpp_mad->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
>+	send_wr->total_length = message_size;
>+	/* allocate RMPP buffers */
>+	message_size -= sizeof(struct ib_mad);
>+	seg_size = sizeof(struct ib_mad) - hdr_len;
>+	while (message_size > 0) {
>+		seg = kmalloc(sizeof(struct ib_mad_multipacket_seg) + seg_size,
>+			      gfp_mask);

It would be convenient if the MAD were cleared for the user, so we
don't end up transferring random data, especially at the end of the
user data.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[JPM]: This is done in user_mad.c, in ib_umad_write():

See patch 3 of 3, hunk starting with:
@@ -502,14 +510,32 @@ static ssize_t ib_umad_write(struct file

following lines:
+		/* Pad last segment with zeroes. */
+		if (seg->size - s)
+			memset(seg->data + s, 0, seg->size - s);

I did it this way to avoid using kzalloc for all segment allocations,
or kmalloc/memset for all.  This way, only the last segment gets
memset, and only in the padding area.

We can probably get rid of the "if" and just do the padding
unconditionally, with possibly zero bytes to be padded.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
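The unconditional form works because memset() with a length of zero is
a well-defined no-op in C.  A sketch of the padding logic, pulled out
into a hypothetical pad_segment() helper (not the actual user_mad.c
code):

```c
#include <string.h>

/* Zero the tail of a segment buffer past the 's' bytes of user data.
 * When s == seg_size the memset length is 0 and nothing happens, so
 * no "if" is needed for the full-segment case. */
static void pad_segment(char *data, size_t seg_size, size_t s)
{
	memset(data + s, 0, seg_size - s);
}
```

The helper still assumes s <= seg_size, as the write path guarantees.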

>+		if (!seg) {
>+			printk(KERN_ERR "ib_create_send_mad: RMPP mem "
>+			       "alloc failed for len %zd, gfp %#x\n",
>+			       sizeof(struct ib_mad_multipacket_seg) + seg_size,
>+			       gfp_mask);
>+			free_send_multipacket_list(send_wr);
>+			return -ENOMEM;
>+		}
>+		seg->size = seg_size;

Okay, seg_size is the same for all segments belonging to a single MAD.
We can move it to ib_mad_send_buf.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
[JPM]: This was done to hide segment implementation details from the
user_mad layer. You are correct that we can get rid of the size field,
and store it only once.
However, I think that the proper place for this value is the
"ib_mad_send_wr_private" structure, which also holds the list pointer.

How about if we change

struct ib_mad_multipacket_seg *
ib_mad_get_multipacket_seg(struct ib_mad_send_buf *send_buf, int seg_num)

to also return the segment size:

struct ib_mad_multipacket_seg *
ib_mad_get_multipacket_seg(struct ib_mad_send_buf *send_buf, int seg_num,
			   int *seg_size)
?

The mad layer (which has access to the private structure) can then
return the segment size still without exposing implementation details.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
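A userspace sketch of the out-parameter idea, with deliberately
simplified, hypothetical stand-ins for the real structures (the actual
ib_mad_send_wr_private and segment types are richer than this):

```c
#include <stddef.h>

/* Simplified stand-ins: the private wr stores seg_size exactly once,
 * and the getter hands it back through an out parameter so callers
 * never see the segment layout. */
struct seg {
	struct seg *next;
	int num;
};

struct wr_private {
	int seg_size;		/* stored once, not per segment */
	struct seg *segs;	/* overflow segments 2..N */
};

static struct seg *get_seg(struct wr_private *wr, int seg_num,
			   int *seg_size)
{
	struct seg *s;

	*seg_size = wr->seg_size;
	for (s = wr->segs; s; s = s->next)
		if (s->num == seg_num)
			return s;
	return NULL;
}
```

The caller gets both the segment and its size in one call, and the
per-segment size field disappears.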

>+	mad_send_wr->last_ack_seg = NULL;
>+	mad_send_wr->seg_num_seg = NULL;

mad_send_wr is cleared on allocation.  Are there better values to
initialize these variables to?  The first/second segment?

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
[JPM] IMHO, it is cleaner to initialize to NULL.  However, as you point
out, these two initialization lines are redundant.
Furthermore, there may be no segment to initialize to (if this is a
non-RMPP MAD, or a single-segment RMPP MAD).
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

>+struct ib_mad_multipacket_seg
>+*ib_rmpp_get_multipacket_seg(struct ib_mad_send_wr_private *wr, int seg_num)
>+{
>+	struct ib_mad_multipacket_seg *seg;
>+
>+	if (seg_num == 2) {
>+		wr->seg_num_seg =
>+			container_of(wr->multipacket_list.next,
>+				     struct ib_mad_multipacket_seg, list);
>+		return wr->seg_num_seg;
>+	}
>+
>+	/* get first list entry if was not already done */
>+	if (!wr->seg_num_seg)

See previous comment.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
[JPM]:  On examination, we can just get rid of the following lines:
+	if (seg_num == 2) {
+		wr->seg_num_seg =
+			container_of(wr->multipacket_list.next,
+				     struct ib_mad_multipacket_seg, list);
+		return wr->seg_num_seg;
+	}
+
(i.e., just initialize if seg_num_seg is NULL; if seg_num == 2, the
patch lines below will handle that case):
+	if (wr->seg_num_seg->num == seg_num)
+		return wr->seg_num_seg;
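Putting the pieces together, the simplified cached lookup could look
like the sketch below.  This is a hypothetical userspace model (plain
singly linked list, forward-only scan from the cached position, which
assumes segments are requested in increasing order as RMPP sends them),
not the kernel function itself:

```c
#include <stddef.h>

struct seg {
	struct seg *next;
	int num;
};

struct wr {
	struct seg *first;		/* segment 2, head of overflow list */
	struct seg *seg_num_seg;	/* cached position, NULL until used */
};

static struct seg *get_multipacket_seg(struct wr *wr, int seg_num)
{
	struct seg *s;

	/* Lazy init covers the seg_num == 2 case too, so no special
	 * branch is needed for it. */
	if (!wr->seg_num_seg)
		wr->seg_num_seg = wr->first;

	/* Fast path: cache hit. */
	if (wr->seg_num_seg->num == seg_num)
		return wr->seg_num_seg;

	/* Otherwise scan forward from the cached position. */
	for (s = wr->seg_num_seg->next; s; s = s->next)
		if (s->num == seg_num) {
			wr->seg_num_seg = s;
			return s;
		}
	return NULL;
}
```

Each lookup advances the cache, so walking the segments in order costs
O(n) total rather than O(n^2).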

>+struct ib_mad_multipacket_seg
>+*ib_mad_get_multipacket_seg(struct ib_mad_send_buf *send_buf, int seg_num)
>+{
>+	struct ib_mad_send_wr_private *wr;
>+
>+	if (seg_num < 2)
>+		return NULL;
>+
>+	wr = container_of(send_buf, struct ib_mad_send_wr_private, send_buf);
>+	return ib_rmpp_get_multipacket_seg(wr, seg_num);
>+}

Treating the first segment specially seems somewhat confusing.  (Maybe
this is a result of how the MAD/RMPP header is copied down from
userspace?)

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
[JPM]
It's confusing either way.  At least this way we don't allocate an
extra (first) segment when using RMPP.  The initial 256 bytes of MAD
get allocated anyway: in ib_create_send_mad(), you will note that for
all MADs, the initial allocation (line 851) is for the
ib_mad_send_wr_private element plus a MAD packet (256 bytes).  Later,
if needed for RMPP, additional segments are allocated.  To treat
regular MADs one way and RMPP MADs fully another way, the size of the
initial allocation would have to be conditional on whether this is
RMPP or not -- which complicates the code.

>+	if (!rmpp_active && length > sizeof(struct ib_mad)) {
>+		ret = -EINVAL;
>+		goto err_ah;
>+	}
>+
> 	packet->msg = ib_create_send_mad(agent,
> 					 be32_to_cpu(packet->mad.hdr.qpn),
> 					 0, rmpp_active,

I think that ib_create_send_mad performs the same check.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
[JPM]:  You are correct.  In fact, the test just before this one is also
performed in ib_create_send_mad(), and may also be deleted (this was not
part of the patch):

	if (rmpp_active && !agent->rmpp_version) {
		ret = -EINVAL;
		goto err_ah;
	}
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


- Sean


