[ofa-general] Re: [PATCH] mlx4_ib: fix for bugzilla 1383 (LSO packet processing)
Roland Dreier
rdreier at cisco.com
Mon Dec 29 14:02:30 PST 2008
> mlx4_ib: fix for Bugzilla 1383 (LSO packet processing).
<grumble> having this duplicate of the subject line just means if I use
git tools to import it, I have to delete it by hand.
I would love it if everyone tried feeding the emails they send into "git
am" and then examined the result, and fixed things until the git tree
that you get looks right.
And in this case the subject line is particularly useless to someone
reading the shortlog. It would be much better to say what the patch
actually does in the subject line, and then say something like
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1383>
in the body of the email. (Otherwise how could anyone not totally
familiar with IB development know which bugzilla to look in?)
Anyway, I guess the bug is that build_lso_seg() overwrites the stamping
and allows the WQE to start executing before the gather list is written?
The fix looks kind of ugly:
> @@ -1606,13 +1605,21 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
> size += sizeof (struct mlx4_wqe_datagram_seg) / 16;
>
> if (wr->opcode == IB_WR_LSO) {
> - err = build_lso_seg(wqe, wr, qp, &seglen);
> + err = build_lso_seg(wqe, wr, qp, &seglen, &lso_hdr_sz);
> if (unlikely(err)) {
> *bad_wr = wr;
> goto out;
> }
> + lso_wqe = (__be32 *) wqe;
> wqe += seglen;
> - size += seglen / 16;
> + dseg = wqe;
> + dseg += wr->num_sge - 1;
> + size += (seglen / 16) + wr->num_sge *
> + (sizeof (struct mlx4_wqe_data_seg) / 16);
> + for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> + set_data_seg(dseg, wr->sg_list + i);
> + *lso_wqe = lso_hdr_sz;
> + goto lso_continue;
rather than duplicating all this code and adding the ugly goto, can we
just add another "if (wr->opcode == IB_WR_LSO)" after writing the data
segments and do the *lso_wqe = lso_hdr_sz there?
Also is your code missing a memory barrier between the set_data_seg()
loop and the lso_wqe assignment? It seems that an out-of-order CPU
could make the lso_wqe visible before all the data segments are visible,
so the bug could show up there anyway.
- R.
More information about the general
mailing list