[ofa-general] Re: [PATCH for-2.6.21] mthca: QP reset race fixup

Tue Mar 20 14:50:16 PDT 2007

 > This fixes openfabrics bugzilla 394:
 > - Use common EQ for command interface and async events
 > - Clean CQ after moving QP to reset

This is a little terse -- an ideal changelog entry would explain what
the bug is, what is being changed to fix it, and why that fixes the issue.
I'll try to fix it up...

 > This also fixes a potential crash in ipoib cm:
 > - sync with completion event ISR after QP is reset
 >   to prevent ULP from getting and using QP pointer and/or WRID
 >   after they are freed.

 > We can rip the lines that sync with MTHCA_EQ_COMP out if you think
 > the issue needs to be dealt with in some other way - and I agree
 > this is only good for ULPs that do all their polling
 > inside the ISR, but at least this covers all in-kernel code.

I don't really like this change, although maybe it's the right thing
to do.  But can you explain what IPoIB CM is doing that would cause it
to run into trouble?  I'd like to see if there's a better solution.
It just seems strange to me to add the assumption that destroying a QP
makes sure that all running CQ callbacks are done.

If we change to NAPI (so that CQs are polled asynchronously) does that
readd the same bug?

 > @@ -864,6 +865,11 @@ int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask,

 > +		if (dev->mthca_flags & MTHCA_FLAG_MSI_X)
 > +			synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector);
 > +		else
 > +			synchronize_irq(dev->pdev->irq);

...and it seems doubly strange to have the modify QP operation wait
for CQ callbacks.

 - R.