[openib-general] Race in mthca_cmd_post()

Roland Dreier rdreier at cisco.com
Fri Oct 13 12:40:15 PDT 2006


    John> I put a PCI-X analyzer on the bus along with the HCA and
    John> found that we saw a Memory Read to register 698 but no
    John> evidence of the SYS_EN command making it down to the
    John> card. We were trying to read the gobit before the DOORBELL
    John> had completed. I think this could only happen on multi CPU
    John> machines with fast CPU's and an architecture which does not
    John> do PIO ordering very well.

Forgive me for being dense, but could you describe the race a little
more precisely?  Like list out what's happening and point to where two
things happen out of order?  What confuses me is that you seem to be
saying that a read of PCI MMIO space is racing with a write -- and I
would have thought that a read has to flush all posted writes.

And I really don't understand how readl() could panic the kernel
anyway -- the MMIO region should be set up properly long before, so
what is sn_dma_flush() crashing on?

Also a couple of comments on the patch itself:

 > -       if (event && dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS && fw_cmd_doorbell)
 > +       if (event && dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS && fw_cmd_doorbell) {
 > +               mmiowb();
 >                  mthca_cmd_post_dbell(dev, in_param, out_param, in_modifier,
 >                                             op_modifier, op, token);
 > -       else
 > +       } else {
 > +               mmiowb();
 >                  err = mthca_cmd_post_hcr(dev, in_param, out_param, in_modifier,
 >                                           op_modifier, op, token, event);
 > +       }

First and most trivial, you indented with spaces instead of tabs.
It's always obvious that a patch that looks like

 	old_code();
+        new_code();

has whitespace damage, because things don't line up.

Also, why did you do

	if (something)
		mmiowb()
		...
	else
		mmiowb()
		...

instead of just

	mmiowb()
	if (something)
		...
	else
		...

Finally, if you're going to add a mmiowb() (and I don't doubt it is
necessary, I'd just like to understand why exactly) then you need to
add a big comment explaining what it protects -- mmiowb() is always
confusing to people so it's important to have an explanation so that
future changes don't break the driver again.

 - R.




More information about the general mailing list