[openib-general] Race in mthca_cmd_post()

John Partridge johnip at sgi.com
Fri Oct 13 14:45:55 PDT 2006


Roland,

Thanks for looking at this, I appologize for not giving enough detail,
I'm never too sure how much to post.

Roland Dreier wrote:
>     John> I put a PCI-X analyzer on the bus along with the HCA and
>     John> found that we saw a Memory Read to register 698 but no
>     John> evidence of the SYS_EN command making it down to the
>     John> card. We were trying to read the gobit before the DOORBELL
>     John> had completed. I think this could only happen on multi CPU
>     John> machines with fast CPU's and an architecture which does not
>     John> do PIO ordering very well.
> 
> Forgive me for being dense, but could you describe the race a little
> more precisely?  Like list out what's happening and point to where two
> things happen out of order?  What confuses me is that you seem to be
> saying that a read of PCI MMIO space is racing with a write -- and I
> would have thought that a read has to flush all posted writes.

I was confused by this, but I believe that the readl to reg 698 is not
completing because the DDR memory is not yet available because SYS_EN never
got down to the card before the readl, or did not complete before readl.
This could be wrong I don't know. I was expecting to see a PIO write to reg 680
(which I can't find in the analyzer trace). What I did see is a lot of Config
Write's to the PCI config space and then a Memeory read (to 698).

Here is a snapshot of the trace :-

23438: * Config Write  REG = 01  TYPE = 1 Bus1 Device0 Func0
23439: BE = 0000 Requester = Bus0 Device0 Func0  Tag = 1
23440: <Decoding> (no DEVSEL#, no IRDY#)
23441: <Response>
23442: GAP of 2 line(s)
23443: <TERM>   -RETRY
23444: <TERM>
23445: GAP of 24 line(s)
23446: * Config Write  REG = 01  TYPE = 1 Bus1 Device0 Func0
23447: BE = 0000 Requester = Bus0 Device0 Func0  Tag = 1
23448: <Decoding> (no DEVSEL#, no IRDY#)
23449: <Response>
23450: GAP of 2 line(s)
23451: Data = 02300107   -SPLIT RESPONSE
23452: <Recover>
23453: GAP of 3 line(s)
23454: * Memory Rd DW  Addr = 00280698
23455: BE = 0000 Requester = Bus0 Device0 Func0  Tag = 0
23456: <Decoding> (no DEVSEL#, no IRDY#)
23457: <Response>
23458: GAP of 2 line(s)
23459: Data = ffffffff   -SPLIT RESPONSE
23460: <Recover>
23461: GAP of 7 line(s)
23462: * Split compl.     (Requester = Bus0 Device0 Func0) Tag = 0
23463: Completer = Bus0 Device2 Func0  (Error completion)
23464: <Response>
23465: Data = 10000004
23466: <Recover>


> 
> And I really don't understand how readl() could panic the kernel
> anyway -- the MMIO region should be set up properly long before, so
> what is sn_dma_flush() crashing on?

The panic occurs because of the Memory read error completion, the read probably
fails because the card is not ready enabled ??

I don't know about other architectures, but on ours a PIO read failing will cause
a panic (maybe a more correct description would be MCA)

> 
> Also a couple of comments on the patch itself:
> 
>  > -       if (event && dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS && fw_cmd_doorbell)
>  > +       if (event && dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS && fw_cmd_doorbell) {
>  > +               mmiowb();
>  >                  mthca_cmd_post_dbell(dev, in_param, out_param, in_modifier,
>  >                                             op_modifier, op, token);
>  > -       else
>  > +       } else {
>  > +               mmiowb();
>  >                  err = mthca_cmd_post_hcr(dev, in_param, out_param, in_modifier,
>  >                                           op_modifier, op, token, event);
>  > +       }
> 
> First and most trivial, you indented with spaces instead of tabs.
> It's always obvious that a patch that looks like
> 
>  	old_code();
> +        new_code();
> 
> has whitespace damage, because things don't line up.

Sorry I just cut and pasted it, just to get a discussion under way. It was in no way
meant to be a final patch. I would attach a real file when it comes to it. Sorry I should
have made that clear.

> 
> Also, why did you do
> 
> 	if (something)
> 		mmiowb()
> 		...
> 	else
> 		mmiowb()
> 		...
> 
> instead of just
> 
> 	mmiowb()
> 	if (something)
> 		...
> 	else
> 		...

Actually looking at the code again, I'm not sure that is the correct
place for the mmiowb() I think maybe it would be better done in the
mthca_cmd_post_hcr() routine before we do any work in there. But,
I need to make sure that mthca_cmd_post_hcr() is always protected
by a lock before we do that. What do you think ?

> 
> Finally, if you're going to add a mmiowb() (and I don't doubt it is
> necessary, I'd just like to understand why exactly) then you need to
> add a big comment explaining what it protects -- mmiowb() is always
> confusing to people so it's important to have an explanation so that
> future changes don't break the driver again.

OK thanks excellent point.

John

-- 
John Partridge

Silicon Graphics Inc
Tel:  651-683-3428
Vnet: 233-3428
E-Mail: johnip at sgi.com




More information about the general mailing list