[PATCH] Re: [openib-general] Re: IPoIB Failure CQ overrun
Michael S. Tsirkin
mst at mellanox.co.il
Mon Dec 20 08:01:46 PST 2004
Quoting r. Roland Dreier (roland at topspin.com) "Re: [PATCH] Re: [openib-general] Re: IPoIB Failure CQ overrun":
> Michael> In investigating this issue I discovered what I believe is
> Michael> a race condition in mthca:
> Thanks, good catch. I'll apply your patch. In the future can you add
> a Signed-off-by: line to your patches?
Sorry, I forgot it. Here it is for the last patch:
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
> Michael> I also would like to suggest implementing CQ doorbell
> Michael> coalescing in mthca, to reduce the number of CQ
> Michael> doorbells.
> Sounds like a good idea...
> Michael> Unfortunately this patch does not seem to solve the
> Michael> overrun problem, so there may be another problem. That will
> Michael> need more looking into.
> OK. At this point do you think it's a FW problem or a driver problem?
The FW handling of the CQ consumer index doorbell is reasonably well tested
with VAPI (and with directed tests). It is also relatively straightforward
code, so I would suspect a driver problem first.
Unfortunately, once the overrun happens I cannot bring the interface
down or unload the IPoIB module (both commands hang), so I have to
reboot. This is slowing me down considerably.
Do you have any idea why that is, and how to fix it?