[ofa-general] Re: [RFC][PATCH] last wqe event handler patch

Wed Jun 25 15:09:12 PDT 2008

Roland Dreier <rdreier at cisco.com> wrote on 06/25/2008 02:43:19 PM:

> Interesting... I wonder if it really is taking that long for everything
> to finish draining, or if the system is too busy so it sees a spurious
> timeout?  The intention of all of this is that it should "never happen"
> unless the hardware really is stuck.

I guess the reason might be that we have a large cluster where each node
has 4 ports, so there are too many RC QPs in this setup. We saw QPs go
dead, and the 5-second drain didn't work.

> What exactly is causing the crash here?

You can ignore this for now; it's related to another patch, not the current
code level. I will explain it in the drain WR post_send failure patch.

Please review the stale connection resource cleanup patch to see whether it
makes sense.

thanks
Shirley