[ofa-general] Re: [RFC][PATCH] last recv race patch

Roland Dreier rdreier at cisco.com
Tue Jun 24 18:42:00 PDT 2008


This all looks like good work, especially if it's fixing crasher
problems.  However, it would really make it a lot easier to review if
the changelog had more detail:

 >       I turned on debug, I found that same QP context being destroyed twice
 > for nonSRQ connection. I reviewed the code and found that there is a window
 > the list could be added after the reap call, so checking the QP context
 > status is needed.

 > Address a possible race

In other words, if you could diagram the race in the standard way, e.g.

    Thread 1                                        Thread 2

    foo();
    list_add();
                                                    do_something();
                                                    blah();
    check_it();
    oops_because_thread_2_messed_us_up();

then it avoids me having to reverse engineer the debugging work you did
and makes it much easier to see why this patch makes sense.

Basically the patch description should explain the problem with the
current code in enough detail that it is easy to understand how and when
things go wrong, and explain the fix enough so that it is easy to
understand why you are changing things as you do.

I'm guessing in this patch it's a race with the stale task moving the
connection to the error state exactly when the last receive completes?
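
To make that guess concrete, here is a rough sketch of the kind of
interleaving the quoted description seems to point at. Everything in it
(struct qp_ctx, queue_for_reap(), reap(), replay_interleaving()) is a
made-up stand-in and not the driver's actual code, and it replays one
interleaving sequentially instead of using real threads, so treat it as
an illustration of the idea only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection's QP context (not the real struct). */
struct qp_ctx {
    bool on_reap_list;   /* queued for teardown */
    bool destroyed;      /* status flag the proposed check would look at */
    int  destroy_calls;  /* how many times teardown actually ran */
};

/* Queue the context for teardown; models the list add in the quoted text. */
static void queue_for_reap(struct qp_ctx *c)
{
    assert(c != NULL);
    c->on_reap_list = true;
}

/*
 * Reap a queued context. With check_status set, a context that has already
 * been destroyed is taken off the list but not torn down a second time.
 */
static void reap(struct qp_ctx *c, bool check_status)
{
    assert(c != NULL);
    if (!c->on_reap_list)
        return;
    c->on_reap_list = false;
    if (check_status && c->destroyed)
        return;
    c->destroyed = true;
    c->destroy_calls++;
}

/*
 * Replay one interleaving in a fixed order and return the number of
 * teardowns: queue, reap, a late list add after the reap, then reap again.
 */
int replay_interleaving(bool check_status)
{
    struct qp_ctx c = { false, false, 0 };

    queue_for_reap(&c);        /* thread 1 */
    reap(&c, check_status);    /* thread 2 */
    queue_for_reap(&c);        /* thread 1: list add lands after the reap */
    reap(&c, check_status);    /* thread 2 */
    return c.destroy_calls;
}
```

In this replay, replay_interleaving(false) tears the context down twice
and replay_interleaving(true) only once. In real code the status check
would presumably have to happen under the same lock that protects the
list, which this sequential sketch does not model.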

Anyway this applies to all the patches, thanks.

 - R.


