[ofa-general] Re: [PATCH 4/4] [RFC] IPoIB/cm: Add connected mode support for devices without SRQs
Pradeep Satyanarayana
pradeeps at linux.vnet.ibm.com
Wed Oct 31 12:49:09 PDT 2007
Pradeep Satyanarayana wrote:
> Roland Dreier wrote:
>> > Crud, I see a bug with that commit and non-SRQ:
>> > ipoib_cm_handle_tx_wc() does
>> >
>> > struct ipoib_cm_tx *tx = wc->qp->qp_context;
>> >
>> > and there's no reason for wc->qp to be set if the HCA does not handle
>> > SRQs. In fact there's no reason for wc->qp to be set for send
>> > completions in general.
>>
>> Actually, I take that back. Every driver seems to set wc->qp in all
>> cases, so I guess it is safe to rely on that now. (Which actually
>> means that the table of RX QPs in the non-SRQ patch can be dropped so
>> we make things dramatically simpler).
>
> Yes, the rx_table was introduced when ehca did not set wc->qp. I know
> that Joachim Fenkes submitted a fix for that. I will confirm if
> that fix is already in this tree.
I have confirmed that the fix is in this tree, so that does not appear to
be the issue here.
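
For reference, the lookup under discussion can be sketched in userspace C.
This is only an illustration under stated assumptions: mock_qp, mock_wc,
mock_cm_tx and tx_from_wc() are made-up names standing in for the kernel
structures, not the actual definitions, and the NULL check shows one way a
handler could guard against a driver that leaves wc->qp unset.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for the kernel's QP and work-completion structures;
 * only the fields mentioned in this thread are modeled (assumption). */
struct mock_qp {
    void *qp_context;   /* opaque pointer supplied when the QP was created */
};

struct mock_wc {
    struct mock_qp *qp; /* NULL if the driver did not fill it in */
};

/* Hypothetical per-connection TX state, standing in for struct ipoib_cm_tx. */
struct mock_cm_tx {
    int id;
};

/* Recover the TX context from a send completion. Returns NULL rather than
 * dereferencing a missing wc->qp. */
static struct mock_cm_tx *tx_from_wc(const struct mock_wc *wc)
{
    if (wc == NULL || wc->qp == NULL)
        return NULL;
    return wc->qp->qp_context;
}

/* Small self-check covering both the populated and the unset case. */
void tx_from_wc_selfcheck(void)
{
    struct mock_cm_tx tx = { 7 };
    struct mock_qp qp = { &tx };
    struct mock_wc filled = { &qp };
    struct mock_wc unset = { NULL };

    assert(tx_from_wc(&filled) == &tx);
    assert(tx_from_wc(&unset) == NULL);
}
```

If every driver does set wc->qp, as Roland notes, the guard never fires and
the lookup reduces to the single dereference quoted above.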
>
>> But that means I really have no idea what your bug is. Could you say
>> how you're running netperf so I can try to reproduce the crash?
>
I think I have a clue as to what this could be. I suspect this problem is
not related to IB at all. While I was experimenting with various things, a
make run crashed the system, reporting a bug in cache_alloc_refill()
called from __kmalloc(). The stack trace had ext3 routines in it.
I am guessing this may be a side effect of the assorted changes in the
git tree that I pulled. However, the 2.6.23 (and 2.6.23.1) tarballs from
kernel.org do not have the NAPI changes in them. Should I go ahead and apply
the NAPI changes to the 2.6.23.1 tree and try again?
Pradeep