[ofa-general] Re: [PATCH 4/4] [RFC] IPoIB/cm: Add connected mode support for devices without SRQs

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Wed Oct 31 12:49:09 PDT 2007


Pradeep Satyanarayana wrote:
> Roland Dreier wrote:
>>  > Crud, I see a bug with that commit and non-SRQ:
>>  > ipoib_cm_handle_tx_wc() does
>>  > 
>>  >        struct ipoib_cm_tx *tx = wc->qp->qp_context;
>>  > 
>>  > and there's no reason for wc->qp to be set if the HCA does not handle
>>  > SRQs.  In fact there's no reason for wc->qp to be set for send
>>  > completions in general.
>>
>> Actually, I take that back.  Every driver seems to set wc->qp in all
>> cases, so I guess it is safe to rely on that now.  (Which actually
>> means that the table of RX QPs in the non-SRQ patch can be dropped so
>> we make things dramatically simpler).
> 
> Yes, the rx_table was introduced when ehca did not set wc->qp. I know
> that Joachim Fenkes submitted a fix for that. I will confirm if
> that fix is already in this tree.

I have confirmed that the fix is in this tree, so that does not appear to be
the issue here.
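
For anyone following along, the lookup in question can be sketched in
userspace roughly as below. This is only an illustration of the pattern,
not the ipoib code: mock_qp, mock_wc, mock_cm_tx and mock_handle_tx_wc()
are made-up stand-ins for struct ib_qp, struct ib_wc, struct ipoib_cm_tx
and ipoib_cm_handle_tx_wc(), carrying just the fields needed here. The
NULL check is an assumption about how one might guard against a driver
that leaves wc->qp unset; the real handler does not necessarily do this.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct ib_qp: only the consumer context pointer. */
struct mock_qp {
	void *qp_context;	/* set by the consumer when the QP is created */
};

/* Stand-in for struct ib_wc: only the fields this sketch uses. */
struct mock_wc {
	struct mock_qp *qp;	/* filled in by the HCA driver per completion */
	int status;		/* 0 means success in this sketch */
};

/* Hypothetical per-connection TX state, loosely modelled on ipoib_cm_tx. */
struct mock_cm_tx {
	unsigned int completed;
	unsigned int failed;
};

/*
 * Map a send completion back to its connection via wc->qp->qp_context.
 * Returns NULL when the driver did not set wc->qp, instead of
 * dereferencing a NULL pointer.
 */
static struct mock_cm_tx *mock_handle_tx_wc(const struct mock_wc *wc)
{
	struct mock_cm_tx *tx;

	assert(wc != NULL);

	if (wc->qp == NULL)
		return NULL;

	tx = wc->qp->qp_context;
	if (wc->status == 0)
		tx->completed++;
	else
		tx->failed++;

	return tx;
}
```

With every driver setting wc->qp, the lookup needs no separate table
keyed by QP number, which is the simplification Roland mentions above.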

> 
>> But that means I really have no idea what your bug is.  Could you say
>> how you're running netperf so I can try to reproduce the crash?
> 
I think I have a clue as to what this could be. I suspect the problem is
not related to IB at all. While I was experimenting with various things, a
make run crashed the system, reporting a bug in cache_alloc_refill(),
called from __kmalloc(). The stack trace had ext3 routines in it.

I am guessing this may be a side effect of the assorted changes in the
git tree that I pulled. However, the 2.6.23 (and 2.6.23.1) tarballs from
kernel.org do not have the NAPI changes in them. Should I go ahead and apply
the NAPI changes to the 2.6.23.1 tree and try again?

Pradeep