[ewg] Re: [ofa-general] Oops with today's OFED 1.3

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Tue Feb 5 12:47:15 PST 2008


Eli Cohen wrote:
> Pradeep,
> Can you check if this is resolved?
> 
> On 2/4/08, Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com> wrote:
>> I pulled today's (Feb 4th) OFED build and saw the following Oops while touch testing
>> on ehca1 on a 2.6.24 kernel.
>>

<snip>


>> NIP [d000000000299ca8] .ipoib_cm_dev_init+0x440/0x63c [ib_ipoib]
>> LR [d000000000299a70] .ipoib_cm_dev_init+0x208/0x63c [ib_ipoib]
>> Call Trace:
>> [c0000001cc85f630] [d000000000299a70] .ipoib_cm_dev_init+0x208/0x63c [ib_ipoib] (unreliable)
>> [c0000001cc85f7d0] [d000000000297f4c] .ipoib_transport_dev_init+0x120/0x458 [ib_ipoib]
>> [c0000001cc85f930] [d00000000029463c] .ipoib_ib_dev_init+0x44/0xb8 [ib_ipoib]
>> [c0000001cc85f9c0] [d0000000002902ec] .ipoib_dev_init+0xe0/0x138 [ib_ipoib]
>> [c0000001cc85fa60] [d000000000290544] .ipoib_add_one+0x200/0x424 [ib_ipoib]
>> [c0000001cc85fb20] [d0000000001610e4] .ib_register_client+0x94/0xf4 [ib_core]
>> [c0000001cc85fbb0] [d00000000029dcac] .ipoib_init_module+0x1f8/0x246c [ib_ipoib]
>> [c0000001cc85fc70] [c0000000000905f0] .sys_init_module+0x176c/0x187c
>> [c0000001cc85fe30] [c00000000000852c] syscall_exit+0x0/0x40
>> Instruction dump:
>> 801f0f20 3b600000 2f800000 409d0040 e81f0f30 e97f04f0 7b6926e4 395b0001
>> 7d5b07b4 7c080214 816b0018 7d290214 <9169002c> 60000000 60000000 60000000

Hello Eli,

Yes, this particular issue has been solved. However, I do see some other issues.

I am seeing some new messages (not seen previously) in dmesg relating to 
ib_cq_destroy() (on ehca):

ib0: ib_cq_destroy failed
ib_destroy_srq failed: -16
ib_dealloc_pd failed

This happens after some network tests and an rmmod of ib_ehca.

My current guess is that this is related to the split CQ patch, but I have not 
had enough cycles to confirm that. Could you please take a look as well?

Pradeep



