[ofa-general] Oops with today's OFED 1.3

Eli Cohen eli at dev.mellanox.co.il
Tue Feb 5 12:34:54 PST 2008


Pradeep,
Can you check if this is resolved?

On 2/4/08, Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com> wrote:
> I pulled today's (Feb 4th) OFED build and saw the following Oops while touch testing
> on ehca1 on a 2.6.24 kernel.
>
> Modules linked in: ib_ipoib ib_cm ib_sa ib_uverbs ib_umad ib_ehca ib_mthca ib_mad ib_core joydev st ide_cd ipv6 sg pdc202xx_new e1000 ibmveth dm_mod ipr libata firmware_class sr_mod cdrom sd_mod scsi_mod
> NIP: d000000000299ca8 LR: d000000000299a70 CTR: d00000000015ec04
> REGS: c0000001cc85f3b0 TRAP: 0300   Not tainted  (2.6.23-ppc64)
> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 24022424  XER: 00000020
> DAR: 000000000000002c, DSISR: 0000000042000000
> TASK = c0000001d883d4a0[17052] 'modprobe' THREAD: c0000001cc85c000 CPU: 2
> GPR00: 0000000000000000 c0000001cc85f630 d0000000002b5cf0 ffffffffffffffda
> GPR04: c0000001cc85f760 ffffffffffffffda d0000000002a7eb0 0000000000000000
> GPR08: 0000000000000000 0000000000000000 0000000000000001 00000000001b4800
> GPR12: d00000000029ef30 c0000000005a8280 c0000001d895aa20 0000000000000000
> GPR16: 0000000000000008 0000000000000000 0000000000000000 d00000000040f27e
> GPR20: 0000000000000211 0000000000000000 0000000000000000 c0000001cd1e0000
> GPR24: 0000000000000000 d0000000002ad9d8 d0000000002a7eb0 0000000000000001
> GPR28: c0000001cc85f760 0000000000000000 d0000000002b4ce0 c0000001cd1e0780
> NIP [d000000000299ca8] .ipoib_cm_dev_init+0x440/0x63c [ib_ipoib]
> LR [d000000000299a70] .ipoib_cm_dev_init+0x208/0x63c [ib_ipoib]
> Call Trace:
> [c0000001cc85f630] [d000000000299a70] .ipoib_cm_dev_init+0x208/0x63c [ib_ipoib] (unreliable)
> [c0000001cc85f7d0] [d000000000297f4c] .ipoib_transport_dev_init+0x120/0x458 [ib_ipoib]
> [c0000001cc85f930] [d00000000029463c] .ipoib_ib_dev_init+0x44/0xb8 [ib_ipoib]
> [c0000001cc85f9c0] [d0000000002902ec] .ipoib_dev_init+0xe0/0x138 [ib_ipoib]
> [c0000001cc85fa60] [d000000000290544] .ipoib_add_one+0x200/0x424 [ib_ipoib]
> [c0000001cc85fb20] [d0000000001610e4] .ib_register_client+0x94/0xf4 [ib_core]
> [c0000001cc85fbb0] [d00000000029dcac] .ipoib_init_module+0x1f8/0x246c [ib_ipoib]
> [c0000001cc85fc70] [c0000000000905f0] .sys_init_module+0x176c/0x187c
> [c0000001cc85fe30] [c00000000000852c] syscall_exit+0x0/0x40
> Instruction dump:
> 801f0f20 3b600000 2f800000 409d0040 e81f0f30 e97f04f0 7b6926e4 395b0001
> 7d5b07b4 7c080214 816b0018 7d290214 <9169002c> 60000000 60000000 60000000
>
>
> I tracked this down to the following area of code:
> +       for (j = 0; j < ipoib_recvq_size; ++j) {
> +               for (i = 0; i < priv->cm.num_frags; ++i)
> +                       priv->cm.rx_wr_arr[j].rx_sge[i].lkey = priv->mr->lkey;
>
>
> in ipoib_0230_srq_post_n.patch.
>
> Touch testing with this patch removed appears to solve the problem.
>
> Pradeep
>
>
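
The register dump is consistent with this diagnosis. The faulting instruction in the instruction dump, <9169002c>, decodes as stw r11,0x2c(r9). GPR09 is 0, and DAR (the faulting data address) is 0x2c, so the store went to a small offset from a NULL base. One plausible reading, not confirmed in this thread, is that rx_wr_arr had not yet been allocated when the lkey loop ran. The C sketch below mirrors the shape of the quoted loop and shows a guard that refuses to write through an unallocated array. It uses hypothetical, simplified stand-in types: struct sge, struct rx_wr, struct cm_priv, init_rx_lkeys and MAX_FRAGS are illustrative names, not the real IPoIB definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the structures named in the
 * quoted loop; the real kernel definitions differ. */
struct sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

#define MAX_FRAGS 4 /* hypothetical upper bound on fragments per receive */

struct rx_wr {
    struct sge rx_sge[MAX_FRAGS];
};

struct cm_priv {
    struct rx_wr *rx_wr_arr; /* may still be NULL when init runs */
    int num_frags;
};

/* Same shape as the loop from ipoib_0230_srq_post_n.patch, but it checks
 * that rx_wr_arr exists before dereferencing it.
 * Returns 0 on success, -1 if the array has not been allocated. */
static int init_rx_lkeys(struct cm_priv *cm, int recvq_size, uint32_t lkey)
{
    if (cm->rx_wr_arr == NULL)
        return -1; /* writing here would fault at NULL + field offset */

    assert(cm->num_frags <= MAX_FRAGS); /* stay inside rx_sge[] */

    for (int j = 0; j < recvq_size; ++j)
        for (int i = 0; i < cm->num_frags; ++i)
            cm->rx_wr_arr[j].rx_sge[i].lkey = lkey;
    return 0;
}
```

In this sketch a caller allocates rx_wr_arr (for example with calloc(recvq_size, sizeof *rx_wr_arr)) before calling init_rx_lkeys; with the array still NULL the function returns -1 instead of faulting. Whether the real fix is to reorder the allocation or to skip the loop on paths that never allocate the array is not established in the thread.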
