[openib-general] ipoib oops

Michael S. Tsirkin mst at mellanox.co.il
Wed Aug 24 02:24:59 PDT 2005


Hi, Roland!
I have seen the following oops recently, typically after
restarting opensm on the same machine. This is on ipoib rev 3113.
Please note that I'm running with my two event patches.

The oops seems to be around offset db7 below:

drivers/infiniband/ulp/ipoib/ipoib_multicast.c:223
     da4:       49 8b 7d 08             mov    0x8(%r13),%rdi
     da8:       48 81 c7 b4 00 00 00    add    $0xb4,%rdi
     daf:       f3 a6                   repz cmpsb %es:(%rdi),%ds:(%rsi)
     db1:       75 17                   jne    dca <ipoib_mcast_join_finish+0x8a>
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:225
     db3:       49 8b 45 70             mov    0x70(%r13),%rax
include/linux/byteorder/swab.h:147
     db7:       8b 40 20                mov    0x20(%rax),%eax
include/asm/byteorder.h:17
     dba:       0f c8                   bswap  %eax
include/linux/byteorder/swab.h:147
     dbc:       41 89 85 f0 02 00 00    mov    %eax,0x2f0(%r13)
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:226
     dc3:       41 89 85 84 03 00 00    mov    %eax,0x384(%r13)
include/asm/bitops.h:236
     dca:       8b 85 b8 00 00 00       mov    0xb8(%rbp),%eax
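
As a sanity check on the offset, here is a minimal sketch of the arithmetic that ties the listing above to the RIP and CR2 values in the trace at the end of this mail. The helper names are made up for illustration; only the numbers come from the listing and the oops. The jne annotation gives the start of the function (dca - 0x8a), the RIP offset (+119, decimal) is added to that, and the fault address is the base register plus the displacement of the faulting mov.

```c
#include <stdint.h>

/* Hypothetical helpers, named for this illustration only. */

/* Start of a function in the listing, recovered from a branch that is
 * annotated as <func+0xNN>: start = target - NN. */
static uint64_t func_start_from_branch(uint64_t target, uint64_t target_offset)
{
	return target - target_offset;
}

/* Listing offset of the faulting instruction, given the function start
 * and the "+N" part of the RIP line in the oops (N is decimal there). */
static uint64_t listing_offset(uint64_t func_start, uint64_t rip_offset)
{
	return func_start + rip_offset;
}

/* Address touched by "mov disp(%reg),..." for a given register value. */
static uint64_t fault_address(uint64_t base_reg, uint64_t displacement)
{
	return base_reg + displacement;
}
```

With the values above, func_start_from_branch(0xdca, 0x8a) gives 0xd40, listing_offset(0xd40, 119) gives 0xdb7, and fault_address(0, 0x20) gives 0x20, which matches RAX and CR2 in the trace.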


Line 255 is here:

                struct ib_ah_attr av = {
                        .dlid          = be16_to_cpu(mcast->mcmember.mlid),
                        .port_num      = priv->port,
                        .sl            = mcast->mcmember.sl,
255:                    .ah_flags      = IB_AH_GRH,
                        .grh           = {
                                .flow_label    = be32_to_cpu(mcast->mcmember.flow_label),
                                .hop_limit     = mcast->mcmember.hop_limit,
                                .sgid_index    = 0,
                                .traffic_class = mcast->mcmember.traffic_class
                        }
                };


So apparently the problem is in accessing mcast->mcmember.mlid.

And I wonder: what prevents the mcast object from being destroyed while
a completion is outstanding?
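
To make the question concrete, here is a userspace sketch of one common way to keep an object alive across an outstanding completion: the in-flight query holds its own reference. This is not what ipoib does today as far as I can tell, and every name in it (demo_mcast, demo_start_join and so on) is hypothetical. The counter is a plain int for brevity; real kernel code would need an atomic counter or a lock.

```c
/* Hypothetical stand-in for the multicast group; not the ipoib struct. */
struct demo_mcast {
	int refcount;       /* one per holder, including an in-flight query */
	int freed;          /* set instead of freeing, so the demo can check it */
	unsigned int mlid;  /* some field the completion handler reads */
};

static void demo_get(struct demo_mcast *m)
{
	m->refcount++;
}

static void demo_put(struct demo_mcast *m)
{
	if (--m->refcount == 0)
		m->freed = 1;   /* real code would free the object here */
}

/* Posting the join takes a reference on behalf of the completion. */
static void demo_start_join(struct demo_mcast *m)
{
	demo_get(m);
}

/* Teardown drops only the owner's reference. */
static void demo_teardown(struct demo_mcast *m)
{
	demo_put(m);
}

/* The completion may still read the object, then drops its reference. */
static unsigned int demo_completion(struct demo_mcast *m)
{
	unsigned int mlid = m->mlid;

	demo_put(m);
	return mlid;
}
```

Starting from refcount 1, the sequence demo_start_join, demo_teardown, demo_completion leaves the object valid until the completion has run, and only the final put releases it.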

MST

Unable to handle kernel NULL pointer dereference at 0000000000000020 RIP:
<ffffffff88039247>{:ib_ipoib:ipoib_mcast_join_finish+119}
PGD 17e386067 PUD 17f228067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: ib_sdp ib_cm ib_ipoib ib_sa ib_umad ib_mthca ib_mad ib_core
Pid: 17774, comm: ib_mad1 Not tainted 2.6.12.2
RIP: 0010:[<ffffffff88039247>] <ffffffff88039247>{:ib_ipoib:ipoib_mcast_join_finish+119}
RSP: 0018:ffff810169763c58  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8101688a3000 RCX: 0000000000000000
RDX: ffff810178f18a80 RSI: ffff810178f18a90 RDI: ffff8101688a30c4
RBP: ffff810178f18a80 R08: 0000000000000000 R09: ffff810169763d38
R10: ffff810169763df8 R11: 0000000000000001 R12: 0000000000000000
R13: ffff8101688a3380 R14: ffff8101688a3000 R15: ffff810178be1898
FS:  0000000000000000(0000) GS:ffffffff80579f00(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 000000017e3d0000 CR4: 00000000000006e0
Process ib_mad1 (pid: 17774, threadinfo ffff810169762000, task ffff81016d34d110)
Stack: 0000000000000001 0000000000000006 ffffc20000022000 0000000000000296
       0000000000000296 ffffffff802800a0 ffffffff803c0c9d ffff81017874d540
       ffff810178f18d80 ffff810167de7210
Call Trace:<ffffffff802800a0>{dma_pool_free+272} <ffffffff803c0c9d>{_spin_unlock_irqrestore+5}
       <ffffffff88039cbf>{:ib_ipoib:ipoib_mcast_join_complete+43}
       <ffffffff880001b5>{:ib_core:ib_unpack+198} <ffffffff88030d4c>{:ib_sa:ib_sa_mcmember_rec_callback+64}
       <ffffffff88030505>{:ib_sa:recv_handler+117} <ffffffff8800ce93>{:ib_mad:ib_mad_completion_handler+941}
       <ffffffff8800cae6>{:ib_mad:ib_mad_completion_handler+0}
       <ffffffff801412df>{worker_thread+476} <ffffffff8012d7c5>{default_wake_function+0}
       <ffffffff8012bb71>{__wake_up_common+64} <ffffffff8012d7c5>{default_wake_function+0}
       <ffffffff80145292>{keventd_create_kthread+0} <ffffffff80141103>{worker_thread+0}
       <ffffffff80145292>{keventd_create_kthread+0} <ffffffff801453c3>{kthread+204}
       <ffffffff8010e14f>{child_rip+8} <ffffffff80145292>{keventd_create_kthread+0}
       <ffffffff801452f7>{kthread+0} <ffffffff8010e147>{child_rip+0}


Code: 8b 40 20 0f c8 41 89 85 f0 02 00 00 41 89 85 84 03 00 00 8b
RIP <ffffffff88039247>{:ib_ipoib:ipoib_mcast_join_finish+119} RSP <ffff810169763c58>
CR2: 0000000000000020



-- 
MST


