[ofa-general] Re: ipoib attempting to join on junk MGID / 2.6.21-rc6 crash dump

Hal Rosenstock halr at voltaire.com
Thu Jul 12 04:07:34 PDT 2007


On Thu, 2007-07-12 at 06:56, Or Gerlitz wrote:
> OK, I did some checks with the upstream kernel; the junk mkey for child interface phenomenon
> does not reproduce, which probably means it's either an OFED or RH4 kernel issue.

FWIW (probably just as a data point to keep in mind), this problem has
been seen and reported on the list quite a while ago. It is extremely
hard to reproduce. No clue as to what causes it.

-- Hal

> However, I started on 2.6.21-rc6, under which I saw the crash below, which
> does not reproduce now under 2.6.22. Was there any fix that you are aware
> of in this area of the code?
> 
> Or.
> 
> 
> ib0.8007: bringing up interface
> ib0.8007: IPOIB_FLAG_OPER_UP not set<7>ib0.8007: IPOIB_FLAG_OPER_UP not set<6>ADDRCONF(NETDEV_UP): ib0.8007: link is not ready
> ib0.8007: IPOIB_FLAG_OPER_UP not set<7>ib0.8007: IPOIB_FLAG_OPER_UP not set<7>ib0.8007: stopping interface
> ib0.8007: downing ib_dev
> ib0.8007: stopping multicast thread
> ib0.8007: flushing multicast list
> Unable to handle kernel NULL pointer dereference at 0000000000000070 RIP:
>  [<ffffffff8045d84e>] _spin_lock_irqsave+0x3/0x24
> PGD 36305067 PUD 3b8fb067 PMD 0
> Oops: 0002 [1] SMP
> CPU 1
> Modules linked in: ib_ipoib ib_cm ib_sa ipv6 ib_mthca ib_mad ib_core sg st sd_mod sr_mod scsi_mod e100 i2c_amd8111 i2c_amd756 i2c_core
> Pid: 12633, comm: ifconfig Not tainted 2.6.21-rc6 #2
> RIP: 0010:[<ffffffff8045d84e>]  [<ffffffff8045d84e>] _spin_lock_irqsave+0x3/0x24
> RSP: 0018:ffff810026dcbc50  EFLAGS: 00010092
> RAX: 0000000000000292 RBX: ffff810016425000 RCX: ffff810016425750
> RDX: ffff810026dcbd48 RSI: 0000000000000000 RDI: 0000000000000070
> RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
> R10: ffff810000e6b2c0 R11: 0000000000000001 R12: 0000000000000070
> R13: 0000000000000000 R14: ffff81003f8c6c00 R15: ffff810016425000
> FS:  00002abdfeb7b740(0000) GS:ffff81003f8a7a40(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000070 CR3: 000000003897a000 CR4: 00000000000006e0
> Process ifconfig (pid: 12633, threadinfo ffff810026dca000, task ffff81003d7537f0)
> Stack:  ffffffff880e48f7 0000003000000010 ffff810026dcbd38 ffff810026dcbc78
>  ffff81003f8c6c00 ffff810016425000 ffff810016425000 ffff810026dcbd30
>  ffff810016425000 ffff810016425700 ffff810016425700 ffff810016425000
> Call Trace:
>  [<ffffffff880e48f7>] :ib_cm:cm_destroy_id+0x1c/0x25c
>  [<ffffffff880f28a3>] :ib_ipoib:ipoib_cm_dev_stop+0x27/0xc5
>  [<ffffffff880ed53c>] :ib_ipoib:ipoib_ib_dev_stop+0x25/0x2c3
>  [<ffffffff8023f504>] flush_cpu_workqueue+0xb3/0xc1
>  [<ffffffff802426cc>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff80238f8e>] lock_timer_base+0x1b/0x3c
>  [<ffffffff880eea8c>] :ib_ipoib:ipoib_mcast_dev_flush+0x10e/0x159
>  [<ffffffff880ece93>] :ib_ipoib:ipoib_flush_paths+0x34/0x15a
>  [<ffffffff880eb927>] :ib_ipoib:ipoib_stop+0x63/0xef
>  [<ffffffff8040d237>] dev_close+0x58/0x77
>  [<ffffffff8040c2f6>] dev_change_flags+0x57/0x119
>  [<ffffffff80443f24>] devinet_ioctl+0x265/0x5cd
>  [<ffffffff80444f8e>] inet_ioctl+0x3f/0x5e
>  [<ffffffff80402cae>] sock_ioctl+0x16c/0x189
>  [<ffffffff80284ecd>] do_ioctl+0x29/0x6f
>  [<ffffffff80285187>] vfs_ioctl+0x274/0x285
>  [<ffffffff802851d4>] sys_ioctl+0x3c/0x60
>  [<ffffffff802093ce>] system_call+0x7e/0x83
> 
> 
> Code: f0 ff 0f 79 1b a9 00 02 00 00 74 0b fb f3 90 83 3f 00 7e f9
> RIP  [<ffffffff8045d84e>] _spin_lock_irqsave+0x3/0x24
>  RSP <ffff810026dcbc50>
> CR2: 0000000000000070
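
For anyone reading the oops above, here is a minimal sketch of how the "Oops: 0002" field can be decoded. It assumes the standard x86 page-fault error code layout (bit 0: protection violation vs. not-present page, bit 1: write vs. read, bit 2: user vs. kernel mode). The function name decode_oops_code is made up for this example and is not part of any kernel tool. Under that layout, 0002 reads as a kernel-mode write to a not-present page. The dump also shows CR2 equal to RDI (both 0x70), which would be consistent with, though does not by itself prove, a lock address computed from a NULL structure pointer plus a field offset of 0x70.

```python
# Sketch only: decode the low three bits of the x86 page-fault error code
# that the kernel prints on the "Oops:" line. decode_oops_code is a
# hypothetical helper name chosen for this example.

def decode_oops_code(code_hex):
    """Return a dict describing the low three bits of the error code."""
    code = int(code_hex, 16)
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "page": "protection violation" if code & 0x1 else "not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    # Values copied from the dump above.
    print(decode_oops_code("0002"))
    cr2 = 0x0000000000000070  # faulting address (CR2)
    rdi = 0x0000000000000070  # first argument register at the fault
    print("CR2 == RDI:", cr2 == rdi)
```

The write access matches the first bytes on the Code: line (f0 ff 0f), which disassemble to a locked decrement through RDI, i.e. the spinlock acquire touching the bad address.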
> 



