[ofa-general] libipath crash

Robert Pearson rpearson at systemfabricworks.com
Sat Nov 24 16:38:41 PST 2007


Roland,

[root at client1 src]# uname -a
Linux client1 2.6.18-8.1.15.el5 #1 SMP Mon Oct 22 08:32:28 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

[root at client1 src]# /etc/infiniband/info
prefix=/usr
Kernel=2.6.18-8.1.15.el5

Configure options: --with-cxgb3-mod --with-ipath_inf-mod --with-ipoib-mod --with-iser-mod --with-mthca-mod --with-sdp-mod --with-srp-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mlx4-mod --with-rds-mod

[root at client1 src]# ibstat
CA 'ipath0'
        CA type: InfiniPath_QLE7140
        Number of ports: 1
        Firmware version:
        Hardware version: 1
        Node GUID: 0x00117500006870d1
        System image GUID: 0x00117500006870d1
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 12
                LMC: 0
                SM lid: 12
                Capability mask: 0x02010802
                Port GUID: 0x00117500006870d1

[root at client1 src]# grep INF /boot/config-2.6.18-8.1.15.el5
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_IPV6_ROUTE_INFO=y
# CONFIG_INFTL is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_IPATH=m
CONFIG_INFINIBAND_IPOIB=m
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
CONFIG_INFINIBAND_SRP=m
CONFIG_INFINIBAND_ISER=m
CONFIG_INFINIBAND_SDP=m
# CONFIG_INFINIBAND_SDP_DEBUG is not set
CONFIG_INFINIBAND_MADEYE=m
CONFIG_DEBUG_INFO=y
CONFIG_ZLIB_INFLATE=y

Last interesting messages:

Nov 24 12:56:35 client1 kernel: ----------- [cut here ] --------- [please bite here ] ---------
Nov 24 12:56:35 client1 kernel: Kernel BUG at mm/slab.c:2649
Nov 24 12:56:35 client1 kernel: invalid opcode: 0000 [1] SMP
Nov 24 12:56:35 client1 kernel: last sysfs file: /class/infiniband/ipath0/ports/1/pkeys/3
Nov 24 12:56:35 client1 kernel: CPU 1
Nov 24 12:56:35 client1 kernel: Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc rdma_ucm(U) ib_srp(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_uverbs(U) ib_umad(U) ib_mthca(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ib_mad(U) ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_mod video sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg ib_ipath(U) ib_core(U) ide_cd shpchp cdrom bnx2 serio_raw pcspkr mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Nov 24 12:56:35 client1 kernel: Pid: 5163, comm: test Not tainted 2.6.18-8.1.15.el5 #1
Nov 24 12:56:35 client1 kernel: RIP: 0010:[<ffffffff80016ebb>] [<ffffffff80016ebb>] cache_grow+0x1e/0x395
Nov 24 12:56:35 client1 kernel: RSP: 0018:ffff81001f37dc08  EFLAGS: 00010006
Nov 24 12:56:35 client1 kernel: RAX: 0000000000000000 RBX: 00000000000080d0 RCX: 00000000ffffffff
Nov 24 12:56:35 client1 kernel: RDX: 0000000000000000 RSI: 00000000000080d0 RDI: ffff810037e70040
Nov 24 12:56:35 client1 kernel: RBP: ffff81003ffa0d60 R08: ffff810037e26c00 R09: ffff810002013000
Nov 24 12:56:35 client1 kernel: R10: 0000000000000000 R11: 000000d000000001 R12: ffff810037e70040
Nov 24 12:56:35 client1 kernel: R13: ffff81003ffa0d40 R14: 0000000000000000 R15: ffff810037e70040
Nov 24 12:56:35 client1 kernel: FS:  00002aaaaaad7460(0000) GS:ffff810037e117c0(0000) knlGS:0000000000000000
Nov 24 12:56:35 client1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 24 12:56:35 client1 kernel: CR2: 0000000000404000 CR3: 000000002ef7a000 CR4: 00000000000006e0
Nov 24 12:56:35 client1 kernel: Process test (pid: 5163, threadinfo ffff81001f37c000, task ffff81003f18b0c0)
Nov 24 12:56:35 client1 kernel: Stack:  ffff81003ffa04c0 ffffffff8000ee4e 00000010000200d0 ffff8100016bc7a0
Nov 24 12:56:35 client1 kernel:  ffff81000000ec10 00000000ffffffff ffff81003ffa0d60 ffff810037e26c00
Nov 24 12:56:35 client1 kernel:  ffff81003ffa0d40 000000000000003c ffff810037e70040 ffffffff8005a5ce
Nov 24 12:56:35 client1 kernel: Call Trace:
Nov 24 12:56:35 client1 kernel:  [<ffffffff8000ee4e>] __alloc_pages+0x65/0x2b2
Nov 24 12:56:35 client1 kernel:  [<ffffffff8005a5ce>] cache_alloc_refill+0x136/0x186
Nov 24 12:56:35 client1 kernel:  [<ffffffff800cc5dc>] kmem_cache_alloc_node+0x98/0xb2
Nov 24 12:56:35 client1 kernel:  [<ffffffff800c2ae8>] __vmalloc_area_node+0x62/0x153
Nov 24 12:56:35 client1 kernel:  [<ffffffff800c2e36>] vmalloc_user+0x15/0x50
Nov 24 12:56:35 client1 kernel:  [<ffffffff8819ae1f>] :ib_ipath:ipath_create_qp+0x171/0x5e2
Nov 24 12:56:35 client1 kernel:  [<ffffffff80117410>] avc_has_perm+0x43/0x55
Nov 24 12:56:35 client1 kernel:  [<ffffffff88424013>] :ib_uverbs:__idr_get_uobj+0x33/0x45
Nov 24 12:56:35 client1 kernel:  [<ffffffff88426a34>] :ib_uverbs:ib_uverbs_create_qp+0x251/0x467
Nov 24 12:56:35 client1 kernel:  [<ffffffff88423732>] :ib_uverbs:ib_uverbs_qp_event_handler+0x0/0x21
Nov 24 12:56:35 client1 kernel:  [<ffffffff884231ce>] :ib_uverbs:ib_uverbs_write+0x93/0xa9
Nov 24 12:56:35 client1 kernel:  [<ffffffff8011a55d>] selinux_file_permission+0x9f/0xb6
Nov 24 12:56:35 client1 kernel:  [<ffffffff80016122>] vfs_write+0xce/0x174
Nov 24 12:56:35 client1 kernel:  [<ffffffff800169b3>] sys_write+0x45/0x6e
Nov 24 12:56:35 client1 kernel:  [<ffffffff8005b349>] tracesys+0xd1/0xdc
Nov 24 12:56:35 client1 kernel:
Nov 24 12:56:35 client1 kernel:
Nov 24 12:56:35 client1 kernel: Code: 0f 0b 68 4a 38 28 80 c2 59 0a f6 c7 20 0f 85 53 03 00 00 89
Nov 24 12:56:35 client1 kernel: RIP  [<ffffffff80016ebb>] cache_grow+0x1e/0x395
Nov 24 12:56:35 client1 kernel:  RSP <ffff81001f37dc08>
Nov 24 12:56:35 client1 kernel:  <0>Kernel panic - not syncing: Fatal exception


The system restarted after this.

Thanks,

Bob

-----Original Message-----
From: Roland Dreier [mailto:rdreier at cisco.com] 
Sent: Saturday, November 24, 2007 3:43 PM
To: Robert Pearson
Cc: openib-general at openib.org; Arthur Jones
Subject: Re: [ofa-general] libipath crash

 > The following test case will cause system crash on 1.2.5.1 if run a few
 > times with an ipath device but not with an mthca device. I noticed some
 > recent fixes to ibv_resize_cq but did not see anything about
 > ibv_create_cq.

I can't reproduce here with Linus's latest git tree (v2.6.24-rc3-19-g2ffbb83).
What are the kernel messages from the crash?  What kernel are you running on?

 - R.


