[ofa-general] problem with rdma_ucm in OpenSuSE 10.2 default kernel

Joe Landman landman at scalableinformatics.com
Sun Jul 8 08:37:30 PDT 2007


After getting it to build correctly, installing it, and configuring it, 
I am getting a crash in rdma_ucm when the module is loaded.  In 
addition, for some reason there is a dependency on ipv6.ko that depmod 
doesn't pick up.  The latter is easily worked around, but the former is 
troubling.  Here is the snippet from the messages file:

> Jul  8 11:08:30 jackrabbit kernel: ----------- [cut here ] --------- [please bite here ] ---------
> Jul  8 11:08:30 jackrabbit kernel: Kernel BUG at fs/sysfs/file.c:473
> Jul  8 11:08:30 jackrabbit kernel: invalid opcode: 0000 [1] SMP 
> Jul  8 11:08:30 jackrabbit kernel: last sysfs file: /class/net/ib0/mode
> Jul  8 11:08:30 jackrabbit kernel: CPU 3 
> Jul  8 11:08:30 jackrabbit kernel: Modules linked in: rdma_ucm ib_sdp rdma_cm iw_cm ib_addr ib_local_sa ib_ipoib ipv6 snd_pcm_oss
> snd_mixer_oss ib_uverbs snd_seq ib_umad snd_seq_device ib_cm ib_sa cpufreq_conservative cpufreq_ondemand cpufreq_userspace
> cpufreq_powersave powernow_k8 freq_table button battery ac ipmi_si ipmi_devintf ipmi_msghandler apparmor aamatch_pcre ext3 jbd
> mbcache loop dm_mod usbhid usb_storage snd_hda_intel snd_hda_codec snd_pcm snd_timer ib_mthca snd shpchp ehci_hcd ib_mad ohci_hcd
> ohci1394 ib_core soundcore pci_hotplug ide_cd i2c_nforce2 ieee1394 forcedeth cdrom snd_page_alloc usbcore i2c_core xfs edd fan sg
> arcmsr sata_nv libata amd74xx thermal processor sd_mod scsi_mod ide_disk ide_core
> Jul  8 11:08:30 jackrabbit kernel: Pid: 5464, comm: modprobe Tainted: G     U 2.6.18.2-34-default #1
> Jul  8 11:08:30 jackrabbit kernel: RIP: 0010:[<ffffffff802eaeb1>]  [<ffffffff802eaeb1>] sysfs_create_file+0x19/0x31
> Jul  8 11:08:30 jackrabbit kernel: RSP: 0000:ffff81042171de50  EFLAGS: 00010202
> Jul  8 11:08:30 jackrabbit kernel: RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff803eddf8
> Jul  8 11:08:30 jackrabbit kernel: RDX: 0000000000000000 RSI: ffffffff8856d720 RDI: ffff8104274f3810
> Jul  8 11:08:30 jackrabbit kernel: RBP: ffff810423e8c000 R08: ffffffff804d83b8 R09: ffff810424bb7b80
> Jul  8 11:08:30 jackrabbit kernel: R10: 0000000000000022 R11: ffff810424bb7b80 R12: ffff810423e8c5c0
> Jul  8 11:08:30 jackrabbit kernel: R13: ffffffff8856d900 R14: ffff810423e8c558 R15: ffffc20000a87e48
> Jul  8 11:08:30 jackrabbit kernel: FS:  00002b5c9772f6f0(0000) GS:ffff810428f7a9c0(0000) knlGS:0000000000000000
> Jul  8 11:08:30 jackrabbit kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jul  8 11:08:30 jackrabbit kernel: CR2: 000000000062f007 CR3: 0000000226d4a000 CR4: 00000000000006e0
> Jul  8 11:08:30 jackrabbit kernel: Process modprobe (pid: 5464, threadinfo ffff81042171c000, task ffff8104288e3830)
> Jul  8 11:08:30 jackrabbit kernel: Stack:  ffffffff881a1026 ffffffff8856d900 ffffffff80299bcc 0000000000000019
> Jul  8 11:08:30 jackrabbit kernel:  0000000000000000 000000002171de78 0000000000000000 0000000000000000
> Jul  8 11:08:30 jackrabbit kernel:  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Jul  8 11:08:30 jackrabbit kernel: Call Trace:
> Jul  8 11:08:30 jackrabbit kernel:  [<ffffffff881a1026>] :rdma_ucm:ucma_init+0x26/0x4a
> Jul  8 11:08:30 jackrabbit kernel:  [<ffffffff80299bcc>] sys_init_module+0x172f/0x18e5
> Jul  8 11:08:30 jackrabbit kernel:  [<ffffffff8025800e>] system_call+0x7e/0x83
> Jul  8 11:08:30 jackrabbit kernel: 
> Jul  8 11:08:30 jackrabbit kernel: 
> Jul  8 11:08:30 jackrabbit kernel: Code: 0f 0b 68 b8 75 40 80 c2 d9 01 48 8b 7f 48 ba 04 00 00 00 e9 
> Jul  8 11:08:30 jackrabbit kernel: RIP  [<ffffffff802eaeb1>] sysfs_create_file+0x19/0x31
> Jul  8 11:08:30 jackrabbit kernel:  RSP <ffff81042171de50>
> Jul  8 11:08:30 jackrabbit kernel:  <6>ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
> Jul  8 11:08:36 jackrabbit kernel: eth0: no IPv6 routers present
> Jul  8 11:08:40 jackrabbit kernel: ib0: no IPv6 routers present

I bring up IPoIB for testing (pinging) hosts, and I also have some of 
the ssh traffic cross it.  It is sometimes quite useful.

Is the above a known problem?  Should I file a bug report?  The 
"Tainted" flag in the trace is likely due to the arcmsr driver, though 
that driver is open source, so I am not sure what is "tainted" about it.
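For the ipv6.ko dependency that depmod misses, one simple workaround is to make sure ipv6 is already loaded before rdma_ucm goes in. The Python sketch below is one way to check that. It assumes the usual /proc/modules layout, where each line starts with the module name. The helper names and the prerequisite list are hypothetical and are not part of OFED, depmod, or modprobe.

```python
from __future__ import annotations

from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return the module names found in /proc/modules-style text.

    Assumes each line starts with the module name, as in the
    "name size refcount users state address" layout of /proc/modules.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return names


def missing_prereqs(proc_modules_text: str, prereqs: list[str]) -> list[str]:
    """Return, in the given order, the prerequisites that are not loaded yet."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in prereqs if name not in loaded]


# Hypothetical list: modules that must be loaded before rdma_ucm because
# depmod did not record the dependency on this system.
PREREQS_FOR_RDMA_UCM = ["ipv6"]

if __name__ == "__main__":
    try:
        text = Path("/proc/modules").read_text()
    except OSError as err:
        print(f"cannot read /proc/modules: {err}")
    else:
        missing = missing_prereqs(text, PREREQS_FOR_RDMA_UCM)
        for name in missing:
            print(f"load first: modprobe {name}")
        if not missing:
            print("all listed prerequisites are loaded")
```

If the check reports ipv6 as missing, running modprobe ipv6 before modprobe rdma_ucm sidesteps the missing dependency. It does nothing about the sysfs_create_file BUG in ucma_init itself.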

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


