[openib-general] Re: OpenSM unable to bring up subnet
Hal Rosenstock
halr at voltaire.com
Mon Nov 7 21:35:05 PST 2005
On Mon, 2005-11-07 at 22:51, Sayantan Sur wrote:
> Hi,
>
> Thanks for your reply!
>
> >Is the infiniband support from 2.6.13.1 or has it been replaced with
> >OpenIB svn of the revs indicated (or is that only OpenSM) ? If it is
> >only OpenSM, I would recommend trying to update at least user_mad.c as
> >there have been a number of problems which have been fixed in this.
> >There will be some backport issues to 2.6.13.1 to deal with but they
> >have all been discussed on the list.
> >
> >
> Yes, the IB support is from 2.6.13.1 (kernel drivers at rev 3882).
Can you update the kernel drivers to the latest? I think there may have
been some problems around that revision.
> I have updated the userland stuff. user_mad.c is currently at the latest
> revision.
> Do I really need to update my kernel to 2.6.14 and get the
> latest drivers?
I'm not sure. Mine was working with 2.6.13, and then I upgraded to
2.6.14. I saw a lot of problems, but those may have come from the OpenIB
svn versions I was running during the various mad and user_mad changes.
I think those changes started with r3867, so at r3882 you are definitely
in that area.
> >
> >Was opensm started with -V ?
> >
> >
> No, here is what I get with -V:
>
> [surs at ro0:tmp] sudo opensm -V
> Password:
> OpenSM Rev:openib-1.1.0
>
> Using default guid 0x2c902004002e9
>
> Error from osm_opensm_bind (0x2A)
> Exiting SM
>
> >Since gets are timing out, there is no response to SubnGet NodeInfo for
> >the local node which sets the SM port GUID.
> >
> >Anything relevant in dmesg ?
> >
> >
> Whoa! I found this:
>
> ===>
> Modules linked in: ib_ucm ib_cm ib_uverbs ib_umad ib_mthca ib_mad
> ib_core usbserial usbcore freq_table thermal processor fan button
> snd_pcm_oss battery ac snd_mixer_oss ipv6 evdev floppy joydev
> snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd st soundcore sr_mod
> snd_page_alloc edd sg parport_pc lp parport video1394 ohci1394 raw1394
> ieee1394 capability commoncap dm_mod reiserfs ide_cd cdrom ide_disk
> sata_nv libata amd74xx ide_core sd_mod scsi_mod
> Pid: 7025, comm: ib_mad1 Not tainted 2.6.13.1-smp
> RIP: 0010:[<ffffffff80163701>] <ffffffff80163701>{kfree+193}
> RSP: 0018:ffff81003b52fdb8 EFLAGS: 00010086
> RAX: 0000000000000000 RBX: 28ffff81000124c0 RCX: ffff81000000d000
> RDX: 000000000004d000 RSI: ffff81003c63db80 RDI: ffff81000125a029
> RBP: ffff81000b000000 R08: ffff81003b52e000 R09: 0000000000000000
> R10: 00000000ffffffff R11: 0000000000000000 R12: ffff81007f963a10
> R13: ffff810079558000 R14: ffff81007f963a78 R15: ffffffff882c1ad0
> FS: 0000000040803960(0000) GS:ffffffff804ee800(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000040600ed8 CR3: 000000007c6ef000 CR4: 00000000000006e0
> Process ib_mad1 (pid: 7025, threadinfo ffff81003b52e000, task
> ffff810001fe5510)
> Stack: 0000000000000286 ffff81007f963a10 ffff810079c80380 ffffffff882bf52e
> ffff81003b52fe28 ffffffff882ea93f ffff810001fe5728 ffff81003c7b2d00
> ffff81007f963a00 0000000000000292
> Call Trace:<ffffffff882bf52e>{:ib_mad:ib_free_send_mad+14}
> <ffffffff882ea93f>{:ib_umad:send_handler+63}
> <ffffffff882c1c4b>{:ib_mad:timeout_sends+379}
> <ffffffff80131283>{__wake_up+67}
> <ffffffff80146c7e>{worker_thread+478}
> <ffffffff80130760>{default_wake_function+0}
> <ffffffff8012e023>{__wake_up_common+67}
> <ffffffff80130760>{default_wake_function+0}
> <ffffffff8014b270>{keventd_create_kthread+0}
> <ffffffff80146aa0>{worker_thread+0}
> <ffffffff8014b270>{keventd_create_kthread+0}
> <ffffffff8014b3c9>{kthread+217}
> <ffffffff8010e962>{child_rip+8}
> <ffffffff8014b270>{keventd_create_kthread+0}
> <ffffffff8014b2f0>{kthread+0} <ffffffff8010e95a>{child_rip+0}
>
>
> Code: 8b 03 3b 43 04 73 04 89 c0 eb 0a 48 89 de e8 4c fe ff ff 8b
> RIP <ffffffff80163701>{kfree+193} RSP <ffff81003b52fdb8>
> <====
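That explains why there were no responses. The ib_mad1 thread oopsed in
kfree, called from ib_free_send_mad via the ib_umad send_handler while
timing out sends, so the kernel side is not working right. Please update
the kernel drivers.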
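
For anyone who wants to pick the module frames out of an oops like the
one quoted above, here is a minimal sketch in Python. It is not part of
OpenIB or OpenSM. The names parse_frames, module_frames and SAMPLE are
made up for illustration, and the regular expression assumes the 2.6.13
x86_64 frame format seen in the trace, with decimal offsets.

```python
import re

# Assumed frame format, taken from the quoted trace:
#   <ffffffff882bf52e>{:ib_mad:ib_free_send_mad+14}   (module frame)
#   <ffffffff80131283>{__wake_up+67}                  (core kernel frame)
FRAME_RE = re.compile(
    r"<(?P<addr>[0-9a-f]{16})>"
    r"\{(?::(?P<module>\w+):)?(?P<symbol>\w+)\+(?P<offset>\d+)\}"
)


def parse_frames(oops_text):
    """Return (module, symbol, offset) for each frame, in order of appearance.

    Frames without a module prefix are labelled "kernel" (a hypothetical
    label chosen for this sketch).
    """
    frames = []
    for m in FRAME_RE.finditer(oops_text):
        module = m.group("module") or "kernel"
        frames.append((module, m.group("symbol"), int(m.group("offset"))))
    return frames


def module_frames(oops_text):
    """Keep only the frames that belong to a loadable module."""
    return [f for f in parse_frames(oops_text) if f[0] != "kernel"]


# A few lines copied from the quoted oops, with the "> " quoting removed.
SAMPLE = """\
RIP: 0010:[<ffffffff80163701>] <ffffffff80163701>{kfree+193}
Call Trace:<ffffffff882bf52e>{:ib_mad:ib_free_send_mad+14}
<ffffffff882ea93f>{:ib_umad:send_handler+63}
<ffffffff882c1c4b>{:ib_mad:timeout_sends+379}
<ffffffff80131283>{__wake_up+67}
"""

if __name__ == "__main__":
    for module, symbol, offset in module_frames(SAMPLE):
        print(f"{module}: {symbol}+{offset}")
```

Run against the lines in SAMPLE, this lists the ib_mad and ib_umad
frames, which is where to look first.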
-- Hal