[openib-general] Problem with OFED on XT3

Makia Minich minich at ornl.gov
Tue Jul 18 07:58:26 PDT 2006


First, a little about what I'm trying to do (in the hope that someone becomes
interested enough to keep reading), and then the problem.  I'm currently
tasked with getting some form of InfiniBand up and running on a service node
of the Cray XT3.  Because the XT3 currently ships with SuSE9 (with the
2.6.5-based kernel), I decided to try the OFED 1.0.1 release to see what
would happen out of the box.  Because of the system layout, I'm unable to
swap out the kernel, so I had to make some minor tweaks to the OFED source
(attached) to satisfy some missing symbols.

The modules seemed to load successfully, up to and including ib_ipoib.
ifconfig showed the ib0 and ib1 devices as available, and
/sys/class/infiniband showed that the link to the subnet manager was in place.
Attempting to assign an IP address to the interface proved to be too much,
however, as the node's kernel panicked with the following:

general protection fault: 0000 [1]
CPU 0 
Pid: 11258, comm: ifconfig Tainted: P   U   (2.6.5-7.252-ss )
RIP: 0010:[<ffffffff8029a85d>] <ffffffff8029a85d>{__kfree_skb+173}
RSP: 0018:00000100c3cf3af8  EFLAGS: 00010286
RAX: 1b6012ffffffff00 RBX: 0000000000000000 RCX: ffffffffffffffe8
RDX: 0000000000000000 RSI: ffffffff80421ba0 RDI: 0000010005cfd340
RBP: 00000100e0c97480 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffff4
R13: ffffffff8029eeb0 R14: 0000000000000000 R15: 0000000000000003
FS:  0000002a9588e0a0(0000) GS:ffffffff80514b40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9576650c CR3: 0000000000101000 CR4: 00000000000006e0
Process ifconfig (pid: 11258, threadinfo 00000100c3cf2000, task 00000100c3eba580)
Stack: 0000000000000003 00000100c281f000 00000100e0c97480 ffffffff802ab825
       00000100c281f000 ffffffff8029ef78 0000000000000000 00000100c281f000
       0000000000000003 ffffffff802a86e3
Call Trace:
<ffffffff802ab825>{noop_enqueue+37}
<ffffffff8029ef78>{dev_queue_xmit+200}
<ffffffff802a86e3>{nf_hook_slow+227}
<ffffffff8029eeb0>{dev_queue_xmit+0}
<ffffffff80310014>{igmp6_send+724}
<ffffffff80304270>{fib6_walk_continue+192}
<ffffffff803043a0>{fib6_clean_node+0}
<ffffffff803107e3>{igmp6_join_group+51}
<ffffffff8030e18f>{igmp6_group_added+191}
<ffffffff802fbae1>{addrconf_prefix_route+225}
<ffffffff8030e415>{mld_del_delrec+117}
<ffffffff8030e726>{ipv6_dev_mc_inc+486}
<ffffffff802fb86b>{addrconf_join_solict+59}
<ffffffff802fd0fc>{addrconf_dad_start+28}
<ffffffff802fc93b>{addrconf_add_linklocal+43}
<ffffffff802fca35>{addrconf_dev_config+229}
<ffffffff802fcc9b>{addrconf_notify+123}
<ffffffff801411ff>{notifier_call_chain+31}
<ffffffff8029ea25>{dev_open+261}
<ffffffff8029fe0f>{dev_change_flags+95}
<ffffffff802d91c4>{devinet_ioctl+756}
<ffffffff802db4c7>{inet_ioctl+87}
<ffffffff80297641>{sock_ioctl+577}
<ffffffff80186ef4>{sys_ioctl+532}
<ffffffff80112055>{error_exit+0}
<ffffffff80111750>{system_call+124}
Code: ff 08 0f 94 c2 84 d2 74 09 48 8b 01 48 89 c7 ff 50 08 48 89
RIP <ffffffff8029a85d>{__kfree_skb+173} RSP <00000100c3cf3af8>
 
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

Since I have no system dumps to go on, I'm hoping that someone has seen a
similar panic and can suggest some things to try to resolve it.

Thanks...
-- 
Makia Minich <minich at ornl.gov>
National Center for Computation Science
Oak Ridge National Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cray_diffs.patch
Type: application/octet-stream
Size: 9107 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060718/31195274/attachment.obj>

