[openib-general] Re: oops with ipath and mvapich (branch 1.0)

Makia Minich makia at llnl.gov
Thu Mar 16 17:55:23 PST 2006


As an added bonus, MPI tasks will generally abort with one of the following:

[0] Abort: [1] Abort: Error posting send
 at line 212 in file viapriv.c
Error posting send
 at line 589 in file viapriv.c

or

[0] Abort: [ldev10:0] Got completion with error code 9
 at line 1189 in file viacheck.c
done.

It seems that I can run a few MPI tasks successfully (about 3 or so), and then
subsequent MPI tasks will hang or not complete.

* Ira Weiny (weiny2 at llnl.gov) wrote:
> I have pulled branch 1.0 and we just got in some ipath hardware.  When we tried
> to run some mvapich tests on it we got the following oops.  ibv_rc_pingpong
> worked.  I am going to see if the trunk works better for us.  Perhaps the 1.0
> branch is just too old?
> 
> Ira Weiny
> weiny2 at llnl.gov
> 
> 2006-03-16 17:39:26 Unable to handle kernel paging request at 00000000002003d4 RIP:
> 2006-03-16 17:39:26 <ffffffff802eb9b5>{_spin_lock_irqsave+12}
> 2006-03-16 17:39:26 PML4 7598b067 PGD 7ff26067 PMD 0
> 2006-03-16 17:39:26 Oops: 0000 [1] SMP
> 2006-03-16 17:39:26 CPU 1
> 2006-03-16 17:39:26 Modules linked in: nfsd(U) exportfs(U) netdump(U) i2c_dev(U) i2c_core(U) dm_mod(U) md(U) ohci_hcd(U) sata_sil(U) k8_edac(U) edac_mc(U) ib_ipoib(U) ib_ping(U) ib_uat(U) ib_at(U) ib_ipath(U) ipath_core(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_sa(U) ib_cm(U) ib_mad(U) ib_core(U) forcedeth(U) floppy(U) sata_nv(U) libata(U) scsi_mod(U) nfs(U) lockd(U) sunrpc(U) unionfs(U) e1000(U)
> 2006-03-16 17:39:26 Pid: 20120, comm: osu-bibw Not tainted 2.6.9-31chaos
> 2006-03-16 17:39:26 RIP: 0010:[<ffffffff802eb9b5>] <ffffffff802eb9b5>{_spin_lock_irqsave+12}
> 2006-03-16 17:39:26 RSP: 0018:000001007ffabe58  EFLAGS: 00010006
> 2006-03-16 17:39:26 RAX: 0000000000000000 RBX: 00000000002003d0 RCX: 000000000000012d
> 2006-03-16 17:39:26 RDX: ffffffff8050db00 RSI: 0000000000000206 RDI: 00000000002003d0
> 2006-03-16 17:39:26 RBP: 00000000002003d0 R08: 0000000000000020 R09: 0000000000000000
> 2006-03-16 17:39:26 R10: 0000000000000012 R11: 0000000000000080 R12: 0000000000000206
> 2006-03-16 17:39:26 R13: 000001011b900000 R14: 0000000000000000 R15: 0000000000000038
> 2006-03-16 17:39:26 FS:  0000002a95e45c40(0000) GS:ffffffff804de900(0000) knlGS:0000000000000000
> 2006-03-16 17:39:26 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> 2006-03-16 17:39:26 CR2: 00000000002003d4 CR3: 00000000def4e000 CR4: 00000000000006e0
> 2006-03-16 17:39:26 Process osu-bibw (pid: 20120, threadinfo 000001011687a000, task 000001011431d7f0)
> 2006-03-16 17:39:26 Stack: 0000000000000206 0000000000000206 0000000000200200 ffffffffa0184f08
> 2006-03-16 17:39:26        000001011fa1e898 000001011fa1e898 000001011fa1e9d0 0000040000000246
> 2006-03-16 17:39:26        0002000100000005 0000000000000202
> 2006-03-16 17:39:26 Call Trace:<IRQ> <ffffffffa0184f08>{:ib_ipath:ipath_ib_timer+680} <ffffffffa015edc0>{:ipath_core:ipath_verbs_timer+0}
> 2006-03-16 17:39:26        <ffffffffa015ee07>{:ipath_core:ipath_verbs_timer+71}
> 2006-03-16 17:39:26        <ffffffff8013bc55>{run_timer_softirq+356} <ffffffff80138310>{__do_softirq+88}
> 2006-03-16 17:39:26        <ffffffff801383b9>{do_softirq+49} <ffffffff8010fad1>{apic_timer_interrupt+133}
> 2006-03-16 17:39:26         <EOI> <ffffffffa01366c7>{:ib_uverbs:ib_uverbs_poll_cq+120}
> 2006-03-16 17:39:26        <ffffffffa01366c4>{:ib_uverbs:ib_uverbs_poll_cq+117}
> 2006-03-16 17:39:26        <ffffffffa01349a9>{:ib_uverbs:ib_uverbs_write+196}
> 2006-03-16 17:39:26        <ffffffff80174eb0>{vfs_write+207} <ffffffff80174f98>{sys_write+69}
> 2006-03-16 17:39:26        <ffffffff8010f1d2>{system_call+126}
> 2006-03-16 17:39:26
> 2006-03-16 17:39:26 Code: 81 7f 04 ad 4e ad de 74 1f 48 8b 74 24 18 48 c7 c7 2c 5a 30
> 2006-03-16 17:39:26 RIP <ffffffff802eb9b5>{_spin_lock_irqsave+12} RSP <000001007ffabe58>
> 2006-03-16 17:39:26 CR2: 00000000002003d4
> 
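A quick cross-check of the numbers in the oops above, in case it helps narrow
things down. This is only a sketch: the register, stack, and code-byte values
are copied from the trace, while LIST_POISON2 and SPINLOCK_MAGIC are assumed to
match what 2.6-era kernel headers define and have not been checked against the
2.6.9-31chaos source. If they hold, the faulting instruction is the
spinlock-debug magic check, and the lock pointer sits 0x1d0 past a poisoned
list pointer, which would be consistent with the timer touching an entry that
had already been removed from its list.

```python
# Values copied verbatim from the oops.
CR2 = 0x00000000002003D4         # faulting address
RDI = 0x00000000002003D0         # lock pointer passed to _spin_lock_irqsave
STACK_WORD = 0x0000000000200200  # third word of the "Stack:" dump
CODE = "81 7f 04 ad 4e ad de"    # first bytes of the "Code:" line

# Assumptions (not taken from the oops): constants as defined in
# 2.6-era include/linux/list.h and the x86_64 spinlock debug code.
LIST_POISON2 = 0x00200200
SPINLOCK_MAGIC = 0xDEAD4EAD


def decode_cmp_imm32(code_hex):
    """Decode 'cmpl $imm32, disp8(%rdi)' (opcode 81 /7, ModRM 0x7f)."""
    raw = bytes.fromhex(code_hex.replace(" ", ""))
    if len(raw) < 7 or raw[0] != 0x81 or raw[1] != 0x7F:
        raise ValueError("not the expected cmpl encoding")
    disp = raw[2]                              # 8-bit displacement
    imm = int.from_bytes(raw[3:7], "little")   # 32-bit immediate
    return disp, imm


disp, imm = decode_cmp_imm32(CODE)
fault_offset = CR2 - RDI          # how far past the lock pointer we faulted
lock_offset = RDI - LIST_POISON2  # lock's offset from the poisoned pointer

print(f"instruction compares {imm:#x} with the word at {disp:#x}(%rdi)")
print(f"fault address is RDI + {fault_offset:#x}")
print(f"lock pointer is LIST_POISON2 + {lock_offset:#x}")
print(f"stack word equals LIST_POISON2: {STACK_WORD == LIST_POISON2}")
```

None of this identifies which list in ipath_ib_timer is involved; it only shows
that the arithmetic in the trace is self-consistent under those assumptions.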
(((((((((((((((((((((((((((((((((())))))))))))))))))))))))))))))))))
 Makia Minich                      Money is the Devil's toothpaste.
 925.424.5675                              --The Flea (Mucha Lucha)
(((((((((((((((((((((((((((((((((())))))))))))))))))))))))))))))))))


