[ofa-general] Re: System crashed while booting Linux (ia64) with three Mellanox HCAs (15b3:6274)
Roland Dreier
rdreier at cisco.com
Thu Mar 26 08:54:16 PDT 2009
> System crashes with three Mellanox mezzanine cards (VID=15b3,
> DID=0x6274) installed when booting Linux (ia64). I am using Linux
> 2.6.24, but this issue also occurs with Linux kernel 2.6.29-rc8.
This is a pretty interesting crash. Do you have the ib_mthca driver
built into your kernel, or is it being loaded as a module?
> A partial listing from ib_mad_post_receive_mad.S is posted below the "C" code.
> The exact instruction that caused the system crash was located at
>
> ib_mad_post_*+0x0080 st4 [r2]=r3 MII
>                      nop.i 0x0
>                      nop.i 0x0
>
> It tries to store r3=0x1600 to [r2] @ 0xE0000007E01C7CCC.
Looking at the assembly, it seems the relevant parts are:
ib_mad_post_*+0x0060 ld4 r3=[r11] MMI
                     st8 [r2]=r8
                     adds r2=28,r12
ib_mad_post_*+0x0070 st4 [r9]=r10 MMI
                     st8 [r45]=r0
                     nop.i 0x0;;
ib_mad_post_*+0x0080 st4 [r2]=r3 MII
The main points are "adds r2=28,r12" -- i.e., r2 now points into the
stack (r12 is the stack pointer on ia64) -- and "st4 [r2]=r3" -- i.e.,
a store onto the stack is crashing.
In the same function, we have "adds r9=56,r12" and "st4 [r9]=r10"
slightly earlier, so the stack isn't totally messed up (apparently).
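
To make the arithmetic concrete, here is a rough Python sketch (not
anything from the kernel) that works backwards from the reported fault
address. It assumes r12 is unchanged between "adds r2=28,r12" and the
faulting store, which is what the partial listing suggests but does not
prove. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope check of the addresses involved in the crash.
# Assumption: r12 (the ia64 stack pointer) does not change between
# "adds r2=28,r12" and the faulting "st4 [r2]=r3".

FAULT_ADDR = 0xE0000007E01C7CCC  # target of "st4 [r2]=r3", from the report
R2_OFFSET = 28                   # from "adds r2=28,r12"
R9_OFFSET = 56                   # from "adds r9=56,r12"


def stack_pointer(fault_addr: int, offset: int) -> int:
    """Recover r12 given that the faulting address was r12 + offset."""
    return fault_addr - offset


def is_aligned(addr: int, size: int) -> bool:
    """True if addr is a multiple of size bytes."""
    return addr % size == 0


sp = stack_pointer(FAULT_ADDR, R2_OFFSET)
r9_addr = sp + R9_OFFSET          # target of the earlier "st4 [r9]=r10"
region = FAULT_ADDR >> 61         # ia64 region number = top 3 address bits

print(f"implied r12      : {sp:#x}")
print(f"earlier st4 [r9] : {r9_addr:#x}")
print(f"st4 target 4-byte aligned: {is_aligned(FAULT_ADDR, 4)}")
print(f"region number    : {region}")
```

Under that assumption the implied r12 is 16-byte aligned, the faulting
address is 4-byte aligned, and the earlier store that worked lands only
28 bytes above the one that crashed, so neither alignment nor a wild
stack pointer obviously explains the fault.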
Not sure how to debug this since the crash as it stands doesn't seem to
make much sense...