[ofa-general] Re: System crashed while booting Linux (ia64) with three Mellanox HCAs (15b3:6274)

Roland Dreier rdreier at cisco.com
Thu Mar 26 08:54:16 PDT 2009


 > System crashes with three Mellanox mezzanine cards (VID=15b3,
 > DID=0x6274) installed when booting Linux (ia64).  I am using Linux
 > 2.6.24, but this issue also occurs with Linux kernel 2.6.29-rc8.

This is a pretty interesting crash. Do you have the ib_mthca driver
built into your kernel, or is it being loaded as a module?

 > A partial listing from ib_mad_post_receive_mad.S is posted below the "C" code.
 > The exact instruction that caused the system crash was located at
 > 
 > ib_mad_post_*+0x0080           st4              [r2]=r3                      MII
 >                                nop.i            0x0
 >                                nop.i            0x0
 > 
 > It tries to store r3=0x1600 to [r2] @ 0xE0000007E01C7CCC.

Looking at the assembly, it seems the relevant parts are:

ib_mad_post_*+0x0060           ld4              r3=[r11]                     MMI
                               st8              [r2]=r8
                               adds             r2=28,r12
ib_mad_post_*+0x0070           st4              [r9]=r10                     MMI
                               st8              [r45]=r0
                               nop.i            0x0;;
ib_mad_post_*+0x0080           st4              [r2]=r3                      MII

The main points are "adds r2=28,r12" (i.e., r2 now points into the
stack) and "st4 [r2]=r3" (i.e., a store onto the stack is crashing).

In the same function, we have "adds r9=56,r12" and "st4 [r9]=r10"
slightly earlier, so the stack isn't totally messed up (apparently).

I'm not sure how to debug this, since the crash as it stands doesn't seem to
make much sense...

