[ofa-general] Re: System crashed while booting Linux (ia64) with three Mellanox HCAs (15b3:6274)

Phillip Wilson phillipwils at gmail.com
Fri Mar 27 16:57:24 PDT 2009


I just notified Mellanox of the issue with the 1.2.940 firmware ...
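
For anyone comparing revisions across the three HCAs, here is a minimal sketch that reads each device's firmware version from sysfs on a kernel that does boot. It assumes the standard /sys/class/infiniband/<device>/fw_ver attribute is present; the function name read_fw_versions and the SUSPECT_FW constant are hypothetical, introduced only for illustration.

```python
from pathlib import Path

# Revision reported in this thread as crashing at boot (illustrative constant).
SUSPECT_FW = "1.2.940"


def read_fw_versions(sysfs_root="/sys/class/infiniband"):
    """Return {device_name: firmware_version} for each HCA under sysfs_root."""
    versions = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No InfiniBand class directory: driver not loaded or no HCAs present.
        return versions
    for dev in sorted(root.iterdir()):
        fw_file = dev / "fw_ver"
        if fw_file.is_file():
            versions[dev.name] = fw_file.read_text().strip()
    return versions


if __name__ == "__main__":
    for name, fw in read_fw_versions().items():
        flag = "  <-- suspect revision" if fw == SUSPECT_FW else ""
        print(f"{name}: firmware {fw}{flag}")
```

This only reports what each HCA is running; it does not say anything about why a given revision crashes.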

On Fri, Mar 27, 2009 at 4:48 PM, Roland Dreier <rdreier at cisco.com> wrote:
>  > I spent the last couple of days retracing my steps.  In my haste, I
>  > listed the wrong HCA firmware revision.  It was firmware 1.2.940 that
>  > caused the system to crash while booting to Linux.  I have the mthca
>  > driver built into the kernel; it is not a loadable driver.  The system
>  > boots fine with the 1.2.0 firmware.
>
> Oh, it's mthca firmware version dependent?  That's a big clue: you're
> using mem-free firmware, which means the HCA uses system memory to store
> big chunks of internal state.  If something is going wrong with how the
> memory is mapped to the HCA (or how the HCA writes to it) then that
> could cause memory corruption -- possibly tied to posting receives to
> the hardware as part of the MAD initialization.
>
> So it could be a driver bug exposed by the new firmware, or a firmware bug.
>
> Is Mellanox following this bug?  Maybe they have some idea of how to
> figure out what the HCA is doing that could crash a system.
>
>  - R.
>


