[ofa-general] Re: System crashed while booting Linux (ia64) with three Mellanox HCAs (15b3:6274)
Phillip Wilson
phillipwils at gmail.com
Fri Mar 27 16:57:24 PDT 2009
I just notified Mellanox of the issue with the 1.2.940 firmware ...
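
For anyone who wants to check which firmware their HCAs report before trying this, here is a minimal sketch, not a tested tool. It assumes the driver exposes a dotted version string in a fw_ver file under /sys/class/infiniband/<device>/; the function names are made up for illustration, and the sysfs root is a parameter so the code can be pointed at a different directory.

```python
import os


def parse_fw_ver(text):
    """Turn a dotted version string such as '1.2.940' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def read_fw_versions(sysfs_root="/sys/class/infiniband"):
    """Return {device_name: version_tuple} for each device under sysfs_root.

    Assumption: each device directory contains a 'fw_ver' file.
    Devices without a readable, parseable file are skipped.
    """
    versions = {}
    if not os.path.isdir(sysfs_root):
        return versions
    for dev in sorted(os.listdir(sysfs_root)):
        path = os.path.join(sysfs_root, dev, "fw_ver")
        try:
            with open(path) as f:
                versions[dev] = parse_fw_ver(f.read())
        except (OSError, ValueError):
            continue
    return versions


if __name__ == "__main__":
    for dev, ver in read_fw_versions().items():
        print(dev, ".".join(str(n) for n in ver))
```

Comparing the tuples, e.g. parse_fw_ver("1.2.940") > parse_fw_ver("1.2.0"), orders revisions numerically rather than as strings.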
On Fri, Mar 27, 2009 at 4:48 PM, Roland Dreier <rdreier at cisco.com> wrote:
> > I spent the last couple of days retracing my steps. In my haste, I
> > listed the wrong HCA firmware revision. It was firmware 1.2.940 that
> > caused the system to crash while booting to Linux. I have the mthca
> > driver built into the kernel; it is not a loadable driver. The system
> > boots fine with the 1.2.0 firmware.
>
> Oh, it's mthca firmware version dependent? That's a big clue: you're
> using mem-free firmware, which means the HCA uses system memory to store
> big chunks of internal state. If something is going wrong with how the
> memory is mapped to the HCA (or how the HCA writes to it) then that
> could cause memory corruption -- possibly tied to posting receives to
> the hardware as part of the MAD initialization.
>
> So it could be a driver bug exposed by the new firmware, or a firmware bug.
>
> Is Mellanox following this bug? Maybe they have some idea of how to
> figure out what the HCA is doing that could crash a system.
>
> - R.
>