[openib-general] Re: mthca crash on startup
Roland Dreier
roland at topspin.com
Thu Nov 18 11:12:16 PST 2004
> modprobe: page allocation failure. order:6, mode:0x20
> [<d09098cc>] mthca_alloc_sqp+0x6c/0x420 [ib_mthca]
It's not actually a crash. It's just failing to allocate 2048 * 72
bytes of bus-coherent memory (send queue depth times the size of a UD
header) while creating a special QP. The system should survive this,
although of course MAD services won't work.
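As a rough check that these numbers line up with the "order:6" in the log, here is a small sketch of the arithmetic. It assumes 4 KiB pages (not stated in the report) and that the contiguous allocation is rounded up to a power-of-two number of pages; alloc_order is a hypothetical helper written for this illustration, not a kernel function.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the report


def alloc_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order such that (page_size << order) >= nbytes."""
    order = 0
    while (page_size << order) < nbytes:
        order += 1
    return order


send_depth = 2048  # IB_MAD_QP_SEND_SIZE
ud_header = 72     # bytes per UD header, per the figure above

nbytes = send_depth * ud_header
order = alloc_order(nbytes)

# 147456 bytes is 36 pages, which rounds up to 64 pages (order 6).
print(nbytes, order, PAGE_SIZE << order)  # prints: 147456 6 262144
```

Under those assumptions the request needs 64 physically contiguous pages, which is easy to fail on a fragmented system.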
There are a few things that can be done:
- There's no reason mthca needs to allocate all this memory in one
physically contiguous chunk, although it makes the code simpler.
If this issue persists, we can fix the special QP allocation code
(everything else in mthca is pretty good about not requiring
contiguous pages).
- I seem to recall recent threads on lkml reporting that current
kernels have VM problems that lead to page allocation failures. I
think there are some VM tunables and some patches in -mm that are
supposed to help.
- Having "#define IB_MAD_QP_SEND_SIZE 2048" seems a bit excessive to
me. It seems a much shallower send queue should be plenty,
especially for QP0. Reducing this will reduce the amount of
contiguous memory required, which should improve things.
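To put rough numbers on the first and last options, the sketch below compares the contiguous allocation order for a few send queue depths and counts how many single pages a page-at-a-time layout would need. Everything here is illustrative: 4 KiB pages are assumed, the depths other than 2048 are hypothetical candidates rather than proposed values, and alloc_order is a helper made up for this example.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
UD_HEADER = 72    # bytes per UD header, from the message above


def alloc_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order such that (page_size << order) >= nbytes."""
    order = 0
    while (page_size << order) < nbytes:
        order += 1
    return order


# Shallower send queue: depths below 2048 are hypothetical examples.
orders = {depth: alloc_order(depth * UD_HEADER)
          for depth in (2048, 1024, 512, 256, 128)}

# Page-at-a-time layout: pack whole headers into single (order-0) pages
# so that no header straddles a page boundary.
headers_per_page = PAGE_SIZE // UD_HEADER
pages_needed = -(-2048 // headers_per_page)  # ceiling division

for depth, order in orders.items():
    print(f"depth {depth:4d}: order {order}")
print(f"{headers_per_page} headers per page, {pages_needed} order-0 pages")
```

With these assumptions a depth of 128 drops the request from order 6 to order 2, while the non-contiguous layout needs 37 separate pages at the full depth of 2048 but never asks for more than one page at a time.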
- Roland