[openib-general] Re: mthca crash on startup

Roland Dreier roland at topspin.com
Thu Nov 18 11:12:16 PST 2004


    > modprobe: page allocation failure. order:6, mode:0x20
    >  [<d09098cc>] mthca_alloc_sqp+0x6c/0x420 [ib_mthca]

It's not actually a crash.  It's just failing to allocate 2048 * 72
bytes of bus-coherent memory (send queue depth times the size of a UD
header) while creating a special QP.  The system should survive this,
although of course MAD services won't work.
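
For what it's worth, the order:6 in the log is consistent with that
arithmetic. Here is a rough sketch, assuming 4 KiB pages (the log does
not state the page size); alloc_order() and sqp_header_bytes() are
names made up for illustration, not functions in mthca:

```c
#include <assert.h>
#include <stddef.h>

/* 2048 and 72 come from the message above; the page size is an assumption. */
#define ASSUMED_PAGE_SIZE 4096u
#define SEND_QUEUE_DEPTH  2048u
#define UD_HEADER_BYTES   72u

/* Smallest order n such that (ASSUMED_PAGE_SIZE << n) >= bytes. */
static unsigned int alloc_order(size_t bytes)
{
    unsigned int order = 0;

    assert(bytes > 0);
    while (((size_t)ASSUMED_PAGE_SIZE << order) < bytes)
        order++;
    return order;
}

/* Bytes requested for the special QP's UD header buffer. */
static size_t sqp_header_bytes(void)
{
    return (size_t)SEND_QUEUE_DEPTH * UD_HEADER_BYTES;
}
```

Under those assumptions the request is 147456 bytes (144 KiB), which
is just over the 128 KiB of an order-5 block, so it has to be rounded
up to 64 physically contiguous pages.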

There are a few things that can be done:

 - There's no reason mthca needs to allocate all this memory in one
   physically contiguous chunk, although it makes the code simpler.
   If this issue persists, we can fix the special QP allocation code
   (everything else in mthca is pretty good about not requiring
   contiguous pages).

 - I seem to recall messages on lkml lately saying that recent kernels
   have VM problems that lead to page allocation failures.  I think
   there are some VM tunables and some patches in -mm that are supposed
   to help.

 - Having "#define IB_MAD_QP_SEND_SIZE	2048" seems a bit excessive to
   me.  It seems a much shallower send queue should be plenty,
   especially for QP0.  Reducing this will reduce the amount of
   contiguous memory required, which should improve things.
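
To get a feel for how much a shallower queue would help, here is a
small sketch of the page order needed for a given send queue depth.
It assumes 4 KiB pages and the 72-byte UD header mentioned above;
order_for_depth() is a hypothetical helper, not part of the MAD or
mthca code:

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 4 KiB pages; 72-byte UD header as quoted in this message. */
enum { PAGE_BYTES = 4096, HEADER_BYTES = 72 };

/*
 * Hypothetical helper: page order needed for one physically contiguous
 * buffer holding `depth` UD headers.
 */
static unsigned int order_for_depth(unsigned int depth)
{
    size_t bytes = (size_t)depth * HEADER_BYTES;
    unsigned int order = 0;

    assert(depth > 0);
    while (((size_t)PAGE_BYTES << order) < bytes)
        order++;
    return order;
}
```

With those assumptions a depth of 2048 needs order 6 (64 pages), 512
needs order 4 (16 pages), and 128 needs only order 2 (4 pages), which
is far more likely to be available on a fragmented system.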

 - Roland


