[ewg] [GIT PULL ofed-1.5.2] iw_cxgb3 critical bug fixes

Steve Wise swise at opengridcomputing.com
Sat Aug 28 07:14:40 PDT 2010


  Hey Vlad,

Please pull these two critical bug fixes for iw_cxgb3 from:

ssh://vlad@sofa.openfabrics.org/~swise/scm/ofed_kernel.git ofed_1_5

They are needed in ofed-1.5.2 for large cluster stability.

Thanks

Steve.

--------

commit d3a225ba07fdd7bf4eccaab383818dde41231166
Author: Steve Wise <swise at opengridcomputing.com>
Date:   Sat Aug 28 08:07:38 2010 -0500

         RDMA/cxgb3: Remove BUG_ON() on CQ rearm failure

         Failure to rearm a CQ means the cxgb3 device is wedged, but we
         shouldn't kill the whole system with a BUG_ON() if this happens.

         Signed-off-by: Steve Wise <swise at opengridcomputing.com>
         Signed-off-by: Roland Dreier <rolandd at cisco.com>
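
For readers without the patch at hand, the idea in the commit above can be sketched in plain user-space C. This is only an illustration, not the driver code: hw_rearm_cq() and rearm_cq() are hypothetical names, and the failure is simulated with a flag. The point is that the error is reported and returned to the caller rather than asserted on.

```c
#include <stdio.h>

/* Hypothetical stand-in for the hardware rearm operation.
 * Returns 0 on success, a negative value on failure. */
int hw_rearm_cq(int simulate_failure)
{
    return simulate_failure ? -1 : 0;
}

/* Instead of asserting that the rearm succeeded (the BUG_ON() pattern),
 * log the failure and hand the error back to the caller. */
int rearm_cq(int simulate_failure)
{
    int err = hw_rearm_cq(simulate_failure);

    if (err)
        fprintf(stderr, "CQ rearm failed (%d); device may be wedged\n", err);
    return err;
}
```

With this shape the caller decides how to react to a wedged device, and the rest of the system keeps running.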

commit f3495ab83c7906ada0fea19867c16c634014e160
Author: Steve Wise <swise at opengridcomputing.com>
Date:   Sat Aug 28 08:02:42 2010 -0500

     RDMA/cxgb3: Don't exceed the max HW CQ depth.

     From: Steve Wise <swise at opengridcomputing.com>

     The max depth supported by T3 is 64K entries.

     This bug will cause stalls and possibly crashes in large MPI clusters.

     Signed-off-by: Steve Wise <swise at opengridcomputing.com>
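
As a rough illustration of the kind of bound this commit is about, a depth check might look like the user-space sketch below. The constant and function names are hypothetical and this is not the actual patch; only the 64K limit comes from the commit message. Whether the real fix rejects or limits oversized requests is not stated here, so the sketch simply rejects them.

```c
/* 64K entries, per the commit message; the macro name is hypothetical. */
#define MAX_HW_CQ_DEPTH 65536u

/* Returns 0 if the requested depth fits the hardware, -1 otherwise. */
int check_cq_depth(unsigned int entries)
{
    if (entries == 0 || entries > MAX_HW_CQ_DEPTH)
        return -1;
    return 0;
}
```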



