[ofw] RE: bugcheck in mlx4_bus

Sean Hefty sean.hefty at intel.com
Thu Aug 20 15:18:22 PDT 2009


I added tracking to the winverbs driver to increment/decrement a counter when
creating/destroying any of the following: CQ, device, endpoint (CM structure),
PD, MW, MR, AH, QP, and SRQ.  All counters end at 0 after cleaning up when the
file is closed (done in the WDF file cleanup callback).
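
For anyone who wants to try the same kind of tracking, here is a minimal
user-space sketch of the idea. It is not the actual winverbs code: every name
in it (obj_kind, obj_track_create, obj_track_destroy, obj_track_report_leaks)
is hypothetical, and it uses C11 atomics in place of whatever interlocked
primitives a kernel driver would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical object kinds, mirroring the list in the message above. */
enum obj_kind {
    OBJ_CQ, OBJ_DEVICE, OBJ_ENDPOINT, OBJ_PD, OBJ_MW,
    OBJ_MR, OBJ_AH, OBJ_QP, OBJ_SRQ, OBJ_KIND_COUNT
};

static const char *const obj_names[OBJ_KIND_COUNT] = {
    "CQ", "device", "endpoint", "PD", "MW", "MR", "AH", "QP", "SRQ"
};

/* One counter per object kind; static storage starts at zero. */
static atomic_long obj_counts[OBJ_KIND_COUNT];

/* Call from each create path. */
void obj_track_create(enum obj_kind kind)
{
    atomic_fetch_add(&obj_counts[kind], 1);
}

/* Call from each destroy path; asserts on a destroy without a create. */
void obj_track_destroy(enum obj_kind kind)
{
    long prev = atomic_fetch_sub(&obj_counts[kind], 1);
    assert(prev > 0);
    (void)prev; /* silence unused warning when NDEBUG is defined */
}

/*
 * Call once cleanup has finished (the analogue of the file cleanup
 * callback). Prints every nonzero counter and returns how many object
 * kinds still have outstanding objects.
 */
int obj_track_report_leaks(void)
{
    int leaked_kinds = 0;

    for (int k = 0; k < OBJ_KIND_COUNT; k++) {
        long n = atomic_load(&obj_counts[k]);
        if (n != 0) {
            fprintf(stderr, "leak: %s count=%ld\n", obj_names[k], n);
            leaked_kinds++;
        }
    }
    return leaked_kinds;
}
```

A return value of 0 from obj_track_report_leaks() after cleanup corresponds to
the "all counters end at 0" result described above.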

One of the ranks in an MPI PingPong run occasionally crashes while a test is
starting up.  The crash occurs when running the DAPL rdma_cm provider, but the
kernel bug may or may not be related to the use of the rdma_cm.  With that
provider, the user space code may just crash at the wrong (or right) time to
trigger this error.  The kernel crash doesn't occur every time.

Does anyone have any other ideas to help isolate this?



