[ofa-general] Re: [RFC][PATCH] last wqe event handler patch

Wed Jun 25 13:07:04 PDT 2008

 > > Can you explain this change a little more?  It seems quite likely that
 > > we would get last WQE reached events for other states, such as
 > > IPOIB_CM_RX_ERROR coming from ipoib_cm_dev_stop(), and I don't see how
 > > things work if we make this change.
 > > 
 > >  - R.
 > 
 > Hello Roland,
 > 
 >         If it's already in ERROR status, it will be processed through 
 > rx_error_list. In the case of ipoib_cm_dev_stop(), it will wait for 5 * HZ 
 > to be drained and then put into reap_list. In the case of IPoIB running 
 > status, I put a 60 * HZ timer for drain in the stale connection release 
 > patch.

But the 5 second timeout in ipoib_cm_dev_stop() is supposed to be an
exception when something gets wedged, just to avoid waiting forever.  We
want to handle the last WQE reached events normally in most cases.

Would a better fix be to add locking around the "assume HW is wedged"
code in ipoib_cm_dev_stop() to avoid problems if the 5 second timeout is
too short?
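
To make the idea concrete, here is a rough userspace sketch of the kind of
locking meant above. It is not the actual ipoib code: the names rx_conn,
rx_lock, on_last_wqe() and on_drain_timeout() are made up for illustration,
and a pthread mutex stands in for whatever lock the driver would really use.
The point is only that the normal last WQE path and the timeout path take
the same lock and check the state under it, so a connection is reaped
exactly once no matter which path gets there first.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's RX connection bookkeeping. */
enum rx_state { RX_DRAINING, RX_REAPED };

struct rx_conn {
    enum rx_state state;
    int reap_count;              /* number of times this connection was reaped */
};

/* One lock shared by the event path and the timeout path (assumption). */
static pthread_mutex_t rx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reap a connection if it is still draining; caller must hold rx_lock. */
static void reap_locked(struct rx_conn *c)
{
    assert(c != NULL);
    if (c->state == RX_DRAINING) {
        c->state = RX_REAPED;
        c->reap_count++;
    }
}

/* Normal path: a last WQE reached event arrives for one connection. */
void on_last_wqe(struct rx_conn *c)
{
    pthread_mutex_lock(&rx_lock);
    reap_locked(c);
    pthread_mutex_unlock(&rx_lock);
}

/* Exceptional path: the drain timed out, so assume the HW is wedged and
 * reap everything that is still draining. */
void on_drain_timeout(struct rx_conn *conns, size_t n)
{
    pthread_mutex_lock(&rx_lock);
    for (size_t i = 0; i < n; i++)
        reap_locked(&conns[i]);
    pthread_mutex_unlock(&rx_lock);
}
```

With this arrangement a late event after the timeout, or a timeout after
the event, finds the connection already reaped and does nothing.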

 - R.