[openib-general] CQ error handling in IPoIB

Tue Jan 2 10:27:01 PST 2007

Hi,
> I have a question regarding error handling in IPoIB.
> 
> The spec says...
> 
> When a CQ encounters an error, in order to be able to use the CQ again,
> the consumer should:
> * Destroy all the QPs that are attached to the CQ
> * Destroy the CQ
> * Recreate the CQ through the Create Completion Queue verb

That passage of the spec is about CQ errors such as CQ overrun, not QP errors.

> While (at least one part of) the code does...
> 
> static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
> {
> 	...
> 	...
> 	...
>         if (wc->status != IB_WC_SUCCESS &&
>             wc->status != IB_WC_WR_FLUSH_ERR)
>                 ipoib_warn(priv, "failed send event "
>                            "(status=%d, wrid=%d vend_err %x)\n",
>                            wc->status, wr_id, wc->vendor_err);
> }

The wc status reports QP errors, not CQ errors.
QP errors are what this code handles.
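
To make the check in the quoted function concrete, here is a minimal user-space sketch (not IPoIB code). The enum is only a stand-in for a few values of the kernel's enum ib_wc_status, and tx_status_is_unexpected() is a hypothetical helper that mirrors the quoted test. A flush error is what outstanding work requests complete with once the QP has entered the error state, so only the other failures are worth a warning.

```c
#include <stdbool.h>

/* Stand-in for a few values of the kernel's enum ib_wc_status.
 * The names and numeric values here are illustrative only. */
enum sketch_wc_status {
	SKETCH_WC_SUCCESS,
	SKETCH_WC_WR_FLUSH_ERR,
	SKETCH_WC_RETRY_EXC_ERR,
};

/* Hypothetical helper: true when a send completion reports a QP error
 * other than a flush, i.e. the case the quoted code warns about. */
static bool tx_status_is_unexpected(enum sketch_wc_status status)
{
	return status != SKETCH_WC_SUCCESS &&
	       status != SKETCH_WC_WR_FLUSH_ERR;
}
```

None of this touches the CQ itself: the status travels inside each work completion that is polled from the CQ.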

> Since I don't see any error handling and I understand that the only way left to make the driver work again is to restart it with unload/load, my question is: what does the code assume about errors happening on the CQ?
> 
> thanks
> 
> MoniS

IPoIB prevents CQ overrun errors by allocating a CQ with more entries than the
RX and TX queues combined (CQ size > RX size + TX size).
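
The sizing rule can be sketched as below. This is a user-space illustration, not the driver's code: cq_entries_needed() is a hypothetical helper, and the "+ 1" is just one way to satisfy the strict inequality above. The reasoning is that each posted send or receive work request produces at most one completion, so a CQ with more entries than the two queues can have outstanding cannot overrun.

```c
/* Hypothetical helper: smallest CQ depth that is strictly larger than
 * the receive and send queue sizes combined. */
static unsigned int cq_entries_needed(unsigned int rx_ring_size,
				      unsigned int tx_ring_size)
{
	return rx_ring_size + tx_ring_size + 1;
}
```

For example, with a receive queue of 128 entries and a send queue of 64, the sketch asks for a CQ of 193 entries.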

-- 
MST