[ofa-general] RE: OFED HA related question

Wed May 16 14:43:30 PDT 2007

> 
>     Changqing> Suppose I get an IBV_EVENT_DEVICE_FATAL async event
>     Changqing> from the first HCA on my node. Can I continue to call
>     Changqing> ibv_poll_cq() to get back all the work requests I
>     Changqing> posted before, or do I need to keep track of these
>     Changqing> work requests myself? I am afraid ibv_poll_cq() will
>     Changqing> itself return an error. Also, can I call ibv_dereg_mr()
>     Changqing> to free the memory I registered with this HCA?
> 
> Once you get a catastrophic error, all bets are off.  Work request
> processing is in an undetermined state, since basically the HCA
> crashed in an unknown way.  Polling CQs is probably not a good idea.
> I guess you do need to deregister memory regions to unpin the memory
> as part of your cleanup....

Thanks. However, when a catastrophic error occurs, there are still some
entries in the CQ. Can I continue to read them using ibv_poll_cq()?

Also, does ibv_dereg_mr() still work after a fatal error occurs?

--CQ
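
Since the reply quoted above says polling CQs after a catastrophic error
is probably not a good idea, one option is for the application to keep
its own record of outstanding work requests. What follows is only a
minimal sketch in plain C under stated assumptions: the names
wr_tracker, tracker_add, tracker_complete, tracker_flush and the
MAX_OUTSTANDING limit are hypothetical and are not part of libibverbs,
and the wr_id values are assumed to be the ones the application placed
in each work request when posting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING 64  /* hypothetical limit, chosen for illustration */

/* Application-side record of work requests posted but not yet completed. */
struct wr_tracker {
    uint64_t wr_id[MAX_OUTSTANDING];
    size_t count;
};

void tracker_init(struct wr_tracker *t)
{
    assert(t != NULL);
    t->count = 0;
}

/* Record a work request at post time. Returns 0, or -1 if the table is full. */
int tracker_add(struct wr_tracker *t, uint64_t wr_id)
{
    if (t->count == MAX_OUTSTANDING)
        return -1;
    t->wr_id[t->count++] = wr_id;
    return 0;
}

/* Drop a work request after a normal completion.
 * Returns 0 if it was outstanding, -1 if it was not found. */
int tracker_complete(struct wr_tracker *t, uint64_t wr_id)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->wr_id[i] == wr_id) {
            /* Order does not matter, so fill the hole with the last entry. */
            t->wr_id[i] = t->wr_id[--t->count];
            return 0;
        }
    }
    return -1;
}

/* After a fatal device event: report every outstanding work request to the
 * optional callback and empty the table, without touching the CQ.
 * Returns the number of work requests that were still outstanding. */
size_t tracker_flush(struct wr_tracker *t,
                     void (*on_lost)(uint64_t wr_id, void *arg), void *arg)
{
    size_t n = t->count;
    for (size_t i = 0; i < n; i++) {
        if (on_lost != NULL)
            on_lost(t->wr_id[i], arg);
    }
    t->count = 0;
    return n;
}
```

In this sketch the application would call tracker_add() when it posts a
work request, tracker_complete() for each completion it gets from
ibv_poll_cq() during normal operation, and tracker_flush() once it sees
IBV_EVENT_DEVICE_FATAL, so that the buffers tied to lost requests can be
reclaimed or reposted on the second HCA.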

> 
>     Changqing> If I continue to use the second HCA, does the failure
>     Changqing> of the first HCA affect the operation of the second
>     Changqing> HCA (from the driver's point of view)?
> 
> No.
> 
>  - R.
>