[ofa-general] Re: OFED HA related question

Wed May 16 13:26:00 PDT 2007

    Changqing> Suppose I get an IBV_EVENT_DEVICE_FATAL async event from
    Changqing> the first HCA on my node. Can I continue to call
    Changqing> ibv_poll_cq() to get back all the work requests I
    Changqing> posted before, or do I need to keep track of these
    Changqing> work requests myself? I am afraid ibv_poll_cq() will
    Changqing> itself return an error. Also, can I call ibv_dereg_mr()
    Changqing> to free the memory I registered with this HCA?

Once you get a catastrophic error, all bets are off.  Work request
processing is in an undetermined state, since the HCA has basically
crashed in an unknown way.  Polling CQs is probably not a good idea.
I guess you do still need to deregister memory regions to unpin the
memory as part of your cleanup.
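Since work requests still outstanding when IBV_EVENT_DEVICE_FATAL arrives may never show up in a CQ, one option is for the application to remember what it has posted. This is a minimal sketch under stated assumptions, not an OFED facility: it assumes every work request carries a unique 64-bit wr_id (the field the verbs API copies from each posted request into its completion), and the table and the functions track_post(), track_complete() and collect_pending() are hypothetical application code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64

/* Hypothetical application-side record of posted-but-not-completed work
 * requests, keyed by the wr_id the application chose for each one.
 * This is not part of libibverbs. */
struct pending_table {
    uint64_t wr_id[MAX_PENDING];
    int      in_use[MAX_PENDING];
};

/* Record a work request just before it is posted.
 * Returns 0 on success, -1 if the table is full. */
static int track_post(struct pending_table *t, uint64_t wr_id)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = 1;
            t->wr_id[i] = wr_id;
            return 0;
        }
    }
    return -1;
}

/* Forget a work request once its completion has been polled while the
 * HCA was still healthy. Returns 0 if found, -1 otherwise. */
static int track_complete(struct pending_table *t, uint64_t wr_id)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_PENDING; i++) {
        if (t->in_use[i] && t->wr_id[i] == wr_id) {
            t->in_use[i] = 0;
            return 0;
        }
    }
    return -1;
}

/* After a fatal device error, copy out every wr_id that never completed
 * instead of polling the dead CQ. Returns the number of entries written
 * to out (at most max). */
static int collect_pending(const struct pending_table *t,
                           uint64_t *out, int max)
{
    int n = 0;
    assert(t != NULL && out != NULL);
    for (int i = 0; i < MAX_PENDING && n < max; i++)
        if (t->in_use[i])
            out[n++] = t->wr_id[i];
    return n;
}
```

In use, the application would call track_post() just before ibv_post_send() or ibv_post_recv(), call track_complete() for each wr_id returned by ibv_poll_cq() while the HCA is healthy, and after a fatal event call collect_pending() to learn which requests never completed. The linear scan keeps the sketch short; a program with many outstanding requests would more likely index the table by wr_id.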

    Changqing> If I continue to use the second HCA, does the failure
    Changqing> of the first HCA affect the operation of the second HCA
    Changqing> (from the driver's point of view)?

No.
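Because a fatal error on one HCA does not stop the other from working, an application that opened both devices can stop using the failed one and direct new work to the survivor. A minimal sketch of that bookkeeping follows. struct hca_slot, mark_fatal() and pick_healthy() are hypothetical names, and the sketch assumes the fatal flag is set from the application's async-event loop whenever ibv_get_async_event() on that device's context reports an event whose event_type is IBV_EVENT_DEVICE_FATAL (each event then being released with ibv_ack_async_event()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-HCA state kept by the application, one entry per
 * opened device context. */
struct hca_slot {
    const char *name;   /* e.g. the string returned by ibv_get_device_name() */
    int         fatal;  /* assumed set once IBV_EVENT_DEVICE_FATAL is seen */
};

/* Mark only the HCA at index idx as failed; the other entries are left
 * untouched, since one HCA's failure does not affect another. */
static void mark_fatal(struct hca_slot *hcas, int n, int idx)
{
    assert(hcas != NULL);
    if (idx >= 0 && idx < n)
        hcas[idx].fatal = 1;
}

/* Return the index of the first HCA that has not reported a fatal
 * error, or -1 if all of them have failed. */
static int pick_healthy(const struct hca_slot *hcas, int n)
{
    assert(hcas != NULL);
    for (int i = 0; i < n; i++)
        if (!hcas[i].fatal)
            return i;
    return -1;
}
```

New connections or reissued work requests would then go to the HCA at the index pick_healthy() returns; a result of -1 means no usable HCA remains.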

 - R.