[openib-general] Stale CM callbacks

Thu Jan 11 09:34:22 PST 2007

> Our code isn't handling stale callbacks. Thanks for clarifying it.

I don't believe that your code needs to do anything special.  Most of this 
should be handled by the ib_cm.

> In the above scenario the node comes back up more quickly in the reset case
> than in the reboot case. So, I was just wondering if the extra delay in the
> reboot case was causing the problem to not occur. In other words, does the
> switch cache the reset node's state and discard it after some fixed amount
> of time?

The extra delay could cause messages to time out, which may not happen if the 
node resets quickly.

> Also, should a remote node with which the reset node had established
> connections call ib_destroy_cm_id() during its disconnect processing?
> Currently, our code only destroys the QPs (by calling ib_destroy_cq() and
> ib_destroy_qp()).

This depends on whether you want to reuse the cm_id.  If not, it should be 
destroyed, but there shouldn't be any real harm in keeping it around.
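
As a rough sketch of what that disconnect path could look like (not taken from
your code): the struct my_conn, its reuse_cm_id flag, and my_conn_teardown()
below are hypothetical names, and the ib_* types and functions are user-space
stand-ins so the example compiles outside the kernel.  In a real module they
come from <rdma/ib_verbs.h> and <rdma/ib_cm.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel objects; each only records that it was destroyed. */
struct ib_qp    { int destroyed; };
struct ib_cq    { int destroyed; };
struct ib_cm_id { int destroyed; };

static int  ib_destroy_qp(struct ib_qp *qp)          { qp->destroyed = 1; return 0; }
static int  ib_destroy_cq(struct ib_cq *cq)          { cq->destroyed = 1; return 0; }
static void ib_destroy_cm_id(struct ib_cm_id *cm_id) { cm_id->destroyed = 1; }

/* Hypothetical per-connection state kept by the consumer. */
struct my_conn {
	struct ib_cm_id *cm_id;
	struct ib_qp    *qp;
	struct ib_cq    *cq;
	bool             reuse_cm_id;   /* assumption: caller decides this */
};

/* Disconnect processing: release the QP and CQ, then drop the cm_id
 * unless the caller intends to reuse it. */
static void my_conn_teardown(struct my_conn *conn)
{
	assert(conn != NULL);

	/* QP first, since it references the CQ. */
	if (conn->qp) {
		ib_destroy_qp(conn->qp);
		conn->qp = NULL;
	}
	if (conn->cq) {
		ib_destroy_cq(conn->cq);
		conn->cq = NULL;
	}

	/* Keeping the cm_id is allowed; destroy it only when not reused. */
	if (conn->cm_id && !conn->reuse_cm_id) {
		ib_destroy_cm_id(conn->cm_id);
		conn->cm_id = NULL;
	}
}
```

With reuse_cm_id set to false, the cm_id is destroyed along with the QP and CQ;
with it set to true, the cm_id pointer is left in place.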

If you are seeing any issues with stale connections, please let me know.  It's 
possible that the cm is not handling things correctly.

- Sean