[openib-general] Re: [CMA] re-connect procedure?

Tue Mar 21 13:16:43 PST 2006

James Lentini wrote:
> How should CMA consumers re-connect after a disconnect?

The CMA states are currently coded so that you need to destroy the rdma_cm_id 
and create a new one for each connection.  It probably wouldn't take much to 
fix this; it just hasn't been a priority.
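
To illustrate the current behavior, here is a toy userspace model, not the
actual cma.c code.  Only the ROUTE_RESOLVED and CONNECT states come from the
patch quoted below; the TOY_DISCONNECTED state and every toy_* name are
assumptions made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only. TOY_DISCONNECTED and all toy_* names are hypothetical. */
enum toy_state { TOY_IDLE, TOY_ROUTE_RESOLVED, TOY_CONNECT, TOY_DISCONNECTED };

struct toy_id { enum toy_state state; };

/* Compare-and-exchange: move to 'exch' only if currently in 'comp'. */
static int toy_comp_exch(struct toy_id *id, enum toy_state comp,
			 enum toy_state exch)
{
	assert(id != NULL);
	if (id->state != comp)
		return 0;
	id->state = exch;
	return 1;
}

/* Stands in for rdma_resolve_addr() followed by rdma_resolve_route(). */
static int toy_resolve(struct toy_id *id)
{
	return toy_comp_exch(id, TOY_IDLE, TOY_ROUTE_RESOLVED) ? 0 : -EINVAL;
}

/* Connect is only accepted from ROUTE_RESOLVED, as in the quoted code. */
static int toy_connect(struct toy_id *id)
{
	return toy_comp_exch(id, TOY_ROUTE_RESOLVED, TOY_CONNECT) ? 0 : -EINVAL;
}

/* Assumed behavior: a disconnect leaves the id in a state of its own. */
static void toy_disconnect(struct toy_id *id)
{
	assert(id != NULL);
	id->state = TOY_DISCONNECTED;
}

/* Returns 0 if reconnecting failed on the old id and worked on a new one. */
static int toy_reconnect_demo(void)
{
	struct toy_id id = { TOY_IDLE };
	struct toy_id fresh = { TOY_IDLE };

	if (toy_resolve(&id) || toy_connect(&id))
		return -1;
	toy_disconnect(&id);
	if (toy_connect(&id) != -EINVAL)	/* old id must be rejected */
		return -1;
	if (toy_resolve(&fresh) || toy_connect(&fresh))
		return -1;
	return 0;
}
```

In this model the second toy_connect() on the old id is rejected because the
id never returns to ROUTE_RESOLVED, which is why the consumer ends up creating
a fresh id.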

> From my perspective, it seems reasonable to allow the consumer to 
> re-connect without calling rdma_resolve_addr() and 
> rdma_resolve_route().

I guess I really don't understand why the user disconnected in the first place.

> If those functions are always prerequisites for a connect call, the 
> consumer will need to cleanup and reallocate all of its resources 
> every time the connection is lost.
> 
> In some protocols, a disconnect does not represent a catastrophic 
> failure and therefore does not warrant the complete cleanup. For 
> example, NFS clients and servers close idle connections. When the 
> client has additional operations to send, it reconnects to the server.

I'm assuming that the user closed the idle connection to save system resources. 
If so, then why wouldn't you want to destroy the rdma_cm_id?

> @@ -1443,7 +1443,8 @@ int rdma_connect(struct rdma_cm_id *id, 
>  	int ret;
>  
>  	id_priv = container_of(id, struct rdma_id_private, id);
> -	if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT))
> +	if (!cma_comp(id_priv, CMA_CONNECT) &&
> +	    !cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT))
>  		return -EINVAL;

I think a better fix would be to transition to the correct state after a 
disconnect occurs, i.e. back to CMA_ROUTE_RESOLVED.  This gets back to the 
question of what the correct state flow is: should a user be able to walk 
backwards through the state diagram, or should the state return to IDLE?
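
As a rough way to compare the two options, here is a small userspace sketch.
It is not CMA code: the sk_* names, the intermediate SK_ADDR_RESOLVED state,
and the idea of counting consumer calls are all assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical linear state flow; only the ROUTE_RESOLVED, CONNECT and
 * IDLE names are taken from the discussion above. */
enum sk_state { SK_IDLE, SK_ADDR_RESOLVED, SK_ROUTE_RESOLVED, SK_CONNECT };

/* The two candidate answers to "where does the id go after disconnect?" */
enum sk_policy { SK_BACK_TO_ROUTE_RESOLVED, SK_BACK_TO_IDLE };

/* State the id lands in after a disconnect under the given policy. */
static enum sk_state sk_after_disconnect(enum sk_state cur, enum sk_policy p)
{
	if (cur != SK_CONNECT)
		return cur;	/* nothing to disconnect */
	return p == SK_BACK_TO_ROUTE_RESOLVED ? SK_ROUTE_RESOLVED : SK_IDLE;
}

/* Calls the consumer must make to reach SK_CONNECT again from 's':
 * resolve address, resolve route, connect (one per remaining step). */
static int sk_calls_to_connect(enum sk_state s)
{
	switch (s) {
	case SK_IDLE:
		return 3;
	case SK_ADDR_RESOLVED:
		return 2;
	case SK_ROUTE_RESOLVED:
		return 1;
	case SK_CONNECT:
		return 0;
	}
	assert(!"unknown state");
	return -1;
}
```

Walking back to ROUTE_RESOLVED leaves the consumer one call away from a new
connection, while returning to IDLE means redoing both resolve steps first.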

- Sean