[ofa-general] RE: [RFC v2 PATCH 3/5] rdma/cma: add high availability mode attribute to IDs

Or Gerlitz ogerlitz at voltaire.com
Tue May 20 05:10:20 PDT 2008


Sean Hefty wrote:
> and in concept I prefer to:
> * Always report the event and let ULPs ignore it
> * Let someone come up with a fantastically simple way of reporting new events
I am fine with the approach of always reporting the event and letting ULPs
ignore it. Looking at how the ABI versions are exchanged between the
rdma_ucm module and librdmacm, I don't see many alternatives other than
bumping the ABI version to five. If librdmacm can somehow record which ABI
version the app was built against, we could bump the ABI version to
five and require the user to upgrade his librdmacm to be able to run,
but have --librdmacm-- hide this event from the user in case "his
version" of the ABI is older.

> I spent most of the morning looking at this, and until I know what the
> trade-offs really are in the implementation, I can't say that I have a strong
> preference for how to deal with any of this.  My main concerns are:
>
> * All callbacks from the rdma_cm are serialized
> * We minimize the overhead of reporting events
> * We don't lose events
> * If the user returns a non-zero value from a callback, the rdma_cm_id is
>   destroyed, and no further callbacks are invoked.
Thanks for looking into that. Yes, I think it's correct and fair to
require that all these properties still hold after the new event
is merged.
> The existing rdma_cm callbacks are naturally serialized with each other.
> (Callback for connect after resolve route after resolve address...)  This allows
> using the stack for event structures, but the cost is complex synchronization
> with device removal.  Supporting additional events while meeting the concerns
> listed above will be equally challenging.  So if we can simplify device removal
> handling, then supporting similar types of events should be easier as well.
>
> If we can guarantee that this works, one option is to acquire a mutex before
> invoking a callback on an rdma_cm_id.  I hesitate to hold any locks while in a
> callback, since it restricts what the user can do, but if the mutex is only used
> to synchronize calling the user back, it may work, since the rdma_cm never
> invokes a callback from a downcall.  This should simplify the device removal
> handling, eliminating wait_remove and dev_remove from the rdma_cm_id.
I would like to look into this possibility, which, as you stated later in
your post, is simpler than the alternatives and would also make the
current device removal code less complex. So can/should that mutex be
the existing one defined in cma.c, or a new one?
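A minimal userspace sketch of the per-ID mutex idea, using pthreads as a stand-in for kernel mutexes. Everything here (struct hyp_cm_id, hyp_deliver_event, the counting handler) is hypothetical and is not cma.c code. Every event source funnels through one function that holds the ID's mutex only while calling the user back, so callbacks are serialized, and a non-zero return marks the ID destroyed so no further callbacks run. As in the proposal, this only works because a callback never re-enters delivery on the same ID (the rdma_cm never invokes a callback from a downcall); the sketch's default, non-recursive mutex would self-deadlock otherwise.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an rdma_cm_id; not the kernel structure. */
struct hyp_cm_id;
typedef int (*hyp_handler_t)(struct hyp_cm_id *id, int event);

struct hyp_cm_id {
    pthread_mutex_t cb_mutex; /* held only while calling the user back */
    bool destroyed;           /* set once the user returned non-zero */
    hyp_handler_t handler;
    void *context;
};

static void hyp_id_init(struct hyp_cm_id *id, hyp_handler_t handler,
                        void *ctx)
{
    pthread_mutex_init(&id->cb_mutex, NULL);
    id->destroyed = false;
    id->handler = handler;
    id->context = ctx;
}

/* Deliver one event. Normal CM events, device removal and any new event
 * would all go through here, so callbacks never run concurrently.
 * Returns true if the user callback was actually invoked. */
static bool hyp_deliver_event(struct hyp_cm_id *id, int event)
{
    bool invoked = false;

    pthread_mutex_lock(&id->cb_mutex);
    if (!id->destroyed) {
        invoked = true;
        if (id->handler(id, event) != 0)
            id->destroyed = true; /* caller would now tear the ID down */
    }
    pthread_mutex_unlock(&id->cb_mutex);
    return invoked;
}

/* Example handler: counts events, records the deepest nesting it saw
 * (anything above 1 would mean two callbacks overlapped) and asks for
 * destruction, via a non-zero return, once 'limit' events arrived. */
struct hyp_counter {
    int in_callback;
    int max_seen;
    int delivered;
    int limit; /* 0 means never request destruction */
};

static int hyp_counting_handler(struct hyp_cm_id *id, int event)
{
    struct hyp_counter *c = id->context;

    (void)event;
    c->in_callback++;
    if (c->in_callback > c->max_seen)
        c->max_seen = c->in_callback;
    c->delivered++;
    c->in_callback--;
    return c->limit && c->delivered >= c->limit;
}

/* Thread body modelling one event source hammering the same ID. */
static void *hyp_event_source(void *arg)
{
    struct hyp_cm_id *id = arg;

    for (int i = 0; i < 10000; i++)
        hyp_deliver_event(id, i);
    return NULL;
}
```

Running two hyp_event_source() threads on one ID models ordinary CM events racing with, say, device removal: every event is delivered, and max_seen stays at 1. Because the mutex is per ID, one slow callback does not block callbacks on unrelated IDs, which is one argument for a new per-ID mutex rather than a single global lock.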

Or