[openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space

Sean Hefty mshefty at ichips.intel.com
Wed May 31 11:14:50 PDT 2006


Eitan Zahavi wrote:
> Leonid just sent an example for a race that might happen if the SM is to
> be the maintainer of the data.

The race Leonid mentioned is a client sending a request when the SM is down.
That request will fail, so there's no data for the SM to maintain for that node.
That's a retry condition that the client must deal with.
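
As a rough sketch of what handling that retry condition on the client side could look like (not the actual kernel or SA client code; `SMUnavailable`, `register_with_retry`, and `send_request` are hypothetical names made up for illustration, and the backoff policy is an assumption):

```python
import time


class SMUnavailable(Exception):
    """Hypothetical error: the request failed because the SM did not answer."""


def register_with_retry(send_request, max_attempts=5, base_delay=0.01,
                        sleep=time.sleep):
    """Send a registration request, retrying with exponential backoff.

    send_request is a hypothetical callable that raises SMUnavailable
    while the SM is down and returns the SM's response once it is up.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()
        except SMUnavailable:
            if attempt == max_attempts:
                raise  # give up; the caller decides what to do next
            sleep(delay)
            delay *= 2  # back off so a recovering SM is not flooded
```

A request that fails while the SM is down leaves nothing behind on the SM side; the client simply sends it again until it succeeds or the attempt limit is reached.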

> [EZ] The SM is a single entity that has to respond to all requests from
> the entire cluster (even redirection requests). When you require that
> SM to also provide transaction-safe storage or, even worse, consistency
> with multiple standby SMs, you worsen the problem. The clients on their
> side only need to maintain their own registrations.

I don't believe that there's any requirement that the SM be a single system.
But I do believe that the SM should be able to recover from all SM problems
without interrupting any existing communication that is occurring on the fabric.
SM failover or failure/restart should be as transparent to the clients (i.e., the
non-SM nodes in the fabric) as possible.  (Btw, I also believe that the SM
should run on top of a real DBMS and support SQL-style queries...)

You don't want to push this problem to every application running in the fabric, 
so why even push it to every node in the fabric?

- Sean


