[openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space

Eitan Zahavi eitan at mellanox.co.il
Wed May 31 10:45:48 PDT 2006


Hi Sean,

Leonid just sent an example for a race that might happen if the SM is to
be the maintainer of the data.

> Eitan Zahavi wrote:
> > Well, this is a very old topic, discussed at length years ago. All credit
> > goes to Ashok Raj, whom you know better than I do. The argument against the
> > idea of the SM being the keeper of these registrations goes as follows:
> 
> Yes - and I still don't understand why this isn't a personal problem of the SM.
[EZ] Anyway, I hoped for some modesty. 
> 
> > Without the use of client re-registration, the only way to keep this data
> > consistent between the master SM and its standby slaves (or across SM
> > restarts) is by using a transaction-safe model.
> 
> There are probably other solutions to this problem.  Couldn't the data also be
> written to a shared file system?
[EZ] Yes, a journaling file system that never drops transactions but
still does not impose extra latency is what you are looking for, and I
do not think you will find such an ideal solution.
> 
> > Transaction-safe models are well known and require distributed handshakes
> > or journaling systems (in our case, distributed ones). Anyway, what this
> > means is that every transaction (like the creation of a new Service Record
> > or registering to a multicast group) will first have to be committed and
> > approved by all standby SMs before it is responded to.
> >
> > This strict transaction-safe model does not fit very well with the
> > requirement for fabric scalability, which is hard to achieve even
> > without that complication.
> 
> How is this less scalable than every client in the fabric maintaining this data,
> detecting failures (if this can be done), and reissuing the requests?
[EZ] The SM is a single entity that has to respond to all requests from
the entire cluster (even redirection requests). When you require that
SM to also provide transaction-safe storage, or, even worse than that,
consistency with multiple standby SMs, you worsen the problem. The
clients, on their side, only need to maintain their own registrations.
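
To make the cost concrete, here is a minimal Python sketch of the synchronous replication described above. It assumes a purely hypothetical MasterSM/StandbySM model (these names do not come from OpenSM or the kernel SA code), and it is a simplified "all standbys must ack" scheme, not a full two-phase commit: the master answers a client only after every standby has acknowledged the record, so each new registration costs one extra round trip per standby before the reply goes out.

```python
from dataclasses import dataclass, field

# Hypothetical model of synchronous SM replication; not OpenSM code.


@dataclass
class StandbySM:
    name: str
    reachable: bool = True  # False simulates a lost link or a dead standby
    records: set = field(default_factory=set)

    def replicate(self, record: str) -> bool:
        """Store the record and acknowledge; a real standby would journal it."""
        if not self.reachable:
            return False
        self.records.add(record)
        return True


@dataclass
class MasterSM:
    standbys: list
    records: set = field(default_factory=set)
    round_trips: int = 0  # SM-to-standby messages spent on client requests

    def register(self, record: str) -> bool:
        """Answer the client only after every standby has acknowledged."""
        if record in self.records:
            return True  # already committed everywhere; nothing to replicate
        acked = []
        for sm in self.standbys:
            self.round_trips += 1
            if not sm.replicate(record):
                # Undo the partial replication so all replicas stay consistent.
                for done in acked:
                    done.records.discard(record)
                return False
            acked.append(sm)
        self.records.add(record)
        return True


a, b = StandbySM("a"), StandbySM("b")
master = MasterSM([a, b])
master.register("ServiceRecord:1")  # 2 standby round trips before replying
```

The record names are illustrative placeholders. Note that a single unreachable standby makes every new registration fail until it is removed from the set, which is exactly the kind of coupling the client-side approach avoids.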

I proposed in the past that an SA client do this job in a transparent
manner for all clients on a machine. (So I thought we had already gone
through this ritual...)
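
A minimal Python sketch of such a per-host SA client, assuming a hypothetical LocalSAClient class and a send_to_sa callback that stands in for the real SA MAD path (neither is the kernel ib_sa API). It caches every local consumer's registrations, sends each distinct record to the SA only once, and replays the whole set when the SM signals Client Reregister, so the SM never has to remember anything across a failover.

```python
from typing import Callable

# Hypothetical per-host SA client; not the Linux ib_sa API.


class LocalSAClient:
    def __init__(self, send_to_sa: Callable[[str], None]):
        self._send = send_to_sa  # stands in for sending a MAD to the SA
        self._regs: dict[str, set[str]] = {}  # local consumer -> its records

    def register(self, consumer: str, record: str) -> None:
        """Cache a consumer's registration; send it only if new to this host."""
        known = any(record in recs for recs in self._regs.values())
        self._regs.setdefault(consumer, set()).add(record)
        if not known:
            self._send(record)

    def on_client_reregister(self) -> int:
        """Replay every distinct cached record after the SM requests it."""
        pending = set().union(*self._regs.values())
        for record in sorted(pending):  # sorted only for a deterministic order
            self._send(record)
        return len(pending)


sent = []
client = LocalSAClient(sent.append)
client.register("ipoib", "mcast:ff12:401b::1")
client.register("sdp", "service:sdp")
client.register("ipoib-child", "mcast:ff12:401b::1")  # shared, sent once
sent.clear()
client.on_client_reregister()  # resends both records after an SM failover
```

Consumer and record names here are placeholders. The design point is that the state lives next to its owners: each host only stores its own registrations, and the SM's cost on failover is just processing the replayed requests.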
> 
> Having standby SMs rely on fabric clients in order to build their database just
> seems like the wrong approach.  If they actually had an up-to-date database,
> then those SMs could also respond to queries.
[EZ] Well, to me and several other people this seems like the right approach.
> 
> - Sean


