[ofa-general] opensm: Unsupported attribute = 0xFF02

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Wed Oct 31 21:05:29 PDT 2007


On Thu, Nov 01, 2007 at 05:56:48AM +0200, Sasha Khapyorsky wrote:

> > At a minimum how hand off is supposed to work is very vaguely
> > specified in the IBA.
> 
> It is at least basically described in the IBA - with exchanging SMInfo.

Well, sort of...

Let's take the hardest example I know of, connecting two running
subnets together. There are several phases:
1) Discovery - Two fully operational master SMs are running and
   maintaining their non-overlapping subset of nodes.
2) Election - Each SM independently decides who should become
   the master, but each SM continues to operate fully within its
   partition.
3) Quiescence - The new master waits for the old masters
   to stop operating on their partitions (this is what HANDOVER
   could signal)
4) Master assertion - The new master assumes control of the nodes
5) Standby - The old master drops to standby (this is what HANDOVER
   ACK could signal)

The spec isn't really clear about how the two HANDOVER SMInfos map to
the above process. My personal view on this was that HANDOVER is sent
old master -> new master once the old SM is quiet, and HANDOVER ACK is
what signals the old SM to go to standby.

I.e. the new master sends HANDOVER ACK once all partitions it is
assuming control of have sent HANDOVER and after it has completely
programmed the nodes.
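
To make that ordering concrete, here is a rough sketch of the view
described above, written as a toy model in Python. It is not OpenSM
code and not something the spec mandates; the class, method, and state
names (NewMaster, receive_handover, and so on) are all made up for
illustration, and real SMInfo MAD handling is left out entirely.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class OldMasterState(Enum):
    MASTER = auto()    # still operating fully on its own partition (phases 1-2)
    QUIESCED = auto()  # stopped operating and sent HANDOVER (phase 3)
    STANDBY = auto()   # got HANDOVER ACK and dropped to standby (phase 5)


@dataclass
class NewMaster:
    """Hypothetical model of the elected master SM (illustration only)."""
    old_masters: dict = field(default_factory=dict)  # SM name -> OldMasterState
    nodes_programmed: bool = False

    def add_old_master(self, name):
        # Phases 1-2: a previously independent master has been discovered.
        self.old_masters[name] = OldMasterState.MASTER

    def receive_handover(self, name):
        # Phase 3: the old master signals that it is quiet.
        self.old_masters[name] = OldMasterState.QUIESCED

    def program_nodes(self):
        # Phase 4: assert control only once every old master is quiet.
        states = self.old_masters.values()
        if all(s is OldMasterState.QUIESCED for s in states):
            self.nodes_programmed = True
        return self.nodes_programmed

    def send_handover_acks(self):
        # Phase 5: ACK only after all HANDOVERs arrived and the nodes are
        # fully programmed; each ACK moves an old master to standby.
        if not self.nodes_programmed:
            return []
        acked = []
        for name in list(self.old_masters):
            if self.old_masters[name] is OldMasterState.QUIESCED:
                self.old_masters[name] = OldMasterState.STANDBY
                acked.append(name)
        return acked
```

In this model, if only one of two old masters has sent HANDOVER, then
program_nodes() returns False and no ACKs go out, so neither old master
drops to standby early.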

A similar but ultimately simpler process happens when promoting a
standby SM to master.

IIRC there are other valid views on how this process goes, and I have
no idea what opensm does, or if it would be compatible with this view :)

> > Besides, even if hand off wasn't a problem the two SMs would have to
> > have very similar ideas on routing, multicast, QOS, services, etc
> 
> In worst case the routing tables and QoS setups could be reconfigured
> from scratch (just as if it could be first SM run), and all SA related
> things could be rerequested with ClientReregistration bit.

Well, I think you have to ask what the point of this is - what you are
describing is not high availability, you are just talking about a
disorderly restart of the entire fabric. So why would you ever run a
master/standby SM configuration if not for HA? You can't get HA by
mixing vendors today.

Jason


