[openib-general] [RFC] Notice/InformInfo event reporting

Tue Oct 17 06:43:54 PDT 2006

> From: Sean Hefty [mailto:mshefty at ichips.intel.com]
> Sent: Monday, October 16, 2006 6:57 PM
> To: Rimmer, Todd
> Cc: Matt Leininger; openib
> Subject: Re: [openib-general] [RFC] Notice/InformInfo event reporting
> 
> Rimmer, Todd wrote:
> > In a functioning fabric, events will be rare.  However, it's when you
> > first boot the fabric, reboot the SM or take other similar "start up"
> > actions that things get real busy.
> 
> Hmm... I need to think more about how to handle the start up scenario.
> 
> > In general I have found that only a few clients will use events, such as
> > IPoIB to manage multicast subscriptions (join as send only for new
> > groups) and SA caches/replicas to keep their cache/replica synchronized.
> 
> Can you give more details about how ipoib would use the event service?
Technically, to meet the link layer semantics which TCP/IP expects,
IPoIB should join every IPoIB multicast group which exists as a
sender-only member.

This is because TCP/IP in Linux expects to be able to send to any
multicast group without first informing the link layer.  It expects to
inform the link layer only when it wants to receive from a given
multicast group.  This is a side effect of how most Ethernet NICs work
(multicast filtering is only implemented on the receive side of the
NIC) and how Ethernet LANs work (a single subnet will forward multicast
sends to all nodes via the spanning tree).

Hence IPoIB should subscribe to the multicast GID created notice and
use it to establish its sender-only membership.  It should also
subscribe to the multicast GID deleted notice and use it to drop its
sender-only membership.  (Note that per IBTA 1.2 15.2.5.17.1, SenderOnly
members do not count toward the group create/delete reference counts.
The group can therefore be deleted while it still has sender-only
members, hence the interest in GID out of service.)
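
A minimal sketch of that notice handling, under stated assumptions: trap
numbers 66 and 67 are assumed here for the multicast group created and
deleted notices, the MGID test assumes the IPoIB signature bytes 0x401b
(IPv4) and 0x601b (IPv6), and every name in it (ipoib_handle_notice,
send_only, MAX_GROUPS, and so on) is hypothetical rather than taken from
any existing stack.  The actual SendOnly join and leave are left as
comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed trap numbers for multicast group created/deleted notices. */
enum { TRAP_MGID_CREATED = 66, TRAP_MGID_DELETED = 67 };

#define MAX_GROUPS 8  /* hypothetical fixed table size for this sketch */

struct mc_entry {
    uint8_t mgid[16];
    bool in_use;
};

static struct mc_entry send_only[MAX_GROUPS];

/* Assumed IPoIB MGID layout: 0xff prefix, signature 0x401b or 0x601b. */
static bool is_ipoib_mgid(const uint8_t *mgid)
{
    return mgid[0] == 0xff && mgid[3] == 0x1b &&
           (mgid[2] == 0x40 || mgid[2] == 0x60);
}

static struct mc_entry *find_group(const uint8_t *mgid)
{
    for (size_t i = 0; i < MAX_GROUPS; i++)
        if (send_only[i].in_use &&
            memcmp(send_only[i].mgid, mgid, 16) == 0)
            return &send_only[i];
    return NULL;
}

static struct mc_entry *find_free(void)
{
    for (size_t i = 0; i < MAX_GROUPS; i++)
        if (!send_only[i].in_use)
            return &send_only[i];
    return NULL;
}

/* Called for each notice delivered to the (hypothetical) IPoIB client. */
void ipoib_handle_notice(unsigned trap, const uint8_t *mgid)
{
    assert(mgid != NULL);
    if (!is_ipoib_mgid(mgid))
        return;                 /* client-side filter: not an IPoIB group */

    struct mc_entry *e = find_group(mgid);
    if (trap == TRAP_MGID_CREATED && !e) {
        e = find_free();
        if (e) {
            memcpy(e->mgid, mgid, 16);
            e->in_use = true;   /* real code would issue the SendOnly join here */
        }
    } else if (trap == TRAP_MGID_DELETED && e) {
        e->in_use = false;      /* group is gone; forget the membership */
    }
}

/* Number of groups currently tracked as sender-only. */
size_t ipoib_send_only_count(void)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_GROUPS; i++)
        n += send_only[i].in_use ? 1 : 0;
    return n;
}
```

Duplicate created notices are ignored, and a deleted notice for a group
that was never tracked is a no-op, so the handler can tolerate repeated
or out-of-order notices.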

> 
> SA caches seem like they would register for traps... 64 (GID in), 65
> (GID out), and 128 (switch port change)?  Or is it reasonable to limit
> it to trap 128?  Is trap 128 likely to be followed by traps 64 and 65?
[Todd Rimmer] Our SA replica only needed to use 64 and 65.  We found
that switch port change did not provide enough information.  GID in/GID
out tell you the GID which has changed.  This allows the replica to
begin adjusting its replica and making queries about that specific GID.

> 
> > In the silverstorm stack we created an API for a client to subscribe to
> > a notice.  It allowed the client to specify: trap number, local HCA port
> > subscription was applicable to (in case multi-port HCAs on different
> > fabrics) and information for a callback to the client (client context
> > void*, function).  The callback provided the client context void*, the
> > actual NOTICE from the SA and which HCA port it arrived on.
> 
> This sounds like a simple enough interface.  So, you tracked references
> on only the trap numbers then?
Yes.  It reference counted by trap/notice number, registered with the
SA only on the 0->1 reference count transition, and deregistered with
the SA on the 1->0 transition.  We left any LID filtering up to the
client.  In our uses to date the SA replica was interested in all LIDs.
IPoIB does its own filtering based on the MC GID, so that it ignores
non-IPoIB GIDs.
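
The reference counting can be sketched roughly as follows.  This is an
illustration of the 0->1 / 1->0 idea only, not the actual stack code:
sa_register() and sa_deregister() are hypothetical stand-ins for sending
the InformInfo subscribe and unsubscribe requests to the SA, the
fixed-size table is an assumption of the sketch, and per-port tracking,
locking and callback delivery are all omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRAPS 4  /* hypothetical: small fixed table for this sketch */

/* Hypothetical hooks standing in for the InformInfo requests to the SA. */
static int sa_register_calls, sa_deregister_calls;
static void sa_register(unsigned trap)   { (void)trap; sa_register_calls++; }
static void sa_deregister(unsigned trap) { (void)trap; sa_deregister_calls++; }

struct trap_ref {
    unsigned trap;  /* trap/notice number, e.g. 64 or 65 */
    unsigned refs;  /* number of local clients subscribed */
};

static struct trap_ref trap_table[MAX_TRAPS];

/* Return the live slot for this trap, else a free slot, else NULL. */
static struct trap_ref *trap_lookup(unsigned trap)
{
    struct trap_ref *free_slot = NULL;

    for (size_t i = 0; i < MAX_TRAPS; i++) {
        if (trap_table[i].refs && trap_table[i].trap == trap)
            return &trap_table[i];
        if (!trap_table[i].refs && !free_slot)
            free_slot = &trap_table[i];
    }
    return free_slot;
}

/* Returns 0 on success, -1 if the table is full. */
int notice_subscribe(unsigned trap)
{
    struct trap_ref *t = trap_lookup(trap);

    if (!t)
        return -1;
    if (t->refs++ == 0) {       /* 0->1: first local client for this trap */
        t->trap = trap;
        sa_register(trap);
    }
    return 0;
}

void notice_unsubscribe(unsigned trap)
{
    struct trap_ref *t = trap_lookup(trap);

    /* Unsubscribing from a trap nobody holds is a caller bug. */
    assert(t && t->refs > 0 && t->trap == trap);
    if (--t->refs == 0)         /* 1->0: last local client went away */
        sa_deregister(trap);
}
```

With this shape, two clients subscribing to trap 64 cause a single
registration with the SA, and the deregistration is sent only when the
second of them unsubscribes.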
> 
> > The API in the stack dealt with all the issues of remaining subscribed
> > (SA reregistration, port disconnected/reconnected, etc) so the client
> > merely subscribed, got notice callbacks and later unsubscribed.  In this
> > style API any LID based filtering would be done in the client itself.
> 
> This makes sense.
> 
Glad to be of assistance.

Todd Rimmer