[openib-general] First Multicast Leave disconnects all other clients

Eitan Zahavi eitan at mellanox.co.il
Wed Nov 30 07:03:03 PST 2005


Sorry for the late response.

The bottom line:
We are missing 3 agents in the OpenIB stack:
InformInfo - handling registrations and Report dispatching
ServiceRecord - tracking registrations
Multicast Join/Leave - tracking registrations to multicast groups and
ref-counting them

All these agents should be able to clean up dead client registrations and
also provide re-registration in case of an SM ClientReregistration event.

Please see below
> >
> > It seems the IBTA intent was that the IB driver will be responsible
> > for maintaining the list of clients registered to each group.
> 
> Yes, the end node is responsible for tracking the registrations within
> the node and fabricating responses when the node does not want to
> leave.
> Is delete a different case though?
[EZ] No, it is not. Deleting a multicast group is really just the last leave.
> 
> > But the IB core does not track what clients registered (through SA
> > requests) to a particular multicast group.
> > The first client to leave the group causes the rest (of the clients)
> > to be disconnected.
> 
> This is an implementation issue IMO and applies to other subscriptions
> too (not just limited to multicast).
[EZ] I agree it is an implementation issue. I hope it will get
implemented in OpenIB.
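
To illustrate the behavior being asked for, here is a minimal sketch in Python. It is not OpenIB code: the class name, the method names and the use of plain strings for MGIDs and client ids are all assumptions made for the example. The point is only that the node sends a real join for the first local member and a real leave for the last one, and answers everything in between locally.

```python
from collections import defaultdict


class MulticastTracker:
    """Hypothetical per-node bookkeeping of local multicast members."""

    def __init__(self):
        # mgid -> set of local client ids currently joined
        self._members = defaultdict(set)

    def join(self, mgid, client):
        """Return True if a real join must go to the SA (first local member)."""
        first = not self._members[mgid]
        self._members[mgid].add(client)
        return first

    def leave(self, mgid, client):
        """Return True if a real leave must go to the SA (last local member)."""
        members = self._members.get(mgid)
        if not members or client not in members:
            return False  # unknown registration: nothing to undo
        members.remove(client)
        if members:
            return False  # other clients are still joined: answer locally
        del self._members[mgid]
        return True
```

With two clients A and B joined to the same group, A leaving returns False (no SA leave is sent) and only B leaving afterwards returns True. A set is used instead of a bare counter so that a client joining twice is counted once; whether that is the right policy is a design choice.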
> 
> > My proposal is to provide an API for such registrations at both user
> > and kernel and track the requesting processes.
> > Cleanup is also required both by process and kernel module
> > granularity.
> 
> Is the API the SA client request itself for this ? Shouldn't the
> tracking be done there (within sa_query.c) ?
[EZ] It will be hard to sniff the MADs (especially at user level) for all
the registration flows.
So I propose we have
ib_join/ib_leave/ib_reg_svc/ib_unreg_svc/ib_reg_inform/ib_unreg_inform,
both in user land and in the kernel.
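
As a rough sketch of how such calls could share one tracker and support cleanup per process or per kernel module, consider the Python below. Only the names ib_join and ib_leave come from the proposal above; their signatures, the RegistrationTable class, the owner ids and the "mcast"/"svc" kind strings are hypothetical.

```python
class RegistrationTable:
    """Hypothetical table shared by all registration kinds."""

    def __init__(self):
        self._owners = {}  # (kind, key) -> set of owner ids

    def register(self, owner, kind, key):
        """Return True if this is the first owner, i.e. the SA must be told."""
        owners = self._owners.setdefault((kind, key), set())
        first = not owners
        owners.add(owner)
        return first

    def unregister(self, owner, kind, key):
        """Return True if this was the last owner, i.e. the SA must be told."""
        owners = self._owners.get((kind, key), set())
        if owner not in owners:
            return False
        owners.discard(owner)
        if owners:
            return False
        del self._owners[(kind, key)]
        return True

    def cleanup(self, owner):
        """Drop everything a dead process or unloaded module still holds.

        Returns the (kind, key) pairs that now have to be withdrawn from the SA.
        """
        held = [rk for rk, owners in self._owners.items() if owner in owners]
        return [rk for rk in held if self.unregister(owner, *rk)]


table = RegistrationTable()


def ib_join(owner, mgid):  # assumed signature
    return table.register(owner, "mcast", mgid)


def ib_leave(owner, mgid):  # assumed signature
    return table.unregister(owner, "mcast", mgid)
```

The service and inform calls would be the same thin wrappers with a different kind string. Calling cleanup("pid:100") when that process dies only reports the registrations for which it was the last owner, so other clients of the same group are not disconnected.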
> 
> > BTW: The same API could also handle "Client Reregistration" for
> > multicast groups,
> 
> Client reregistration is for all subscriptions (including
> ServiceRecords and events as well).
[EZ] Yes, exactly. I believe a similar problem exists for all
registrations.
> 
> > such that we could avoid the need to have that code duplicated by
> > every client.
> 
> I'm missing how client reregistration would help here. Can you
> elaborate?
[EZ] It is related to the reference tracking:
If a kernel module tracks all registrations in order to refcount them and
perform cleanup, it could with similar effort also re-send the
registrations in the event of an SM change ...
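
A small sketch of why the extra effort is low, again in Python with made-up names: once the tracker already holds a map from each registration to its local owners, handling the re-registration event is a single pass that replays each live registration once, regardless of how many clients share it. The function name, the shape of the map and the send_to_sa callback are assumptions.

```python
def reregister_all(registrations, send_to_sa):
    """Replay every registration that still has at least one local owner.

    registrations: dict mapping (kind, key) -> set of local owner ids
    send_to_sa:    hypothetical callback that issues the actual SA request
    Returns the list of (kind, key) pairs that were replayed.
    """
    replayed = []
    for (kind, key), owners in registrations.items():
        if owners:  # skip entries nobody holds any more
            send_to_sa(kind, key)
            replayed.append((kind, key))
    return replayed
```

The clients themselves never see the event in this scheme, which is what removes the duplicated code from every client.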

> 
> > But this refers to yet another API that is missing: Report
> > dispatching which deserves its own mail...
> 
> I'm missing the connection between reregistration and report
> dispatching.
[EZ] Sorry for not being verbose. The need for an Event dispatcher is
based on the fact that only one client should respond to a Report with
a ReportResp. Reports are "unsolicited" MADs coming into the device. In
umad, the implementation prevents multiple clients from registering to
receive the same "unsolicited" MAD: only one class agent can be there
handling "unsolicited" messages. This is fine, but it means that when
two clients want to be notified about events, they should register with
that agent, and the agent should dispatch the message to all registered
clients while sending only one response back.
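
The dispatching agent described here can be sketched as follows. This is an illustration in Python, not the umad or kernel MAD API: ReportDispatcher, its methods and the send_response callback are hypothetical names, and a report is represented by an arbitrary object.

```python
class ReportDispatcher:
    """Hypothetical single agent that owns the unsolicited Report traffic."""

    def __init__(self, send_response):
        # send_response: assumed callback that sends the one reply per Report
        self._send_response = send_response
        self._clients = {}  # client id -> callback

    def register(self, client_id, callback):
        self._clients[client_id] = callback

    def unregister(self, client_id):
        self._clients.pop(client_id, None)

    def on_report(self, report):
        """Fan the report out to every client, then reply exactly once."""
        # Iterate over a copy so a callback may unregister itself safely.
        for callback in list(self._clients.values()):
            callback(report)
        self._send_response(report)
        return len(self._clients)
```

Clients only register a callback and never reply themselves, so the sender of the Report sees a single response no matter how many local clients are listening.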




