[openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port

Sean Hefty sean.hefty at intel.com
Tue Oct 10 21:59:17 PDT 2006


>No, I was only speaking about IPoIB part of the patch.
>I didn't look at atomic usage inside the new module yet.

I'm not clear on what you code you're referring to.  No new atomics or locks
were added to ipoib.  I just tried to rely solely on the flags to determine
state.

>You might be right. But I wander whether we'll regret it later that
>we switched to the slower generic thing when we already had a stable,
>streamlined version.

To be direct, I'm not sure that I'd call the ipoib multicast either streamlined
or stable.  A fix against it just recently went by, and given the complexity of
its use of bit flags, thread, mutex, locks, and pointers to track state, the
code is fairly difficult to follow, so I'm not surprised that it's taken a while
to stabilize.

That doesn't mean that these patches are bug free, but filtering patches simply
on the basis of whether or not they change ipoib doesn't seem like the right
approach to take.  By using the ib_multicast, we eliminate about 50 lines of
code from ipoib.

>Speaking of which - there seems to be some liner scans of outstanding
>requests which could be a scalability problem if there's a
>large number of these, isn't that right?

I'm really not expecting a large number of outstanding requests, but can you
reference the lines in the patch where there's a linear scan?

>I don't follow. ib_free_multicast seems to be called only when we
>leave all mcast groups. So it seems callbacks can now run
>after we do stop_thread which I think will lead to crashes.

The multicast groups are freed during dev_flush(), which does occur after
stop_thread has done its cleanup.  The callbacks use state checks (against the
flags) to avoid queuing work when it shouldn't.  Is there a more specific
problem that you see?

>But I still don't understand - if everyone must use the refcounted API,
>why do we need a separate module and an exported API for something
>that is just an implementation detail?

The existing API does not distinguish which leave request matches which join
request.  This matters to the user because the context for joins will be
different.  The most sensible way to handle this is to give the user a handle
back for their join request, which is what this API does.

The ib_sa is basically wrappers around send_mad with an address handle to the
SA.  It performs a single purpose, and I think it makes more sense to keep it to
its original intent, since somewhere, something actually has to send and receive
MADs from the SA.  We could just as easily argue that the multicast handling
should be buried under the ib_mad interface, since that's actually exposed to
userspace, but I don't think that that's the right approach either.  When a join
request completes, the user is not guaranteed to get a MAD, unless one is
artificially generated.

>Multiple distinct mcmember_rec queries could get us the same mcast group.
>Say MTU > 256 and MTU > 512 look different but actually will get
>same group in practice.
>
>Say 2 clients ask for these 2 queries.
>What will be in the ib_sa_mcmember_rec in this case?

The module currently does not handle queries in as complex a way as the SA.  The
current matching is limited to equality comparisons against the local mcmember
record.  (See cmp_rec() in multicast.c.)  If there's a need to expand the
comparisons, that can be done.

- Sean




More information about the general mailing list