[openib-general] ib_mad.c comments

Sat Sep 11 11:12:17 PDT 2004

On Sat, 2004-09-11 at 11:54, Roland Dreier wrote:
>     Roland> static u32 ib_mad_client_id = 0;
> 
>     Hal> This will be locked. Rather than a linked list, this will
>     Hal> become an indexed table as this will be more efficient when
>     Hal> looking up the client in the receive path.
> 
> Hmm... seems like we wouldn't want a static limit on the table size.
> Maybe we should use <linux/idr.h> to handle allocating the 32-bit IDs.

There will be a static limit to the number of clients. We could either
use this as a direct index (or have an additional level of indirection)
and have the client IDs be from a larger ID space. As I don't see a
benefit to this right now (I don't think clients will be very dynamic),
I will start with the direct mapping approach. It can easily be changed
if this is the wrong decision.

>     Hal> There used to be a problem with not doing this with some old
>     Hal> Linux kernels. I will eliminate the static WC and the
>     Hal> comment. Sent patch for this.
> 
> What was the problem?  

I don't know the exact problem, just what I wrote about the "origin" of
this. The precise origin of this has not been explained to me (and it is
not worth chasing down).

>     Hal> To minimize time spent in non process context. Is this an
>     Hal> overly conservative approach ?
> 
> It's probably OK for now.  Probably the correct answer is to use
> tasklets and run in softirq context though (since process context can
> be starved for an arbitrarily long time).

OK. I will put changing this on the futures list for now.

>     Roland> I don't think it's a good idea to use a semaphore and
>     Roland> signals to control the worker thread.  Better would be a
>     Roland> wait queue and something like wait_event().
> 
>     Hal> Is there a problem with this or is this an efficiency issue ?
> 
> Using signals tends to be a bad idea because your thread can get
> killed when init sends SIGKILL to all processes while shutting down,
> and then if you need to send MADs later, say to unmount IB-attached
> storage, you're in trouble.  The semaphore is not so good because if
> 100 wakeups come before the thread runs, it will wake up 100 times in
> a row (even if it does all the work on the first iteration).  Plus
> it's just not idiomatic in the kernel so it makes the code harder to
> review and maintain.

Chainging this is now on my short term list.

>     Roland> These don't seem to merit being macros.  If you really
>     Roland> want they could be inline functions but I don't see any
>     Roland> use of the "up" member outside of the macros anyway, so
>     Roland> maybe you can just kill them.  It seems hard to think of
>     Roland> how to test "up" in a way that's not racy.
> 
>     Hal> up was to be used to qualify whether to proceed with posting
>     Hal> MADs to send. Why is it racy if the check of it also takes
>     Hal> the lock ?
> 
> Well, you have to be careful of
> 
>     context #1                    context #2
> 
>     lock
>     see that up is set
>     unlock
>                                   lock
>                                   clear up
>                                   unlock
>     do work relying on up
>       being set...oops
> 
> Basically it means that anything that relies on up being set has to
> hold the lock across the whole operation.  This includes posting sends
> I assume, which means that the lock has to be a spinlock (since the
> send posting can happen in interrupt context).  But that means you
> can't do any operations that sleep and rely on up staying consistent.
> So it starts to look messy to me (it does seem doable but there may be
> details I'm missing).

Got it. It will be removed. I think it means there may be some other
cleanup cases to deal with but they might have been there anyway.

Thanks.

-- Hal