[openib-general] ib_mad.c comments

Roland Dreier roland at topspin.com
Sat Sep 11 08:54:49 PDT 2004


    Roland> What did we decide about how to handle someone posting
    Roland> more sends than the underlying work queue can hold?

    Hal> Last I recall, we deferred this issue. Maybe we should just
    Hal> defer the implementation but decide what should be done.

I don't think we can really defer it.  It's pretty ugly for consumers
to have to worry about overflowing a queue that they aren't the only
ones posting to and that they don't know the length of.  My vote would
be to have another queue of pending MADs that get sent as previous
sends complete.  However, dropping (and returning a fake successful
completion) might be OK too.
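
A rough, untested sketch of the queuing approach (struct and field
names here are made up, and it assumes the usual linux/list.h,
linux/spinlock.h and ib_verbs.h definitions):

struct ib_mad_send_wr {
        struct list_head        list;
        struct ib_send_wr       wr;
};

struct ib_mad_agent_private {
        spinlock_t              send_lock;
        int                     sends_outstanding;
        int                     send_queue_depth;
        struct ib_qp            *qp;
        struct list_head        overflow_list;
        /* ... */
};

static int ib_mad_queue_or_post(struct ib_mad_agent_private *priv,
                                struct ib_mad_send_wr *mad_send_wr)
{
        struct ib_send_wr *bad_wr;
        unsigned long flags;
        int ret = 0;

        spin_lock_irqsave(&priv->send_lock, flags);
        if (priv->sends_outstanding < priv->send_queue_depth) {
                ret = ib_post_send(priv->qp, &mad_send_wr->wr, &bad_wr);
                if (!ret)
                        ++priv->sends_outstanding;
        } else {
                /* QP send queue is full -- the send completion handler
                   pulls entries off this list and posts them later. */
                list_add_tail(&mad_send_wr->list, &priv->overflow_list);
        }
        spin_unlock_irqrestore(&priv->send_lock, flags);

        return ret;
}

The consumer never sees the overflow; from its point of view the send
was accepted and will complete eventually.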

    Roland> Yes -- we know the next request to complete will always be
    Roland> the oldest one we have around, right?

    Hal> On the send side, but that's not the case on the receive side,
    Hal> as there are posts for multiple QPs. Maybe there should be a
    Hal> list per QP and then this would be true which eliminates the
    Hal> need to walk the list. The implementation is rapidly heading
    Hal> towards this.

Good -- it seems silly to throw away the information about which queue
a receive was posted to.
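
Roughly what I'd picture (sketch only, names invented): one list of
posted receive buffers per QP, oldest first, so a receive completion
just dequeues the head of the list for the QP it arrived on.

struct ib_mad_qp_info {
        struct ib_qp            *qp;
        spinlock_t              recv_lock;
        struct list_head        posted_recvs;   /* oldest first */
};

struct ib_mad_recv_buf_entry {
        struct list_head        list;
        void                    *mad;
};

/* Receives complete in the order they were posted to this QP, so the
   head of the list is always the buffer that just completed -- no
   list walk needed. */
static struct ib_mad_recv_buf_entry *
ib_mad_dequeue_recv(struct ib_mad_qp_info *qp_info)
{
        struct ib_mad_recv_buf_entry *entry = NULL;
        unsigned long flags;

        spin_lock_irqsave(&qp_info->recv_lock, flags);
        if (!list_empty(&qp_info->posted_recvs)) {
                entry = list_entry(qp_info->posted_recvs.next,
                                   struct ib_mad_recv_buf_entry, list);
                list_del(&entry->list);
        }
        spin_unlock_irqrestore(&qp_info->recv_lock, flags);

        return entry;
}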

    Roland> I'm not sure there's much practical difference between
    Roland> copying and using the HCA to do a gather/scatter on a
    Roland> buffer of size 256.

    Hal> Any idea at what buffer size there is a difference?

You probably won't see much difference until you get to a buffer size
where copying becomes expensive.  I'm thinking multiple KB.

    Roland> static u32 ib_mad_client_id = 0;

    Hal> This will be locked. Rather than a linked list, this will
    Hal> become an indexed table as this will be more efficient when
    Hal> looking up the client in the receive path.

Hmm... seems like we wouldn't want a static limit on the table size.
Maybe we should use <linux/idr.h> to handle allocating the 32-bit IDs.
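
An untested sketch of what I mean (function and variable names made
up): the idr hands out the 32-bit client IDs and also gives us the
ID -> agent lookup for the receive path, with no fixed table size.

#include <linux/idr.h>

static struct idr ib_mad_client_idr;    /* idr_init() at module init */
static spinlock_t ib_mad_client_lock = SPIN_LOCK_UNLOCKED;

static int ib_mad_register_client(struct ib_mad_agent *agent, int *id)
{
        unsigned long flags;
        int ret;

        do {
                if (!idr_pre_get(&ib_mad_client_idr, GFP_KERNEL))
                        return -ENOMEM;

                spin_lock_irqsave(&ib_mad_client_lock, flags);
                ret = idr_get_new(&ib_mad_client_idr, agent, id);
                spin_unlock_irqrestore(&ib_mad_client_lock, flags);
        } while (ret == -EAGAIN);

        return ret;
}

static struct ib_mad_agent *ib_mad_find_client(int id)
{
        struct ib_mad_agent *agent;
        unsigned long flags;

        spin_lock_irqsave(&ib_mad_client_lock, flags);
        agent = idr_find(&ib_mad_client_idr, id);
        spin_unlock_irqrestore(&ib_mad_client_lock, flags);

        return agent;
}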

    Hal> There used to be a problem with not doing this on some old
    Hal> Linux kernels. I will eliminate the static WC and the
    Hal> comment. I sent a patch for this.

What was the problem?  If anything, with the 4K stack option on i386
for kernel 2.6, we need to be even more aware of how much stack space
we use.
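
For what it's worth, struct ib_wc is only a few dozen bytes, so a
stack-local one in the poll loop should be fine even with
CONFIG_4KSTACKS (sketch only, assuming the ib_verbs.h definitions):

static void ib_mad_poll_cq(struct ib_cq *cq)
{
        struct ib_wc wc;

        while (ib_poll_cq(cq, 1, &wc) == 1) {
                /* dispatch based on wc.wr_id / wc.opcode / wc.status */
        }
}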

    Hal> To minimize time spent in non-process context. Is this an
    Hal> overly conservative approach?

It's probably OK for now.  The correct answer, though, is probably to
use tasklets and run in softirq context (since process context can be
starved for an arbitrarily long time).
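
Sketch of the softirq alternative (names invented, assumes the
ib_verbs.h definitions): the CQ event callback just schedules a
tasklet, and all the real poll/dispatch work runs in softirq context.

#include <linux/interrupt.h>

static struct ib_cq *ib_mad_cq;         /* set when the CQ is created */

static void ib_mad_tasklet_fn(unsigned long data)
{
        /* poll ib_mad_cq and process completions here, then re-arm
           with ib_req_notify_cq(ib_mad_cq, IB_CQ_NEXT_COMP) */
}

static DECLARE_TASKLET(ib_mad_tasklet, ib_mad_tasklet_fn, 0);

static void ib_mad_comp_handler(struct ib_cq *cq, void *cq_context)
{
        tasklet_schedule(&ib_mad_tasklet);
}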

    Roland> I don't think it's a good idea to use a semaphore and
    Roland> signals to control the worker thread.  Better would be a
    Roland> wait queue and something like wait_event().

    Hal> Is there a problem with this, or is this an efficiency issue?

Using signals tends to be a bad idea because your thread can get
killed when init sends SIGKILL to all processes while shutting down,
and then if you need to send MADs later, say to unmount IB-attached
storage, you're in trouble.  The semaphore is not so good because if
100 wakeups come before the thread runs, it will wake up 100 times in
a row (even if it does all the work on the first iteration).  Plus,
it's just not idiomatic in the kernel, so it makes the code harder to
review and maintain.
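
Something along these lines is what I mean (sketch only, names made
up): no signals involved, and any number of wakeups arriving before
the thread runs collapse into a single pass through the loop.

#include <linux/sched.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(ib_mad_wait);
static int ib_mad_work_pending;
static int ib_mad_thread_stop;  /* set + wake_up() at module unload */

static int ib_mad_thread(void *arg)
{
        while (!ib_mad_thread_stop) {
                wait_event(ib_mad_wait,
                           ib_mad_work_pending || ib_mad_thread_stop);
                ib_mad_work_pending = 0;
                /* process all queued completions / sends here */
        }
        return 0;
}

/* Callers (e.g. the CQ callback) just do: */
static void ib_mad_kick_thread(void)
{
        ib_mad_work_pending = 1;
        wake_up(&ib_mad_wait);
}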

    Roland> These don't seem to merit being macros.  If you really
    Roland> want they could be inline functions but I don't see any
    Roland> use of the "up" member outside of the macros anyway, so
    Roland> maybe you can just kill them.  It seems hard to think of
    Roland> how to test "up" in a way that's not racy.

    Hal> up was to be used to qualify whether to proceed with posting
    Hal> MADs to send. Why is it racy if the check of it also takes
    Hal> the lock?

Well, you have to be careful of

    context #1                    context #2

    lock
    see that up is set
    unlock
                                  lock
                                  clear up
                                  unlock
    do work relying on up
      being set...oops

Basically it means that anything that relies on up being set has to
hold the lock across the whole operation.  This includes posting sends,
I assume, which means that the lock has to be a spinlock (since the
send posting can happen in interrupt context).  But that means you
can't do any operations that sleep and rely on up staying consistent.
So it starts to look messy to me (it does seem doable but there may be
details I'm missing).
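
In other words, the non-racy pattern looks something like this (sketch
only -- struct and field names are made up, and it assumes the
ib_verbs.h definitions): the check of up and the post it protects sit
under one spinlock, so nobody can clear up in between.

struct ib_mad_port_private {
        spinlock_t      lock;
        int             up;
        struct ib_qp    *qp;
        /* ... */
};

static int ib_mad_post_if_up(struct ib_mad_port_private *port,
                             struct ib_send_wr *wr)
{
        struct ib_send_wr *bad_wr;
        unsigned long flags;
        int ret;

        spin_lock_irqsave(&port->lock, flags);
        if (port->up)
                ret = ib_post_send(port->qp, wr, &bad_wr);
        else
                ret = -ENODEV;
        spin_unlock_irqrestore(&port->lock, flags);

        return ret;
}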

 - R.


