[openib-general] ib_mad.c comments

Wed Sep 8 20:07:23 PDT 2004

Huh... I didn't even notice it was checked in... anyway, my comments
follow after some comments on Sean's comments:

    Sean> Need to lock when checking/setting version/class/methods.

I agree for the initial implementation.  Ultimately RCU seems better
but I would recommend sticking with locking to start with since it's
much easier to code correctly.

    Sean> We should return the error code from ib_post_send in order
    Sean> to handle overruns differently.

What did we decide about how to handle someone posting more sends than
the underlying work queue can hold?  In any case I agree with this.

    Sean> Should we avoid casting the list_head to a structure where
    Sean> possible?

Yes, definitely.  It's much better to do something like
&mystruct->list rather than relying on the fact that mystruct has a
struct list_head as its first member.  In fact the usage of list.h is
pretty broken throughout ib_mad.c, see below.

    Sean> Not sure why these calls search for the corresponding work request.

Yes -- we know the next request to complete will always be the oldest
one we have around, right?

    Sean> Not sure if we need to support max_sge on the send queue.
    Sean> This may be substantially larger than what we need.  At a
    Sean> minimum, I think that we need 2 for optimal RMPP support.
    Sean> I'm not sure where the trade-off is between SGE versus
    Sean> copying into a single buffer lies.

I'm not sure there's much practical difference between copying and
using the HCA to do a gather/scatter on a buffer of size 256.  The big
difference is memory per WQE (at least for mthca): supporting the
max_sge means each WQE will be about 1 KB, while using a smaller
number means each WQE could be about 128 bytes.

OK, my comments (which are based on only a quick read and therefore
focused mostly on low-level coding details):

	kmem_cache_t *ib_mad_cache;

seems to be unused -- should be static anyway.

	static u32 ib_mad_client_id = 0;

needs to be protected by a lock when used later

	#define IB_MAD_DEVICE_LIST_LOCK_VAR	unsigned long ib_mad_device_list_sflags
	#define IB_MAD_DEVICE_LIST_LOCK()	spin_lock_irqsave(&ib_mad_device_list_lock, ib_mad_device_list_sflags)
	#define IB_MAD_DEVICE_LIST_UNLOCK()	spin_unlock_irqrestore(&ib_mad_device_list_lock, ib_mad_device_list_sflags)

Don't use this idiom ... just use the spinlock functions directly.  It
makes locking code harder to read and review, it leads to wasteful
stuff like the below (in ib_mad_reg()):

	IB_MAD_DEVICE_LIST_LOCK_VAR;
	IB_MAD_AGENT_LIST_LOCK_VAR;

and besides, Documentation/CodingStyle says

   "macros that depend on having a local variable with a magic name
    might look like a good thing, but it's confusing as hell when one
    reads the code and it's prone to breakage from seemingly innocent
    changes."

	/*
	 * ib_mad_reg - Register to send/receive MADs.
	 * @device - The device to register with.

Start with /** for kernel doc to pick this up.  Might be better to put
it in a header file so that it's easier to find the documentation (but
it's OK to leave it in a .c).

	struct ib_mad_device_private *entry, *priv = NULL,
				     *head = (struct ib_mad_device_private *) &ib_mad_device_list;

This definition of head is totally broken, since ib_mad_device_list is
declared as:

	static struct list_head ib_mad_device_list;

so trying to use it as a struct ib_mad_device_private is just going
off into random memory.  However there's no reason to even have a
variable named head, since it seems you only use it in:

	list_for_each(entry, head) {

This really should be

	list_for_each_entry(entry, &ib_mad_device_list, list) {

and the definition of struct ib_mad_device_private needs to be fixed from

	struct ib_mad_device_private {
		struct ib_mad_device_private *next;

to

	struct ib_mad_device_private {
		struct list_head list;

(you don't have to use the name list for your struct list_head member;
that's just my habit).

	list_for_each(entry2, head2) {
		if (entry2->agent == mad_agent_priv->agent) {
			list_del((struct list_head *)entry2);
			break;
		}
	}	

This is broken for a couple of reasons: misuse of list_for_each as
just described; also, you can't delete items from a list while walking
through it with list_for_each (use list_for_each_safe instead);
finally, there's no reason to walk a list to find the entry you just
added in the same function -- just call list_del on the entry
directly, since you should still have it around.

Pretty much all of these comments apply to all use of the list.h
macros in the file -- most look wrong.

What context is it allowed to call ib_mad_post_send() from?  We never
discussed this, but since the current implementation allocates work
requests with

	mad_send_wr = kmalloc(sizeof *mad_send_wr, GFP_KERNEL);

right now it can only be called from process context with no locks
held.  This seems like it violates the principle of least surprise,
because ib_post_send() can be called from any context.  Also, the
failure case

	if (!mad_send_wr) {
		printk(KERN_ERR "No memory for ib_mad_send_wr_private\n");
		return -ENOMEM;	
	}

needs to set bad_send_wr.

ib_mad_recv_done_handler() seems to be missing a call to pci_unmap_single().

	static u8 convert_mgmt_class(struct ib_mad_reg_req *mad_reg_req)
	{
		u8 mgmt_class;

		/* Alias IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE to 0 */
		if (mad_reg_req->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
			mgmt_class = 0;
		} else {
			mgmt_class = mad_reg_req->mgmt_class;
		}
		return mgmt_class;
	}

I'd rewrite this as

	static inline u8 convert_mgmt_class(struct ib_mad_reg_req *mad_reg_req)
	{
		return mad_reg_req->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE ?
			0 : mad_reg_req->mgmt_class;
	}

or just open code it in the two places it's used.

	static int allocate_method_table(struct ib_mad_mgmt_method_table **method)
	{
		/* .. */
			return ENOMEM;

probably should be -ENOMEM;

	static void ib_mad_completion_handler(struct ib_mad_device_private *priv)
	{

		/*
		 * For stack overflow safety reason, WC is static here.
		 * This callback may not be called more than once at the same time.
		 */
		static struct ib_wc wc;

Seems like a bad plan to me -- on an SMP machine with multiple HCAs
(or even multiple ports on a single HCA) it seems like we want to
multithread MAD processing rather than serializing it (In fact Yaron
has made a lot of noise about running on giant SGI NUMA machines with
millions of HCAs, where this looks especially bad).  Also the comment
seems to be wrong -- there seems to be one thread per HCA so multiple
copies of the callback can run at once.

	static int ib_mad_thread(void *param)
	{
		struct ib_mad_device_private *priv = param;
		struct ib_mad_thread_data *thread_data = &priv->thread_data;

		lock_kernel();
		daemonize("ib_mad-%-6s-%-2d", priv->device->name, priv->port);
		unlock_kernel();

Just use kthread_create() to start your thread and handle all this.
Even though the current Topspin stack uses a MAD processing thread per
HCA, I'm not sure it's the best design.  Why do we need to defer work
to process context?

		sema_init(&thread_data->sem, 0);

Seems like a race condition here ... what happens if someone else
tries to use the semaphore before the thread has gotten a chance to
run?  In any case...

		while (1) {
			if (down_interruptible(&thread_data->sem)) {
				printk(KERN_DEBUG "Exiting ib_mad thread\n");
				break;
			}

I don't think it's a good idea to use a semaphore and signals to
control the worker thread.  Better would be a wait queue and something
like wait_event().

	#define IB_MAD_DEVICE_SET_UP(__device__) {\
		IB_MAD_DEVICE_LIST_LOCK_VAR;\
		IB_MAD_DEVICE_LIST_LOCK();\
		(__device__)->up = 1;\
		IB_MAD_DEVICE_LIST_UNLOCK();}

	#define IB_MAD_DEVICE_SET_DOWN(__device__) {\
		IB_MAD_DEVICE_LIST_LOCK_VAR;\
		IB_MAD_DEVICE_LIST_LOCK();\
		(__device__)->up = 0;\
		IB_MAD_DEVICE_LIST_UNLOCK();}

These don't seem to merit being macros.  If you really want they
could be inline functions but I don't see any use of the "up" member
outside of the macros anyway, so maybe you can just kill them.  It
seems hard to think of how to test "up" in a way that's not racy.

	for (i = 0; i < num_ports; i++) {
		ret = ib_mad_device_open(device, i);

This is wrong -- for a CA you need to handle ports 1 ... num_ports,
while a switch just uses port 0.

Thanks,
  Roland