[openib-general] Re: ib_mad.c comments

Hal Rosenstock halr at voltaire.com
Fri Sep 10 14:43:36 PDT 2004


Hi Sean,

Thanks for reviewing this early version, which is incomplete.

On Wed, 2004-09-08 at 19:47, Sean Hefty wrote:
> Here's a list of comments from reviewing ib_mad.c.  I'll use this list as kind of my to do list for the GSI.  Several of these can be delayed when implementing.  After we meet tomorrow, I will begin creating patches.  Overall, it's a good start.
> 
> ib_mad_reg():
> 	- Need to lock when checking/setting version/class/methods.
This item is on my TODO list for this (ib_mad.c).

> 	- Need to support registrations for "all" methods of a given class.
> 	  (We may want the initial implementation to only do this for now,
> 	  to shorten the development time.)
This can be done today, depending on the definition of "all": the entire
method bit mask can already be registered. If "all" means only the
methods valid for a given class, that is a specific bit mask per class.
Is this proposing a shortcut way to do this? If so, is it much of a
savings for the client?

Do you think it adds that much more complexity to have multiple clients
for methods of the same class? Wouldn't it be a problem to run the SM
and SMA concurrently if only all-methods-per-class registration were
supported initially?
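To make the "all" question concrete, here is a minimal sketch of a method bit mask with a register-everything shortcut. The struct and helper names are hypothetical (not from ib_mad.c), and it assumes a 128-entry mask over the method octet:

```c
#include <assert.h>
#include <string.h>

#define MAX_METHODS   128                       /* assumed mask width */
#define BITS_PER_LONG (8 * sizeof(unsigned long))
#define MASK_WORDS    (MAX_METHODS / BITS_PER_LONG)

struct method_mask {
	unsigned long bits[MASK_WORDS];
};

/* Hypothetical shortcut: register for every method (the full mask). */
static void mask_set_all(struct method_mask *m)
{
	memset(m->bits, 0xff, sizeof(m->bits));
}

/* Register for a single method. */
static void mask_set(struct method_mask *m, unsigned method)
{
	m->bits[method / BITS_PER_LONG] |= 1UL << (method % BITS_PER_LONG);
}

/* Is this client registered for the given method? */
static int mask_test(const struct method_mask *m, unsigned method)
{
	return (m->bits[method / BITS_PER_LONG] >> (method % BITS_PER_LONG)) & 1;
}
```

As the text above notes, "all methods valid for a class" would just be a different per-class constant mask passed to the same registration path.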

> 	- Should we reference qp0 and qp1 with the registration?
Either way (struct ib_qp * (for qp0 or qp1) or enum ib_qp_type) is fine
with me. It seems to me that the enum approach is easier for the client.
Does the client need a pointer to the QP for something else?

> 	- Need to ensure unique tids in case of wrapping.
Are you referring to making sure that the high TID is not in use? It is
on my TODO list to change this to use the client_id as an index (to save
walking a linked list by indexing into a table based on the client_id).
This will be done after all the more straightforward changes.
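One way the client_id-as-index idea could look, sketched with hypothetical names: pack the client_id into the high 32 bits of the 64-bit TID, so dispatch is a table lookup and a wrapping low counter can only collide within one client's own TIDs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical TID layout: high 32 bits identify the registering
 * client, low 32 bits are a per-client counter.  Dispatch then indexes
 * a client table by tid_client() instead of walking a linked list. */
static uint64_t make_tid(uint32_t client_id, uint32_t counter)
{
	return ((uint64_t)client_id << 32) | counter;
}

static uint32_t tid_client(uint64_t tid)
{
	return (uint32_t)(tid >> 32);
}
```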
	
> ib_mad_post_send():
> 	- We should return the error code from ib_post_send in order to
> 	  handle overruns differently.
Fixed. Posted patch for this.

> 	- The print level should be lowered from error.
There are two prints; I'm not sure which should be lowered. Both, or
only the one after ib_post_send?

> 	- Should we avoid casting the list_head to a structure where possible?
Will address as part of response to Roland's comments on ib_mad.c

> allocate_method_table():
> 	- Can just use memset to clear the table.
I changed this to use memset. Posted patch for this.
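For reference, the memset version of the allocation might look like this; the table's exact layout here is an assumption, not the real ib_mad.c structure:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_METHODS 128

/* Hypothetical shape of the method table: one client slot per method. */
struct ib_mad_method_table {
	void *agent[MAX_METHODS];
};

static struct ib_mad_method_table *allocate_method_table(void)
{
	struct ib_mad_method_table *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	/* One memset replaces a hand-rolled loop over every entry. */
	memset(t, 0, sizeof(*t));
	return t;
}
```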

> check_class_table():
> 	- Has an extra '{'.
Where's the extra '{'?

> ib_mad_recv_done_handler():
> ib_mad_send_done_handler():
> 	- Not sure why these calls search for the corresponding work request.
On the send side, it should just be the head entry. I will change that.
Right now on the receive side, there is one receive list, and the
receives are posted for QPs 0 and 1 (and ultimately other QPs) so the
right post needs to be found. I think there is additional information in
the WR which is needed for the callbacks. I didn't get far enough on the
receive side so I will defer this part of the answer for now.

> ib_mad_post_receive_mads():
> 	- I think we can just pass &qp0 or &qp1, rather than a type to
> 	  ib_mad_post_receive_mad.
Right now, the QP type is saved as part of the private MAD. This may be
unnecessary. Won't be sure until I complete the receive side.

> 	- Print level should be lowered from error
There are two prints; I'm not sure which should be lowered. Both, or
only the one after ib_post_recv?

> 	- We can track the number of posted receives to avoid posting overruns.
This seems like an optimization. I will put it on my TODO list for now.
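If it is done later, the accounting could be as simple as the sketch below (names and shape are mine, not ib_mad.c's): count outstanding receives and refuse to post past the queue depth, so ib_post_recv never sees an overrun.

```c
#include <assert.h>

/* Hypothetical receive-queue accounting. */
struct recv_queue {
	int depth;	/* capacity of the receive queue */
	int posted;	/* receives currently outstanding */
};

static int post_receive(struct recv_queue *q)
{
	if (q->posted >= q->depth)
		return -1;	/* would overrun; caller retries later */
	/* ... build and post the actual work request here ... */
	q->posted++;
	return 0;
}

static void receive_completed(struct recv_queue *q)
{
	q->posted--;	/* a completion frees one slot */
}
```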

> struct ib_mad_device_private:
> 	- If we make qp0 and qp1 an array, it may simply the code and remove
> 	  several checks from the code.
Good point. It does (simplify the code :-). Already posted a patch for
this.
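The gist of that simplification, in a hedged sketch (field and enum names are illustrative): with an array indexed by QP type, code that used to branch on qp0 versus qp1 becomes a plain index.

```c
#include <assert.h>
#include <stddef.h>

struct ib_qp;	/* opaque for this sketch */

enum { QP0 = 0, QP1 = 1, IB_MAD_QPS = 2 };

/* Hypothetical per-port structure: one array replaces separate
 * qp0/qp1 fields. */
struct ib_mad_port_private {
	struct ib_qp *qp[IB_MAD_QPS];
};

/* Before: if (qp_type == QP0) use port->qp0; else use port->qp1;
 * After:  one line, no branch. */
static struct ib_qp *mad_qp(struct ib_mad_port_private *port, int qp_type)
{
	return port->qp[qp_type];
}
```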

> ib_mad_device_open():
> 	- A nit, but it's logically initializing a port on the device.
Fixed. Posted patch for this.

> 	- Remove +20 to CQ size.
Fixed. Posted patch for this.

> 	- We could change to using 1 PD/device, versus 1 PD/port.
Is there any advantage/disadvantage one way or the other?

> 	- Not sure if we need to support max_sge on the send queue.  This may
> 	  be substantially larger than what we need.  At a minimum, I think
> 	  that we need 2 for optimal RMPP support.  I'm not sure where the
> 	  trade-off is between SGE versus copying into a single buffer lies.
I'm not following where the minimum of 2 for optimal RMPP support comes
from. 
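One possible reading of the "2 SGEs" figure, sketched below with made-up names: each RMPP segment on the wire is a rebuilt MAD header followed by a slice of one large payload buffer, so two gather entries let the HCA read both in place, where a single entry forces software to memcpy them into a staging buffer first (simulated here by the gather loop itself).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scatter/gather element: address plus length. */
struct sge {
	const void *addr;
	size_t len;
};

/* Simulate what the HCA's gather would produce on the wire: the
 * segments concatenated in order.  With 2 SGEs (header, payload
 * slice), software never has to do this copy itself. */
static size_t gather(char *wire, const struct sge *sg, int num_sge)
{
	size_t off = 0;

	for (int i = 0; i < num_sge; i++) {
		memcpy(wire + off, sg[i].addr, sg[i].len);
		off += sg[i].len;
	}
	return off;
}
```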

-- Hal




