[ofa-general] [RFC PATCH] IB/srp: respect target credit limit

Mon Dec 17 21:58:17 PST 2007

 > The SRP initiator will currently send requests, even if we have no
 > credits available. The results of sending extra requests is
 > vendor-specific, but on some devices, overrunning our credits will cost
 > us 85% of peak performance -- e.g. 100 MB/s vs 720 MB/s. Other devices
 > may just drop them.

I guess this happens because a target sometimes completes commands
with a value of 0 for req_lim delta in the response?  Anyway, I didn't
realize this was a real problem in practice, but given that it is we
definitely want to fix it.  Your patch looks like the right idea
overall, but a few comments:

 > Also, it adds a queue length parameter to the target options, so that
 > the maximum number of commands the SCSI mid-layer will try to send can
 > be capped before it hits the credit limit.

Is there anything sensible someone can do with this new parameter?

 > +/* Must be called with target->scsi_host->host_lock held to protect
 > + * req_lim. Lock cannot be dropped between here and srp_use_credit().
 > + */
 > +static int srp_check_credit(struct srp_target_port *target,
 > +				enum srp_request_type req_type)
 > +{
 > +	/* Each request requires one credit, and we hold a credit in reserve
 > +	 * so that we can send task management requests.
 > +	 */
 > +	s32 min = (req_type == SRP_REQ_TASK_MGMT) ? 1 : 2;
 > +	return target->req_lim < min;
 > +}

It seems that things would be simpler if this test of credits went
into __srp_get_tx_iu() -- just have it return NULL if no credits are
available instead of bumping zero_req_lim.  And I guess add a
parameter to know whether the IU is for a task management command or
not.

 > +/* Must be called with target->scsi_host->host_lock held to protect
 > + * req_lim. Lock cannot be dropped between the call to srp_check_credit()
 > + * and the call to srp_use_credit()
 > + */
 > +static void srp_use_credit(struct srp_target_port *target)
 > +{
 > +	target->req_lim--;
 > +}

 > @@ -965,10 +987,8 @@ static int __srp_post_send(struct srp_target_port *target,
 > +	if (!ret)
 >  		++target->tx_head;
 > -		--target->req_lim;
 > -	}

Similarly I don't see anything gained aside from an opportunity for
incorrect code by moving the decrement of req_lim into its own
function that every caller of __srp_post_send() has to remember to
call right after __srp_post_send().

 > -			target->scsi_host->can_queue = min(target->req_lim,
 > -							   target->scsi_host->can_queue);

Why do we want to delete this?  It seems that for most sane targets,
the initial request limit in the login response is a good
approximation to what we should limit our queue depth to.

 > +	.max_host_blocked		= 1,

Documentation of what max_host_blocked does is pretty scant.  What
does this change do?

 - R.