[ofa-general] [RFC PATCH] IB/srp: respect target credit limit

Dave Dillow dillowda at ornl.gov
Tue Dec 18 07:16:58 PST 2007


On Mon, Dec 17, 2007 at 09:58:17PM -0800, Roland Dreier wrote:
>  > The SRP initiator will currently send requests, even if we have no
>  > credits available. The results of sending extra requests is
>  > vendor-specific, but on some devices, overrunning our credits will cost
>  > us 85% of peak performance -- e.g. 100 MB/s vs 720 MB/s. Other devices
>  > may just drop them.
> 
> I guess this happens because a target sometimes completes commands
> with a value of 0 for req_lim delta in the response?

I didn't instrument the delta in the responses; I just noticed that
zero_req_lim was growing very, very quickly when the performance dropped.
I added a sysfs entry to give a view of req_lim, and I saw it get to -3
several times, so I expect we're actually getting some negative deltas back.
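
To be concrete, the debug view was just another per-target sysfs attribute
next to the existing ones; a minimal sketch of what I mean (attribute name
and the unlocked read are illustrative, not necessarily the final patch):

/* Sketch: expose the current credit count for debugging.  The read is
 * unlocked, so the value is only a snapshot. */
static ssize_t show_req_lim(struct class_device *cdev, char *buf)
{
	struct srp_target_port *target = host_to_target(class_to_shost(cdev));

	if (target->state == SRP_TARGET_REMOVED ||
	    target->state == SRP_TARGET_DEAD)
		return -ENODEV;

	return sprintf(buf, "%d\n", target->req_lim);
}

static CLASS_DEVICE_ATTR(req_lim, S_IRUGO, show_req_lim, NULL);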

That may make some sense, as I was hitting this with 1 MB and 4 MB I/O sizes,
so perhaps the target was using this to throttle the initiator. The spec
mentions a limit on how negative the delta can be, which seems to imply that
this is OK, but it does complicate the target's management of buffers -- we
could send a request thinking we have 1 credit just before we process a
response that takes 3 away...
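
For reference, my understanding of the accounting today is that the response
path just folds the (signed) delta into the running count; a simplified
sketch, pulled out into an illustrative helper (the helper name is mine, the
logic mirrors srp_process_rsp()):

/* Sketch: the delta in each response is signed, so the target can take
 * credits away as well as grant them. */
static void srp_apply_req_lim_delta(struct srp_target_port *target,
				    struct srp_rsp *rsp)
{
	unsigned long flags;

	spin_lock_irqsave(target->scsi_host->host_lock, flags);
	target->req_lim += (s32) be32_to_cpu(rsp->req_lim_delta);
	spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
}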

I can instrument the deltas tomorrow if you like, when I'm back in the office.

>  > Also, it adds a queue length parameter to the target options, so that
>  > the maximum number of commands the SCSI mid-layer will try to send can
>  > be capped before it hits the credit limit.
> 
> Is there anything sensible someone can do with this new parameter?

If you know the request limit for the SRP target, you can set it here and
(hopefully) avoid invoking the command requeuing. I say hopefully, as the
initial delta is not always the queue depth, and there is the potential for
negative deltas.

It also has some potential use for balancing, I suppose.

Honestly, I hadn't thought much about it, but it seemed a good complement
to the per-LUN limit.
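
For what it's worth, the mechanics of the cap are trivial; roughly this in
srp_create_target(), where "queue_size" is just an illustrative name for the
new option:

/* Sketch: if the user supplied a queue size in the target options, use
 * it to cap the number of commands the mid-layer will keep outstanding;
 * otherwise keep the driver default. */
if (target->queue_size)
	target->scsi_host->can_queue = min(target->queue_size,
					   target->scsi_host->can_queue);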

> It seems that things would be simpler if this test of credits went
> into __srp_get_tx_iu() -- just have it return NULL if no credits are
> available instead of bumping zero_req_lim.  And I guess add a
> parameter to know whether the IU is for a task management command or
> not.
[snip]
> Similarly I don't see anything gained aside from an opportunity for
> incorrect code by moving the decrement of req_lim into its own
> function that every caller of __srp_post_send() has to remember to
> call right after __srp_post_send().

They're a hold-over from when I thought I was going to have to add more
code around them to stop the request queue, but I found a simpler way.

I'll merge them into existing code for the next iteration.
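
Roughly what I have in mind for the merged check (a sketch only -- the
reserve policy and parameter name are illustrative, and the ring handling is
unchanged from the current code):

/*
 * Must be called with the host_lock held, as today, so req_lim and
 * tx_head stay consistent until __srp_post_send().  Keep one credit in
 * reserve for task management IUs so an abort can always go out.
 */
static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target,
				      int tsk_mgmt)
{
	s32 reserve = tsk_mgmt ? 0 : 1;

	if (target->tx_head - target->tx_tail >= SRP_SQ_SIZE)
		return NULL;

	if (target->req_lim <= reserve) {
		++target->zero_req_lim;
		return NULL;
	}

	return target->tx_ring[target->tx_head & SRP_SQ_SIZE];
}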

>  > -			target->scsi_host->can_queue = min(target->req_lim,
>  > -							   target->scsi_host->can_queue);
> 
> Why do we want to delete this?  It seems that for most sane targets,
> the initial request limit in the login response is a good
> approximation to what we should limit our queue depth to.

The command line interface for one of our arrays suggests that the limit
is 32 commands for the host, but the initial deltas received ranged from 29 to 33.
Further deltas would often push it back to 32 or above pretty quickly. Now,
losing the ability to queue a few more commands may not affect performance
all that much, but I'd rather not restrict it if possible.

>  > +	.max_host_blocked		= 1,
> 
> Documentation of what max_host_blocked does is pretty scant.  What
> does this change do?

This controls how long the mid-layer waits before unblocking the request
queue for this target when we return SCSI_MLQUEUE_HOST_BUSY from
srp_queuecommand(). The default is 7. I had thought this was the number of
commands that needed to complete before it would automatically unblock the
queue and try again, but looking at it some more to give you a better
explanation, it appears that it will actually drain the queue of all
outstanding requests before it starts decrementing this count for each new
request submitted to the mid-layer.

Ugh, that's a big stall. :( At least it only costs a small amount
compared to overrunning the credits.
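
For reference, the interaction I'm describing is just the template field plus
the busy return from the queuecommand path, along these lines (placement is
illustrative):

/* In the scsi_host_template: unblock after one completion instead of
 * the default of 7. */
.max_host_blocked	= 1,

/* In srp_queuecommand(), when no credit (or no send queue slot) is
 * available: ask the mid-layer to back off and retry later. */
if (!iu)
	return SCSI_MLQUEUE_HOST_BUSY;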

I was avoiding the use of scsi_block_requests(), as we cannot hold the
host_lock when unblocking with scsi_unblock_requests() since it runs the
queue again. There may be some games to play with can_queue, but I'm not
keen on that. Same goes for just zeroing out scsi_host->host_blocked when
we have positive credits, as scsi_end_request() will cause a queue run once
we complete a command. That may have some merit, though I'm still not happy
about the internal SCSI knowledge.

Again, for tomorrow.
Dave


