[openib-general][patch review] srp: fmr implementation

Vu Pham vuhuong at mellanox.com
Mon May 8 09:19:32 PDT 2006


Roland Dreier wrote:
>  > 1st scsi_try_host_reset() --> srp_host_reset() -->
>  > srp_reconnect_target() return SUCCESS. Then scsi_eh_try_stu() or
>  > scsi_eh_tur() is called right after
>  > 
>  > scsi_eh_try_stu or scsi_eh_tur --> scsi_send_eh_cmnd() -->
>  > srp_queuecommand()
> 
> But after srp_reconnect_target(), both SRP's and the midlayer's queue
> of pending commands should be completely empty, since I put
> 
> 	list_for_each_entry(req, &target->req_queue, list) {
> 		req->scmnd->result = DID_RESET << 16;
> 		req->scmnd->scsi_done(req->scmnd);
> 		srp_unmap_data(req->scmnd, target, req);
> 	}
> 
> and
> 
> 	INIT_LIST_HEAD(&target->free_reqs);
> 	INIT_LIST_HEAD(&target->req_queue);
> 	for (i = 0; i < SRP_SQ_SIZE; ++i)
> 		list_add_tail(&target->req_ring[i].list, &target->free_reqs);
> 
> in there.  Why doesn't that work to kill all the pending commands?

That works fine and kills all the pending commands. However, 
right after srp_host_reset() returns, the scsi error handler 
queues and sends the stu or tur scsi command in the error 
handling flow of scsi_eh_host_reset().

Please re-read scsi_eh_host_reset() and 
scsi_try_host_reset() in scsi_error.c. Here is the logic:

scsi_eh_host_reset() --> scsi_try_host_reset() --> 
srp_host_reset()  --- all pending commands are killed. 
srp_host_reset() returns SUCCESS, so scsi_try_host_reset() 
returns SUCCESS.

static int scsi_eh_host_reset(struct list_head *work_q,
                               struct list_head *done_q)
{
...

        rtn = scsi_try_host_reset(scmd);
        if (rtn == SUCCESS) {
            list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
                if (!scsi_device_online(scmd->device) ||
                    (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) ||
                    !scsi_eh_tur(scmd))

...
}

Since rtn == SUCCESS, scsi_eh_host_reset() calls 
scsi_eh_try_stu() or scsi_eh_tur(), which calls 
scsi_send_eh_cmnd() --> srp_queuecommand(). Now srp's 
request queue is not empty anymore.

When scsi_eh_try_stu() or scsi_eh_tur() times out, the scsi 
midlayer tries to abort the stu or tur command as well. 
Since we delay the cleanup in srp_reset_device(), srp's 
request queue is still not empty. The stu or tur command is 
then freed by the scsi midlayer. The next srp_host_reset() 
will try to clean srp's request queue, which still holds an 
"old" request referencing the freed scsi command.
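
To make the sequence concrete, here is a minimal user-space 
sketch of it. It is not the ib_srp code: every toy_* name and 
RING_SIZE are made up for illustration, and "freeing" a 
command is modeled with an alive flag so the sketch never 
touches released memory. It only shows that a request left on 
the queue after an abort is still there, pointing at a dead 
command, when the next reset walks the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4          /* hypothetical; stands in for SRP_SQ_SIZE */
#define TOY_DID_RESET 0x08   /* hypothetical result code for the sketch */

/* Stand-in for a scsi command owned by the midlayer. */
struct toy_cmd {
    bool alive;              /* false once the "midlayer" released it */
    int  result;
};

/* Stand-in for an srp request that references a command. */
struct toy_req {
    struct toy_cmd *cmd;     /* NULL means the slot is free */
};

struct toy_target {
    struct toy_req ring[RING_SIZE];
};

/* Like a queuecommand path: park the command in the first free slot. */
static int toy_queue(struct toy_target *t, struct toy_cmd *c)
{
    assert(c != NULL && c->alive);
    for (size_t i = 0; i < RING_SIZE; i++) {
        if (t->ring[i].cmd == NULL) {
            t->ring[i].cmd = c;
            return (int)i;
        }
    }
    return -1;               /* ring full */
}

/*
 * Abort path. With drop_req == false the request stays queued
 * (the deferred-cleanup case); either way the command is then
 * released by the "midlayer".
 */
static void toy_abort(struct toy_target *t, struct toy_cmd *c, bool drop_req)
{
    if (drop_req) {
        for (size_t i = 0; i < RING_SIZE; i++) {
            if (t->ring[i].cmd == c)
                t->ring[i].cmd = NULL;
        }
    }
    c->alive = false;
}

/*
 * Reset path: complete every queued request and empty the ring.
 * Returns how many queued requests referenced a command that had
 * already been released (the "old" requests).
 */
static int toy_reset(struct toy_target *t)
{
    int stale = 0;
    for (size_t i = 0; i < RING_SIZE; i++) {
        struct toy_cmd *c = t->ring[i].cmd;
        if (c == NULL)
            continue;
        if (c->alive)
            c->result = TOY_DID_RESET;
        else
            stale++;         /* real code would touch freed memory here */
        t->ring[i].cmd = NULL;
    }
    return stale;
}
```

Queueing one command, aborting it with drop_req == false, and 
then calling toy_reset() reports one stale request; aborting 
with drop_req == true reports none.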

If you still have questions, I can call you, or you can give 
me a call at (408) 916-0006.

Vu


