[openib-general][patch review] srp: fmr implementation,

Roland Dreier rdreier at cisco.com
Mon May 8 09:31:17 PDT 2006


    Vu> scsi_eh_try_stu or scsi_eh_try_tur get timeout, scsi midlayer
    Vu> tried to abort stu or tur command as well. Since we delay to
    Vu> clean in srp_reset_device(), srp's request queue is still not
    Vu> empty. This stu or tur command is freed by scsi midlayer. The
    Vu> next srp_host_reset() will try to clean srp's request queue
    Vu> with "old" request referencing to freed scsi command.

This is where I get confused.  We should be flushing out the command
queue in srp_host_reset(), so the loop in scsi_error.c after resetting
the host:

		rtn = scsi_try_host_reset(scmd);
		if (rtn == SUCCESS) {
			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
				if (!scsi_device_online(scmd->device) ||
				    (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) ||
				    !scsi_eh_tur(scmd))
					scsi_eh_finish_cmd(scmd, done_q);
			}

should not find any commands still queued.

Your previous patch can't be the right fix.  I think there are two
things wrong with the changes below to srp_reset_device():

 - You changed srp_abort to remove the request from SRP's queue, but
   then you look it up and use it again in srp_reset_device, which
   seems risky at best.
 - If srp_reset_device() succeeds, you don't flush all matching
   commands, so this will definitely leave some stale commands in
   SRP's queue.

 static int srp_reset_device(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
-	struct srp_request *req, *tmp;
+	struct srp_request *req;
+	int ret = SUCCESS;
 
 	printk(KERN_ERR "SRP reset_device called\n");
 
-	if (srp_find_req(target, scmnd, &req))
-		return FAILED;
-	if (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET))
-		return FAILED;
-	if (req->tsk_status)
-		return FAILED;
-
-	spin_lock_irq(target->scsi_host->host_lock);
-
-	list_for_each_entry_safe(req, tmp, &target->req_queue, list)
-		if (req->scmnd->device == scmnd->device) {
-			req->scmnd->result = DID_RESET << 16;
-			scmnd->scsi_done(scmnd);
-			srp_remove_req(target, req);
-		}
-
-	spin_unlock_irq(target->scsi_host->host_lock);
+	if ((srp_find_req(target, scmnd, &req)) ||
+	    (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET)) ||
+	    (req->tsk_status))
+		ret = FAILED;
 
-	return SUCCESS;
+	return ret;
 }



More information about the general mailing list