[openib-general][patch review] srp: fmr implementation
Vu Pham
vuhuong at mellanox.com
Fri May 5 12:00:56 PDT 2006
>
> reading scsi_error.c again, I find this logic for our case (please
> correct me if I'm wrong)
> 1. eh_abort_handler and eh_device_reset_handler fail with timeout;
> eh_host_reset_handler succeeds
> 2. scsi_eh_host_reset goes on with scsi_eh_try_stu & scsi_eh_tur
> 3. either scsi_eh_try_stu or scsi_eh_tur will reuse the scsi command and
> call scsi_send_eh_cmnd to send STU or TUR command
> 4. scsi_send_eh_cmnd calls srp_queuecommand, which gets a new req,
> overwrites the scsi_done pointer with scsi_eh_done, and adds the req to
> req_queue for this same scsi command with a different opcode (i.e. STU
> or TUR)
> 5. In my case I got QP event 1, so scsi_send_eh_cmnd hits the timeout
> case and calls eh_abort_handler for this scsi command with opcode STU
> or TUR
> 6. scsi_eh_try_stu & scsi_eh_tur restore the old scsi command with
> scsi_set_cmd_retry; however, srp has already changed the scsi_done and
> host_scribble pointers and cannot restore their old values
> 8. scsi_eh_host_reset fails and scsi_eh_offline_sdevs is called
> 9. scsi_eh_offline_sdevs calls scsi_eh_finish_cmd, which moves the scsi
> command to done_q, where the scsi command is freed
> 10. However, the srp req carrying this scsi command is still in our
> req_queue. The next eh_host_reset_handler will re-init the req_queue and
> use the stale scsi command pointer (this is the use-after-free crash
> that we see)
>
> Bottom line: my previous patch still does not address the logic above.
> I'll rework the patch and send it to you later for review
>
One correction: my previous patch does address the issue, since
the abort of the TUR or STU command times out and I remove
the req; therefore the req is no longer in req_queue and the
subsequent eh_host_reset_handler does not run into the
use-after-free
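
To illustrate why removing the req avoids the use-after-free, here is a
minimal user-space sketch in C. It is not the ib_srp code: struct cmd,
struct req, struct target, queue_cmd, remove_req, host_reset_stale_refs
and simulate are all hypothetical names, and a "freed" flag stands in for
the midlayer freeing the command, so the sketch itself never touches
freed memory.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SIZE 4

/* Hypothetical stand-in for a scsi command owned by the midlayer. */
struct cmd {
    int opcode;
    int freed;          /* set when the midlayer has "freed" the command */
};

/* Hypothetical stand-in for one srp request slot. */
struct req {
    struct cmd *cmd;    /* NULL when the slot is unused */
};

/* Hypothetical stand-in for the per-target state holding req_queue. */
struct target {
    struct req req_queue[QUEUE_SIZE];
};

/* Queue a command: take the first unused slot and remember the command. */
static struct req *queue_cmd(struct target *t, struct cmd *c)
{
    for (size_t i = 0; i < QUEUE_SIZE; i++) {
        if (t->req_queue[i].cmd == NULL) {
            t->req_queue[i].cmd = c;
            return &t->req_queue[i];
        }
    }
    return NULL;
}

/* Abort-timeout path of the patch: drop the req so that nothing in
 * req_queue points at the command any more. */
static void remove_req(struct req *r)
{
    r->cmd = NULL;
}

/* Host reset walks req_queue; count the reqs that still point at a
 * command the midlayer has already freed. In a real driver each of
 * these would be a use-after-free. */
static int host_reset_stale_refs(const struct target *t)
{
    int stale = 0;

    for (size_t i = 0; i < QUEUE_SIZE; i++) {
        const struct cmd *c = t->req_queue[i].cmd;
        if (c != NULL && c->freed)
            stale++;
    }
    return stale;
}

/* Replay the sequence: queue a TUR, let its abort time out, let the
 * midlayer free the command, then run a host reset. */
static int simulate(int remove_on_abort_timeout)
{
    struct target t = { 0 };
    struct cmd c = { .opcode = 0x00 /* TEST UNIT READY */, .freed = 0 };
    struct req *r = queue_cmd(&t, &c);

    assert(r != NULL);

    if (remove_on_abort_timeout)
        remove_req(r);          /* abort of the TUR timed out */

    c.freed = 1;                /* command finished and freed via done_q */

    return host_reset_stale_refs(&t);
}
```

In this model simulate(0) leaves one stale pointer for the host reset to
trip over, while simulate(1), which drops the req on abort timeout,
leaves none.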
Vu