[openib-general][patch review] srp: fmr implementation,

Vu Pham vuhuong at mellanox.com
Fri May 5 12:00:56 PDT 2006


> 
> Reading scsi_error.c again, I find this logic for our case (please 
> correct me if I'm wrong):
> 1. eh_abort_handler and eh_device_reset_handler fail with timeout; 
> eh_host_reset_handler succeeds
> 2. scsi_eh_host_reset goes on with scsi_eh_try_stu & scsi_eh_tur
> 3. either scsi_eh_try_stu or scsi_eh_tur will reuse the scsi command and 
> call scsi_send_eh_cmnd to send an STU or TUR command
> 4. scsi_send_eh_cmnd calls srp_queuecommand, which gets a new req, 
> sets the scsi_done pointer to scsi_eh_done, and adds the req to req_queue 
> for this same scsi command with a different opcode (i.e. STU or TUR)
> 5. In my case I got QP event 1, so scsi_send_eh_cmnd hits the 
> timeout case and calls eh_abort_handler for this scsi command with opcode 
> STU or TUR
> 6. scsi_eh_try_stu & scsi_eh_tur try to restore the old scsi command 
> with scsi_set_cmd_retry; however, srp has already changed it and cannot 
> restore the old scsi_done and host_scribble pointers
> 8. scsi_eh_host_reset fails and scsi_eh_offline_sdevs is called
> 9. scsi_eh_offline_sdevs calls scsi_eh_finish_cmd, which moves the scsi 
> command to done_q, and the scsi command is freed in done_q
> 10. However, the srp req still carries this scsi command in our 
> req_queue. The next eh_host_reset_handler will re-init the req_queue and 
> use the scsi command pointer (this is the use-after-free crash that we 
> see)
> 
> Bottom line: my previous patch still does not address the logic above. 
> I'll rework the patch and send it to you later for review.
> 


One correction: my previous patch does address the issue. The 
abort of the TUR or STU command times out and I remove 
the req at that point; therefore the req is no longer in req_queue, 
and the subsequent eh_host_reset_handler does not run into the 
use-after-free.

Vu


