[openib-general][patch review] srp: fmr implementation

Vu Pham vuhuong at mellanox.com
Wed Apr 19 11:56:34 PDT 2006


Roland Dreier wrote:
>  > > And what if you comment out the line
>  > > 	.eh_device_reset_handler	= srp_reset_device,
>  > > does that fix it?
> 
>  > No
> 
> Now I'm really confused.
> 

Me too.

> It seems we lose the connection to the target (BTW -- do you know why
> the connection is getting killed)?

I reported the error in my original email responding to 
your fmr patch. On an ia64 system with a PCI-X HCA I got the async 
event IB_EVENT_QP_ACCESS_ERR at the initiator (and a CQE 
with IB_COMPLETION_STATUS_REMOTE_ACCESS_ERROR status at my 
target).
I still do not have an IB analyzer trace (as you suggested).

> 
> So the SCSI midlayer times out commands and tries to abort them.  But
> we have no connection so the abort fails.  The SCSI command shouldn't
> get freed now (at least if I'm understanding scsi_error.c correctly).
> 
> Then we have no .eh_device_reset_handler so everything should fall
> through to calling our .eh_host_reset_handler without freeing any SCSI
> commands.  And then we crash on a use-after-free of a SCSI command.
> 
> So where is that command getting freed on us??
> 

The scsi command that is passed to the error handlers 
(.eh_abort_handler, .eh_host_reset_handler...) is not the 
same as the use-after-free scsi command from req->scmnd.

There is some glitch where the scsi command from req->scmnd 
has already been freed by the scsi midlayer; however, the request is 
still in our pending request queue.
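
To make the suspected invariant concrete, here is a minimal standalone 
sketch, not the actual ib_srp code. All names in it (struct cmd, 
struct request, pending_add, pending_remove, complete_cmd, 
pending_count) are made up for illustration. The idea is only that a 
request has to come off the pending queue, with its scmnd pointer 
cleared, before the command it points to is freed. Otherwise a later 
walk of the queue dereferences freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the real struct scsi_cmnd / srp request. */
struct cmd {
	int tag;
};

struct request {
	struct cmd *scmnd;      /* command this request is servicing */
	struct request *next;   /* link in the pending queue */
};

struct pending_queue {
	struct request *head;
};

/* Attach a command to a request and put the request on the queue. */
static void pending_add(struct pending_queue *q, struct request *req,
			struct cmd *c)
{
	req->scmnd = c;
	req->next = q->head;
	q->head = req;
}

/*
 * Unlink req from the queue and clear its command pointer.
 * Returns the command that was attached, so the caller can release it.
 */
static struct cmd *pending_remove(struct pending_queue *q,
				  struct request *req)
{
	struct request **pp;
	struct cmd *c;

	assert(req != NULL);
	for (pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == req) {
			*pp = req->next;
			break;
		}
	}
	c = req->scmnd;
	req->scmnd = NULL;
	req->next = NULL;
	return c;
}

/* Complete a command: dequeue the request first, only then free. */
static void complete_cmd(struct pending_queue *q, struct request *req)
{
	free(pending_remove(q, req));
}

/* Number of requests still pending. */
static int pending_count(const struct pending_queue *q)
{
	const struct request *r;
	int n = 0;

	for (r = q->head; r; r = r->next)
		n++;
	return n;
}
```

With this ordering, any request still reachable from the queue is 
guaranteed to point at a live command. The crash we see looks like the 
opposite case: the command is gone but the request was never unlinked.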

Vu




