[openib-general][patch review] srp: fmr implementation,
Roland Dreier
rdreier at cisco.com
Fri May 5 18:33:09 PDT 2006
Vu> It still does not address the issue pointed out from my
Vu> previous email - the first eh_host_reset_handler() success,
Vu> right away scsi_eh_host_reset() send start-stop-unit or
Vu> test-unit-ready command using the same scsi command. This stu
Vu> or tur command stuck in our queue, get timeout and get
Vu> aborted. The abortion of stu or tur command once again get
Vu> timeout. The original scsi command get freed. We delay the
Vu> clean-up of the associated request in
Vu> eh_device_reset_handler() instead of in eh_abort_handler() so
Vu> it's still in our queue. The lun is marked offline. The next
Vu> eh_device_reset_handler() for the same lun won't be
Vu> called. The next eh_reset_host_handler() will hit
Vu> used-after-free bug. You can see the log below
I'm still confused. Even the original eh_host_reset_handler
implementation will throw away all commands in the SRP queue, because
it does:
	for (i = 0; i < SRP_SQ_SIZE - 1; ++i)
		target->req_ring[i].next = i + 1;
	target->req_ring[SRP_SQ_SIZE - 1].next = -1;
	INIT_LIST_HEAD(&target->req_queue);
and the new patched version does
	list_for_each_entry(req, &target->req_queue, list) {
		req->scmnd->result = DID_RESET << 16;
		req->scmnd->scsi_done(req->scmnd);
		srp_unmap_data(req->scmnd, target, req);
	}
on top of that.
So after srp_reconnect_target() returns, SRP has no requests in its
queue. The only way that a command could be put in the queue is if
the SCSI midlayer passes it back into the queuecommand function.
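To make that argument concrete, here is a minimal user-space sketch of
the behavior I am describing. It is not the ib_srp code: the model_*
names, MODEL_SQ_SIZE and MODEL_DID_RESET are all made up for
illustration, and the "completion" is just a status word stored in the
slot. It only shows that once the reset path has completed every
outstanding request and rebuilt the free list, nothing is left in the
queue until something is queued again.

```c
#include <assert.h>
#include <string.h>

#define MODEL_SQ_SIZE   8     /* hypothetical stand-in for SRP_SQ_SIZE */
#define MODEL_DID_RESET 0x08  /* illustrative status code for this model */

struct model_req {
	int next;    /* next free slot, or -1 at the end of the free list */
	int in_use;  /* nonzero while a command occupies this slot */
	int result;  /* status reported when the command is completed */
};

struct model_target {
	struct model_req ring[MODEL_SQ_SIZE];
	int free_head;   /* first free slot, or -1 when the ring is full */
	int outstanding; /* number of slots currently in use */
};

/* Chain every slot into the free list, as the reset path does. */
static void model_init_free_list(struct model_target *t)
{
	int i;

	for (i = 0; i < MODEL_SQ_SIZE - 1; ++i)
		t->ring[i].next = i + 1;
	t->ring[MODEL_SQ_SIZE - 1].next = -1;
	t->free_head = 0;
}

static void model_target_init(struct model_target *t)
{
	memset(t, 0, sizeof(*t));
	model_init_free_list(t);
}

/* Stand-in for queuecommand: take a free slot, return its index or -1. */
static int model_queue(struct model_target *t)
{
	int idx = t->free_head;

	if (idx < 0)
		return -1;
	t->free_head = t->ring[idx].next;
	t->ring[idx].in_use = 1;
	t->ring[idx].result = 0;
	t->outstanding++;
	return idx;
}

/*
 * Stand-in for the reset path: complete every outstanding request with
 * a reset status, then rebuild the free list. Returns the number of
 * requests that were completed.
 */
static int model_reset(struct model_target *t)
{
	int i, completed = 0;

	for (i = 0; i < MODEL_SQ_SIZE; ++i) {
		if (!t->ring[i].in_use)
			continue;
		t->ring[i].result = MODEL_DID_RESET << 16;
		t->ring[i].in_use = 0;
		t->outstanding--;
		completed++;
	}
	model_init_free_list(t);
	assert(t->outstanding == 0); /* nothing may survive the reset */
	return completed;
}
```

In this model, a slot can only become in-use again through
model_queue(), which is the point of the paragraph above.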
I know I'm being dense but could you explain it one more time?
Also, this really worries me:
Vu> May 5 16:36:24 lab105 kernel: ib_mthca 0000:05:00.0: CQ overrun on CQN 040082
Do you know what's causing this?
- R.