[openfabrics-ewg] Current OFED kernel snapshot - problems in backporting SRP to RH4
Ishai Rabinovitz
ishai at mellanox.co.il
Thu May 4 08:39:18 PDT 2006
On Wed, May 03, 2006 at 12:07:19AM +0300, Doug Ledford wrote:
> On Tue, May 02, 2006 at 11:30:13AM +0300, Ishai Rabinovitz wrote:
> > We have a problem backporting SRP to RH4 U2 and U3 (actually, to
> > any kernel earlier than 2.6.13).
> > The problem occurs when the SCSI driver calls eh_abort_handler
> > or eh_device_reset_handler.
> > In current kernels (starting from 2.6.13), this call is made without
> > the host_lock spinlock held.
> > In the SRP code that performs the abort and the reset (srp_send_tsk_mgmt), we
> > send a message to the target and wait for its response.
> >
> > In earlier kernel versions, the SCSI driver takes host_lock with
> > spin_lock_irqsave before calling the abort or reset handlers.
> > This creates a problem: with the spinlock held, the SRP driver cannot sleep
> > while it waits for the target to answer.
>
>
> static int srp_abort(struct scsi_cmnd *scmnd)
> {
>         int rv;
>         printk(KERN_ERR "SRP abort called\n");
>
>         spin_unlock_irq(scmnd->device->host->host_lock);
>         rv = srp_send_tsk_mgmt(scmnd, SRP_TSK_ABORT_TASK);
>         spin_lock_irq(scmnd->device->host->host_lock);
>         return rv;
> }
>
> Repeat the same change for srp_reset and srp_reset_host.
>
> --
> Doug Ledford <dledford at redhat.com>
> Red Hat, Inc.
> 1801 Varsity Dr.
> Raleigh, NC 27606
Hi Doug,
Thank you for your suggestion. I tried it, but it did not work.
Possible reasons why it did not work:
1) Maybe 2.6.9 with this patch merely exposes a bug that also exists in the
current kernel.
2) The SCSI driver may be holding host_lock to avoid a race with
another operation it performs (for example, freeing this target's memory).
3) Since the SCSI driver uses spin_lock_irqsave, srp_abort may be called in a
context where interrupts must stay disabled. The spin_unlock_irq in the patch
enables interrupts unconditionally, and this may cause the problem.
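
To illustrate reason 3, here is a minimal userspace sketch. It is not kernel
code: struct model_cpu and all model_* functions are hypothetical stand-ins
that only imitate how the real primitives treat the interrupt-enable state.
It assumes a single CPU and ignores everything else the kernel does. It shows
that the caller's saved state is restored at the end, but that interrupts are
enabled inside the handler's unlocked window regardless of their state on
entry. Whether this is the actual cause of the failure is not established.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: one simulated CPU and one simulated lock. */
struct model_cpu {
    bool irqs_enabled;
    bool lock_held;
};

typedef bool model_flags_t;

/* Imitates spin_lock_irqsave(): save IRQ state, disable IRQs, take lock. */
static model_flags_t model_lock_irqsave(struct model_cpu *cpu)
{
    model_flags_t flags = cpu->irqs_enabled;
    cpu->irqs_enabled = false;
    cpu->lock_held = true;
    return flags;
}

/* Imitates spin_unlock_irqrestore(): drop lock, restore saved IRQ state. */
static void model_unlock_irqrestore(struct model_cpu *cpu, model_flags_t flags)
{
    cpu->lock_held = false;
    cpu->irqs_enabled = flags;
}

/* Imitates spin_unlock_irq(): drop lock, enable IRQs unconditionally. */
static void model_unlock_irq(struct model_cpu *cpu)
{
    cpu->lock_held = false;
    cpu->irqs_enabled = true;
}

/* Imitates spin_lock_irq(): disable IRQs, take lock. */
static void model_lock_irq(struct model_cpu *cpu)
{
    cpu->irqs_enabled = false;
    cpu->lock_held = true;
}

/*
 * Walks through the sequence from the patch and returns the IRQ state
 * observed inside the handler's unlocked window.
 */
static bool irqs_on_in_handler_window(bool irqs_on_at_entry)
{
    struct model_cpu cpu = { .irqs_enabled = irqs_on_at_entry,
                             .lock_held = false };
    bool seen;

    model_flags_t flags = model_lock_irqsave(&cpu); /* caller takes lock */
    model_unlock_irq(&cpu);                         /* handler drops lock */
    seen = cpu.irqs_enabled;                        /* handler would sleep here */
    model_lock_irq(&cpu);                           /* handler retakes lock */
    model_unlock_irqrestore(&cpu, flags);           /* caller releases lock */

    /* The caller's original IRQ state is restored at the end. */
    assert(cpu.irqs_enabled == irqs_on_at_entry);
    return seen;
}
```

In this model, irqs_on_in_handler_window(false) still reports interrupts as
enabled inside the window, which is the situation described in reason 3.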
Any suggestions?
--
Ishai Rabinovitz